I'm streaming data in a Pig script through an executable that returns an XML fragment for each line of input I stream to it. That XML fragment spans multiple lines, and I have no control over the output of the executable.
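
To make the problem concrete, here is a minimal Python sketch (not Pig code) of what happens if the executable's output is consumed one line at a time. The XML fragment and the function name deserialize_per_line are hypothetical, made up for illustration, and the line-at-a-time behaviour is an assumption about how the streamed output is read.

```python
# Hypothetical output of the streamed executable for ONE input line.
fragment = "<record>\n  <id>1</id>\n  <name>foo</name>\n</record>\n"


def deserialize_per_line(stream_output):
    """Mimic a line-oriented deserializer: one single-field tuple per line."""
    return [(line,) for line in stream_output.splitlines()]


tuples = deserialize_per_line(fragment)

# One logical record comes back as several tuples, one per physical line.
for t in tuples:
    print(t)
print("tuples produced for one input line:", len(tuples))
```

What I would like instead is a single tuple holding the whole fragment for each input line.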

A related question, "Use Hadoop Pig to load data from text file w/ each record on multiple lines?" <http://stackoverflow.com/questions/6726407/use-hadoop-pig-to-load-data-from-text-file-w-each-record-on-multiple-lines>, has an answer that suggests writing a custom record reader. That works fine if you want to implement a LoadFunc that reads from a file, but to be usable with streaming, the class has to implement StreamToPig. As far as I understand, StreamToPig only lets you read one line at a time.

Does anyone know how to handle such a situation?

See also: http://stackoverflow.com/questions/9910138/is-it-possible-to-use-pig-streaming-streamtopig-in-a-way-that-handles-multiple

--
Best Regards,
Ahmed Sobhi
http://about.me/humanzz/bio
