I'm streaming data in a Pig script through an executable that returns an
XML fragment for each line of input I stream to it. That XML fragment
spans multiple lines, and I have no control over the output of the
executable.

In the related question "Use Hadoop Pig to load data from text file w/ each
record on multiple lines?"
<http://stackoverflow.com/questions/6726407/use-hadoop-pig-to-load-data-from-text-file-w-each-record-on-multiple-lines>,
the answer suggested writing a custom record reader. That works fine if you
are implementing a LoadFunc that reads from a file, but to be used with
streaming, the deserializer has to implement StreamToPig. As far as I
understand, StreamToPig only lets you read one line at a time.
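
To make the constraint concrete: if the deserializer only ever sees one line
per call, it would have to carry state between calls, roughly as in the
sketch below. This is only an illustration in plain Java and does not use
Pig's API. The class name XmlFragmentBuffer and the </record> closing tag are
made up for the example, and I don't know whether Pig's streaming contract
would tolerate a stateful deserializer like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: buffers lines until a closing tag
// is seen, then hands back the whole fragment joined onto a single line.
class XmlFragmentBuffer {
    private final String closingTag;
    private final StringBuilder pending = new StringBuilder();

    XmlFragmentBuffer(String closingTag) {
        this.closingTag = closingTag;
    }

    // Feed one line. Returns the complete fragment once the closing tag
    // has been seen, or null while the fragment is still incomplete.
    String accept(String line) {
        if (pending.length() > 0) {
            pending.append(' ');
        }
        pending.append(line.trim());
        if (line.contains(closingTag)) {
            String fragment = pending.toString();
            pending.setLength(0);
            return fragment;
        }
        return null;
    }

    // Convenience wrapper: fold a sequence of lines into complete fragments.
    static List<String> collect(List<String> lines, String closingTag) {
        XmlFragmentBuffer buffer = new XmlFragmentBuffer(closingTag);
        List<String> fragments = new ArrayList<>();
        for (String line : lines) {
            String fragment = buffer.accept(line);
            if (fragment != null) {
                fragments.add(fragment);
            }
        }
        return fragments;
    }
}
```

The awkward part is the null return: each call is expected to produce a
tuple, and I see no obvious way to say "nothing yet, keep feeding me lines".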

Does anyone know how to handle such a situation?

http://stackoverflow.com/questions/9910138/is-it-possible-to-use-pig-streaming-streamtopig-in-a-way-that-handles-multiple

-- 
Best Regards,
Ahmed Sobhi
http://about.me/humanzz/bio
