Hi all, and merry Christmas!

I generate a file with a Pig script embedded in a Java process and store it using BinStorage.

Then I would like to read this file directly from another Java client, without starting a Pig script (i.e. using only the Hadoop API and Pig's BinStorage class). The goal is to do some real-time computation by scanning the file on demand. I cannot afford to start a Pig script for this, because the overhead of launching the script and collecting its result is too high for my real-time objectives (I need a result within a few seconds).

Of course, I could use JsonStorage instead and read the file with a JSON deserializer, but my guess is that this would be much slower, and it would also be painful to handle the various part files generated for the output (part-r-XXXXX).
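
To illustrate what handling the parts involves on the client side, here is a minimal sketch, assuming the names in the output directory have already been listed as plain strings (with Hadoop that listing would come from the FileSystem API, which is not shown here). The class and method names (PartFiles, isPartFile, sortedPartFiles) are made up for this example, and the part-m-XXXXX case is included on the assumption that a map-only job may have produced the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: picks the part files out of a directory listing.
final class PartFiles {
    // Matches part-r-00000 (reduce output) and part-m-00000 (map-only output).
    private static final Pattern PART = Pattern.compile("part-[mr]-\\d+");

    static boolean isPartFile(String name) {
        return PART.matcher(name).matches();
    }

    // Keeps only the part files and returns them in lexicographic order,
    // which matches numeric order as long as the suffixes are zero-padded.
    static List<String> sortedPartFiles(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            if (isPartFile(name)) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) {
        // Example listing; _SUCCESS and _logs are skipped.
        List<String> listing =
                Arrays.asList("_SUCCESS", "part-r-00001", "part-r-00000", "_logs");
        System.out.println(sortedPartFiles(listing)); // [part-r-00000, part-r-00001]
    }
}
```

Each name returned this way would still have to be opened and decoded separately, which is the part I would like to avoid writing by hand.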

Best regards,
