I haven't done this myself, so I can't give you a detailed answer. But every storage function is associated with an InputFormat/OutputFormat as well as a RecordReader/RecordWriter.
As for BinStorage, you can take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40

On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat <[email protected]> wrote:

> Hi all and merry Christmas !
>
> I generate a file using a Pig script embedded in a Java process and store
> it using a BinStorage.
>
> Then, I would like to read this file directly from another Java client,
> but without starting a Pig script (i.e only by using Hadoop API and Pig's
> BinStorage class).
> The goal is to achieve some real-time computation by scanning the file in
> realtime, and so I cannot offer to start a Pig script to do the
> computation, as the time overhead to start the script and get the result is
> too long for my realtime objectives (I need a result in a few seconds).
>
> Of course, I could use a JsonStorage and read my file using a Json
> deserializer, but my guess is it would be much slower, and also painful to
> handle the various parts generated for the output file (part-r-XXXXX).
>
> Best regards,
>
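
On the part files (part-r-XXXXX): whichever reader you end up using, collecting them in order does not have to be painful. Here is a minimal sketch, assuming the job output directory is reachable through the local filesystem; on HDFS you would go through Hadoop's FileSystem API instead. The class name PartFiles and the method list are hypothetical names for illustration, not part of Pig or Hadoop.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Pig or Hadoop class): collects the part files
// of a job output directory so they can be handed to a reader one by one.
final class PartFiles {

    private PartFiles() {
    }

    /** Returns the regular files named part-* in outputDir, sorted by name. */
    static List<Path> list(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob is matched against the file name only, so markers such as
        // _SUCCESS and hidden checksum files such as .part-r-00000.crc are skipped.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // Part numbers are zero-padded, so name order is also part order.
        Collections.sort(parts);
        return parts;
    }
}
```

Each returned path can then be opened and passed to whatever record reader you settle on. I have not tried this against real BinStorage output, so treat it as a starting point.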
