Re: How to read a file generated by Pig+BinStorage using the HDFS API ?

Vincent Barat Fri, 10 Jan 2014 05:24:35 -0800

Thanks for your help. I succeeded in reading my data. Here is the code:


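    // Assumes fileSystem is an already opened org.apache.hadoop.fs.FileSystem.
    Path path = new Path("/mydata");
    BinStorageRecordReader recordReader = new BinStorageRecordReader();
    FileStatus fileStatus = fileSystem.getFileStatus(path);

    // Read the whole file as a single split.
    // Note: on Hadoop 2.x, TaskAttemptContext is an interface, so
    // TaskAttemptContextImpl would be needed here instead.
    recordReader.initialize(
        new FileSplit(path, 0, fileStatus.getLen(), null),
        new TaskAttemptContext(new Configuration(), new TaskAttemptID()));

    while (recordReader.nextKeyValue())
    {
      Tuple tuple = recordReader.getCurrentValue();
      ...
    }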
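
When the output is a directory of part files (part-r-XXXXX) rather than a single file, the loop above would have to run once per part. Below is a minimal sketch of picking the parts out of a directory listing while skipping markers such as _SUCCESS and _logs. The class name PartFileSelector and the exact naming pattern are assumptions made for illustration and are not part of Pig or Hadoop. In practice the names would probably come from FileSystem.listStatus(); plain strings are used here so the sketch runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Pig or Hadoop): selects the part files
// from the names found in a job output directory.
final class PartFileSelector {

    // Assumed naming scheme: part-r-00000 (reduce output) or part-m-00000 (map-only output).
    private static final Pattern PART_NAME = Pattern.compile("part-[mr]-\\d{5}");

    // Returns the names that look like part files, sorted lexicographically.
    static List<String> select(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            // matches() requires the whole name to fit the pattern,
            // so checksum files such as part-r-00000.crc are skipped.
            if (PART_NAME.matcher(name).matches()) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }
}
```

Each selected name can then be resolved against the output directory and passed to a fresh BinStorageRecordReader in the same way as the single path above.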

Best regards,

On 29/12/2013 03:22, Cheolsoo Park wrote:

I haven't done it myself, so I can't give you a detailed answer. But every
storage is associated with an InputFormat/OutputFormat as well as a
RecordReader/RecordWriter.

As for BinStorage, you can take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40


On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat <[email protected]> wrote:

Hi all, and merry Christmas!

I generate a file using a Pig script embedded in a Java process and store
it using BinStorage.

Then, I would like to read this file directly from another Java client,
but without starting a Pig script (i.e., only by using the Hadoop API and
Pig's BinStorage class).
The goal is to achieve some real-time computation by scanning the file in
real time, so I cannot afford to start a Pig script to do the
computation: the time overhead to start the script and get the result is
too long for my real-time objectives (I need a result in a few seconds).

Of course, I could use JsonStorage and read my file using a JSON
deserializer, but my guess is that it would be much slower, and also painful
to handle the various parts generated for the output file (part-r-XXXXX).

Best regards,
