Thanks for your help. I succeeded in reading my data. Here is the code:
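// fileSystem was not defined in the snippet; assumed here to come from the default configuration.
FileSystem fileSystem = FileSystem.get(new Configuration());
Path path = new Path("/mydata");
BinStorageRecordReader recordReader = new BinStorageRecordReader();
FileStatus fileStatus = fileSystem.getFileStatus(path);
// A single split covering the whole file; no preferred hosts are given.
// Note: on Hadoop 2.x TaskAttemptContext is an interface; TaskAttemptContextImpl takes the same arguments.
recordReader.initialize(
    new FileSplit(path, 0, fileStatus.getLen(), null),
    new TaskAttemptContext(new Configuration(), new TaskAttemptID()));
while (recordReader.nextKeyValue())
{
    Tuple tuple = recordReader.getCurrentValue();
    ...
}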
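In case the store location is a directory of part files (part-r-XXXXX) rather than a single file, here is a minimal sketch of how the part files could be picked out before running the loop above on each of them. PartFiles and select are hypothetical names, not part of Pig or Hadoop, and I am assuming the file names come from something like FileSystem.listStatus() on the output directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Pig or Hadoop): keeps only the part files
// found in a job output directory.
class PartFiles {
    // Matches part-r-00000 (reduce output) and part-m-00000 (map-only output).
    // Assumes the usual five-digit partition suffix.
    private static final Pattern PART_FILE = Pattern.compile("part-[mr]-\\d{5}");

    // Returns the part file names in sorted order and skips everything else,
    // for example _SUCCESS, _logs or .crc checksum files.
    static List<String> select(List<String> fileNames) {
        List<String> parts = new ArrayList<>();
        for (String name : fileNames) {
            if (PART_FILE.matcher(name).matches()) {
                parts.add(name);
            }
        }
        // With a fixed-width suffix, lexicographic order is also partition order.
        Collections.sort(parts);
        return parts;
    }
}
```

Each selected name would then be resolved against the output directory and read with its own BinStorageRecordReader, exactly as in the loop above.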
Best regards,
On 29/12/2013 03:22, Cheolsoo Park wrote:
I haven't done it myself, so I can't give you a detailed answer. But every
storage function is associated with an InputFormat/OutputFormat as well as a
RecordReader/RecordWriter.
As for BinStorage, you can take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40
On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat <[email protected]> wrote:
Hi all, and Merry Christmas!
I generate a file using a Pig script embedded in a Java process and store
it using BinStorage.
Then I would like to read this file directly from another Java client,
without starting a Pig script (i.e., using only the Hadoop API and Pig's
BinStorage class).
The goal is to do some real-time computation by scanning the file, so I
cannot afford to start a Pig script for the computation: the time overhead
of starting the script and getting the result is too long for my real-time
objectives (I need a result within a few seconds).
Of course, I could use JsonStorage and read my file using a JSON
deserializer, but my guess is that it would be much slower, and also painful
to handle the various parts generated for the output file (part-r-XXXXX).
Best regards,