Re: PigStorage and ElephantBird's JsonLoader - InputFormat

Jonathan Holloway Fri, 17 Jun 2011 03:03:17 -0700

Thanks Dmitriy. I extended JsonLoader and overrode its input format to
return a PigTextInputFormat; all is fine for now as a workaround.


Cheers,
Jon.

On 16 June 2011 05:45, Dmitriy Ryaboy <[email protected]> wrote:

> Yep, that's the problem. I will make it use PigTextInputFormat instead.
> I did the same thing for the Lzo version but not the uncompressed one.
>
> On Jun 15, 2011, at 6:57 PM, Jonathan Holloway <[email protected]> wrote:
>
> > Hi all,
> >
> > I was wondering whether somebody could explain how Pig deals with nested
> > directories of log files, something like:
> >
> > /logs/2011-01-01/a.log
> > /logs/2011-01-01/b.log
> > /logs/2011-01-01/c.log
> >
> > I'm pretty sure that if I give a Pig script the /logs directory as input,
> > it will successfully process all logs (a.log, b.log, c.log) within that
> > structure.
> >
> > However, I'm seeing a discrepancy with JsonLoader in Elephant Bird: if I
> > do the same thing, it fails with the following error:
> >
> > Backend error message
> > ---------------------
> > java.io.IOException: Cannot open filename /logs/2011-01-01
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
> >        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
> >        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> >        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
> >        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> >        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> >        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > Pig Stack Trace
> > ---------------
> > ERROR 2998: Unhandled internal error. Cannot open filename /logs/2011-01-01
> >
> > java.io.IOException: Cannot open filename /logs/2011-01-01
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
> >        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
> >        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> >        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
> >        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> >        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> >        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >
> > ================================================================================
> > Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain],
> > exit code [2]
> >
> > I think JsonLoader currently returns a TextInputFormat, whereas PigStorage
> > can handle this because it returns a PigTextInputFormat, which uses the
> > MapRedUtil.getAllFileRecursively() workaround for MAPREDUCE-1577.
> >
> > Can anybody confirm this is actually the case, and whether there's some
> > sort of workaround for it?
> >
> > I'm using Pig 0.8.0, Apache Hadoop 0.20.2 and Oozie 3.0.0.
> >
> > Many thanks in advance,
> > Jon.
>
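
For anyone reading this thread later: the failure above comes from a
record reader being handed a directory (/logs/2011-01-01) instead of a
file. The idea behind the recursive-listing workaround mentioned in the
original question can be sketched roughly as below. This is only an
illustration against the local filesystem using java.nio, not Pig's or
Hadoop's actual code; the class name RecursiveInputListing and the method
listFilesRecursively are made up for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: expand input paths so that only plain files,
// never directories, are handed on to whatever opens them for reading.
public class RecursiveInputListing {

    // Returns every plain file found under the given inputs, in sorted order.
    static List<Path> listFilesRecursively(List<Path> inputs) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path input : inputs) {
            collect(input, files);
        }
        Collections.sort(files);
        return files;
    }

    // Descends into directories; adds anything that is not a directory.
    private static void collect(Path path, List<Path> out) throws IOException {
        if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    collect(child, out);
                }
            }
        } else {
            out.add(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the question in a temporary directory.
        Path root = Files.createTempDirectory("logs");
        Path day = Files.createDirectory(root.resolve("2011-01-01"));
        for (String name : new String[] {"a.log", "b.log", "c.log"}) {
            Files.write(day.resolve(name), new byte[0]);
        }
        // Passing only the top-level directory still yields the three files.
        for (Path file : listFilesRecursively(Collections.singletonList(root))) {
            System.out.println(root.relativize(file));
        }
    }
}
```

Without an expansion step like this, the nested directory itself ends up
as an input and the attempt to open it fails, which matches the "Cannot
open filename /logs/2011-01-01" message in the trace.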
