Yep, that's the problem. I'll make it use PigTextInputFormat instead. I did the same thing for the LZO loaders, but not for the uncompressed version.
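In the meantime, something along these lines should work as a stopgap: subclass the loader and override getInputFormat() so it hands Pig a PigTextInputFormat rather than the plain TextInputFormat. This is an untested sketch; RecursiveJsonLoader is just a placeholder name, and the JsonLoader package path may differ depending on your elephant-bird version.

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat;

    import com.twitter.elephantbird.pig.load.JsonLoader;

    // Placeholder class name; adjust the JsonLoader import to match your
    // elephant-bird version.
    public class RecursiveJsonLoader extends JsonLoader {

        @Override
        public InputFormat getInputFormat() {
            // Same input format PigStorage returns: it expands directory
            // inputs recursively (MapRedUtil.getAllFileRecursively(), the
            // MAPREDUCE-1577 workaround), so pointing the LOAD at /logs
            // picks up /logs/2011-01-01/a.log and friends.
            return new PigTextInputFormat();
        }
    }

Register the jar and use that class in your LOAD statement in place of JsonLoader until the fix lands.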
On Jun 15, 2011, at 6:57 PM, Jonathan Holloway <[email protected]> wrote:

> Hi all,
>
> I was wondering whether somebody could explain how Pig deals with nested
> directories of log files, something like:
>
> /logs/2011-01-01/a.log
> /logs/2011-01-01/b.log
> /logs/2011-01-01/c.log
>
> I'm pretty sure that if I give a Pig script the /logs directory as input, it will
> successfully process all the logs (a.log, b.log, c.log) within that structure.
>
> However, I'm seeing a discrepancy with the JsonLoader in elephant-bird, because
> if I do the same thing it errors with the following:
>
> Backend error message
> ---------------------
> java.io.IOException: Cannot open filename /logs/2011-01-01
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. Cannot open filename /logs/2011-01-01
>
> java.io.IOException: Cannot open filename /logs/2011-01-01
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> ================================================================================
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain],
> exit code [2]
>
> I think JsonLoader currently returns a TextInputFormat, whereas PigStorage can
> handle this because it returns a PigTextInputFormat, which uses the
> MapRedUtil.getAllFileRecursively() workaround for MAPREDUCE-1577.
>
> Can anybody confirm this is actually the case, and whether there's some sort
> of workaround for it?
>
> I'm using Pig 0.8.0, Apache Hadoop 0.20.2 and Oozie 3.0.0.
>
> Many thanks in advance,
> Jon.
