Thanks Dmitriy. I extended JsonLoader and overrode getInputFormat() to return a PigTextInputFormat, and all is fine for now as a workaround.
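
For anyone following along, the reason this helps can be sketched without any Hadoop or Pig dependencies. The snippet below is only an illustration on the local filesystem using java.nio, not the Pig implementation: it expands the input paths recursively so that only files, never directories such as /logs/2011-01-01, are handed on for reading. That is roughly the idea behind the MapRedUtil.getAllFileRecursively() workaround mentioned in the original question. The class and method names (RecursiveInputListing, expand) are made up for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not Pig or Hadoop code.
class RecursiveInputListing {

    // Expand every input path: plain files are kept as they are, directories
    // are walked depth-first so the result contains files only.
    static List<Path> expand(List<Path> inputs) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path input : inputs) {
            collect(input, files);
        }
        return files;
    }

    private static void collect(Path path, List<Path> out) throws IOException {
        if (Files.isDirectory(path)) {
            // Descend into the directory instead of trying to open it as a file.
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    collect(child, out);
                }
            }
        } else {
            out.add(path);
        }
    }
}
```

A non-recursive listing of /logs would return the 2011-01-01 directory itself, and trying to open that as a file is what produces the "Cannot open filename" error; with the expansion above, only a.log, b.log and c.log would be returned.
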
Cheers,
Jon.

On 16 June 2011 05:45, Dmitriy Ryaboy <[email protected]> wrote:

> Yep, that's the problem. I will make it use the pigtextinputformat instead.
> Did the same thing for Lzo but not the uncompressed version.
>
> On Jun 15, 2011, at 6:57 PM, Jonathan Holloway <[email protected]> wrote:
>
> > Hi all,
> >
> > I was wondering whether somebody could explain how Pig deals with nested
> > directories of log files,
> > Something like:
> >
> > /logs/2011-01-01/a.log
> > /logs/2011-01-01/b.log
> > /logs/2011-01-01/c.log
> >
> > I'm pretty sure if I give a Pig script the /logs directory as input it will
> > successfully process all logs (a.log, b.log, c.log)
> > within that structure.
> >
> > However, I'm seeing a discrepancy with JsonLoader in elephant bird, because
> > if I do the same thing then it errors with the following:
> >
> > Backend error message
> > ---------------------
> > java.io.IOException: Cannot open filename /logs/2011-01-01
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
> >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
> >     at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> > Pig Stack Trace
> > ---------------
> > ERROR 2998: Unhandled internal error. Cannot open filename /logs/2011-01-01
> >
> > java.io.IOException: Cannot open filename /logs/2011-01-01
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
> >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
> >     at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
> >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > ================================================================================
> >
> > Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain],
> > exit code [2]
> >
> > I think it returns a TextInputFormat currently, where PigStorage can handle
> > this because it returns a PigTextInputFormat
> > which uses the MapRedUtil.getAllFileRecursively() workaround for
> > MAPREDUCE-1577.
> >
> > Can anybody confirm this is actually the case, and whether there's some sort
> > of workaround for it?
> >
> > I'm using Pig 0.8.0, Apache Hadoop 0.20.2 and Oozie 3.0.0
> >
> > Many thanks in advance,
> > Jon.
