Yep, that's the problem. I'll make it use PigTextInputFormat instead. I did the same thing for the LZO loaders, but not for the uncompressed version.
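In the meantime, something along these lines should work as a stopgap: subclass the loader and override getInputFormat() so it hands Pig a PigTextInputFormat rather than the plain TextInputFormat. This is an untested sketch; RecursiveJsonLoader is just a placeholder name, and the JsonLoader package path may differ depending on your elephant-bird version.

    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat;

    import com.twitter.elephantbird.pig.load.JsonLoader;

    // Placeholder class name; adjust the JsonLoader import to match your
    // elephant-bird version.
    public class RecursiveJsonLoader extends JsonLoader {

        @Override
        public InputFormat getInputFormat() {
            // Same input format PigStorage returns: it expands directory
            // inputs recursively (MapRedUtil.getAllFileRecursively(), the
            // MAPREDUCE-1577 workaround), so pointing the LOAD at /logs
            // picks up /logs/2011-01-01/a.log and friends.
            return new PigTextInputFormat();
        }
    }

Register the jar and use that class in your LOAD statement in place of JsonLoader until the fix lands.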
On Jun 15, 2011, at 6:57 PM, Jonathan Holloway <[email protected]> wrote:

> Hi all,
>
> I was wondering whether somebody could explain how Pig deals with nested
> directories of log files, something like:
>
> /logs/2011-01-01/a.log
> /logs/2011-01-01/b.log
> /logs/2011-01-01/c.log
>
> I'm pretty sure that if I give a Pig script the /logs directory as input, it will
> successfully process all the logs (a.log, b.log, c.log) within that structure.
>
> However, I'm seeing a discrepancy with the JsonLoader in elephant-bird, because
> if I do the same thing it errors with the following:
>
> Backend error message
> ---------------------
> java.io.IOException: Cannot open filename /logs/2011-01-01
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. Cannot open filename /logs/2011-01-01
>
> java.io.IOException: Cannot open filename /logs/2011-01-01
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> ================================================================================
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain],
> exit code [2]
>
> I think JsonLoader currently returns a TextInputFormat, whereas PigStorage can
> handle this because it returns a PigTextInputFormat, which uses the
> MapRedUtil.getAllFileRecursively() workaround for MAPREDUCE-1577.
>
> Can anybody confirm this is actually the case, and whether there's some sort
> of workaround for it?
>
> I'm using Pig 0.8.0, Apache Hadoop 0.20.2 and Oozie 3.0.0.
>
> Many thanks in advance,
> Jon.
