Inconsistency with Hadoop in Pig load statements involving globs with 
subdirectories
------------------------------------------------------------------------------------

                 Key: PIG-569
                 URL: https://issues.apache.org/jira/browse/PIG-569
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
         Environment: FC Linux x86/64
            Reporter: Kevin Weil
             Fix For: types_branch


Pig cannot handle LOAD statements with Hadoop globs where the globs have 
subdirectories.  For example, 

A = LOAD 'dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}' USING ...

A similar statement in Hadoop, hadoop dfs -ls 
dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}, does work correctly.

The output of running the above load statement in pig, built from svn revision 
724576, is:

2008-12-17 12:02:28,480 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2008-12-17 12:02:28,480 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Map reduce job failed
2008-12-17 12:02:28,480 [main] ERROR 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- java.io.IOException: Unable to get collect for pattern 
dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}} [Failed to obtain glob for 
dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}]
        at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:231)
        at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
        at 
org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
        at 
org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
        at 
org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:108)
        at 
org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
        at 
org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
        at 
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at 
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.datastorage.DataStorageException: Failed to 
obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}
        ... 13 more
Caused by: java.io.IOException: Illegal file pattern: Expecting set closure 
character or end of range, or } for glob {dir1 at 5
        at 
org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1084)
        at 
org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1069)
        at 
org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:987)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:953)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:902)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:862)
        at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
        ... 12 more


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to