Difference in Semantics between Load statement in Pig and HDFS client on Command line
--------------------------------------------------------------------------------------
                 Key: PIG-1576
                 URL: https://issues.apache.org/jira/browse/PIG-1576
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.7.0, 0.6.0
            Reporter: Viraj Bhat


Here is my directory structure on HDFS, which I want to access using Pig. This is a sample; in the real use case I have more than 100 of these directories.

{code}
$ hadoop fs -ls /user/viraj/recursive/
Found 3 items
drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080615
drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080616
drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080617
{code}

Using the command line, I can access them with a variety of patterns:

{code}
$ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt

$ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
-rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
{code}

I have written a Pig script, but neither of the load statement variants below works:

{code}
--A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}

I get the following error in Pig 0.8:

{noformat}
2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2010-08-27 16:34:27,711 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt            FinishedAt           Features
0.20.2          0.8.0-SNAPSHOT  viraj   2010-08-27 16:34:24  2010-08-27 16:34:27  LIMIT

Failed!

Failed Jobs:
JobId   Alias   Feature  Message  Outputs
N/A     A,AL             Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
        ... 7 more
hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
{noformat}
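
One plausible explanation, assuming a bash-style shell: {15..17} and {20080615..20080617} are shell brace expansion, so in the command-line examples the shell rewrites the argument into three literal paths before hadoop fs ever runs, whereas Pig hands the unexpanded pattern to Hadoop's FileInputFormat, whose glob syntax supports comma alternation such as {15,16,17} but not .. ranges. A quick way to check this from the same shell, with only the quoting changed:

{code}
# Unquoted: the shell performs the brace expansion itself, so hadoop receives
# three plain paths and never sees a glob at all.
$ echo /user/viraj/recursive/{20080615..20080617}/
/user/viraj/recursive/20080615/ /user/viraj/recursive/20080616/ /user/viraj/recursive/20080617/

# Quoted: the shell passes the braces through untouched and Hadoop's own glob
# matcher has to interpret them; this should reproduce Pig's "matches 0 files"
# behavior rather than the successful listings above (expected, not verified here).
$ hadoop fs -ls '/user/viraj/recursive/{20080615..20080617}/'

# The comma form should still match even when quoted, since {a,b,c} is valid
# Hadoop glob syntax (expected, not verified here).
$ hadoop fs -ls '/user/viraj/recursive/{20080615,20080616,20080617}/'
{code}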

The following works:

{code}
A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
AL = limit A 10;
dump AL;
{code}

Why is there an inconsistency between the HDFS client and Pig?

Viraj
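
Since the real use case involves more than 100 of these directories, here is a sketch of one possible workaround, assuming a bash-style shell and a hypothetical script myscript.pig whose load statement takes its path from a parameter (A = load '$input' using PigStorage('\u0001') as (k:int, v:chararray);). The idea is to build the comma-separated form that Hadoop's glob, and therefore Pig, does accept, and pass it in through Pig's parameter substitution:

{code}
# Expand the date range locally, then turn it into the {a,b,c} alternation
# that Hadoop's glob matcher understands.
$ dates=$(echo 200806{15..17} | tr ' ' ',')
$ echo $dates
20080615,20080616,20080617

# Hand the pattern to the (hypothetical) script via -param; the braces stay
# literal because they sit inside double quotes, so Hadoop sees the glob.
$ pig -param input="/user/viraj/recursive/{$dates}/" myscript.pig
{code}

Depending on the actual directory names, a character-class glob such as /user/viraj/recursive/2008061[5-7]/ might also work directly in the load statement, since Pig passes square-bracket ranges through to Hadoop unchanged.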