I've located my problem. I believe it comes from a difference in the classpath between 0.9.0 and 0.9.1. It might be somewhat machine dependent, as a lot of these jars are probably found dynamically by the /bin/pig script, which has changed quite a bit since 0.9.0. When debugging, it looked like GenericOptionsParser was the culprit, so maybe the classpath differences caused a different version of this class to get loaded.
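For anyone who wants to compare the two classpaths themselves, here is a rough sketch rather than anything Pig ships. The function names are made up for illustration, and it assumes mktemp, sort and comm are available. It splits each java.class.path value into one entry per line and lists what only the second one contains.

```shell
# Hypothetical helpers for comparing two java.class.path values.

# Print one classpath entry per line, dropping empty entries (e.g. from "::").
split_classpath() {
    printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d'
}

# Print the entries that appear in the second classpath but not in the first.
classpath_only_in_second() {
    first_list=$(mktemp)
    second_list=$(mktemp)
    split_classpath "$1" | sort -u > "$first_list"
    split_classpath "$2" | sort -u > "$second_list"
    # comm -13 keeps only the lines unique to the second (sorted) file.
    comm -13 "$first_list" "$second_list"
    rm -f "$first_list" "$second_list"
}
```

Calling classpath_only_in_second with the 0.9.0 value first and the 0.9.1 value second would list the jars that only the 0.9.1 run picks up, which is where I would start looking for a second copy of GenericOptionsParser.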
Anyway, the short of it is I have to escape the asterisk * character in my globbing pattern.

=== Let's do it with 0.9.0 ===

$ /usr/lib/pig-0.9.0/bin/pig -d INFO -p in_file='/chukwa/repos/Insight-Demo/' -p process_glob='20111226/*/*/*.evt' -p out_file='dashboard-daily-2011-12-26' -p in_file1='dashboard-daily-2011-12-26' -p out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p timeperiod='1' ap.pig

(System.out.println()s added for effect)

0.9.0 java.class.path = /etc/hbase:/usr/lib/pig-0.9.0/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig-0.9.0/bin/../build/classes:/usr/lib/pig-0.9.0/bin/../build/test/classes:/usr/lib/pig-0.9.0/bin/../pig-0.9.0-core.jar:/usr/lib/pig-0.9.0/bin/../build/pig-0.9.1-SNAPSHOT.jar:/usr/lib/pig-0.9.0/bin/../lib/automaton.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: process_glob=20111226/*/*/*.evt
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

=== now with 0.9.1 ===

$ /usr/lib/pig-0.9.1/bin/pig -d INFO -p in_file='/chukwa/repos/Insight-Demo/' -p process_glob='20111226/*/*/*.evt' -p out_file='dashboard-daily-2011-12-26' -p in_file1='dashboard-daily-2011-12-26' -p out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p timeperiod='1' ap.pig

(System.out.println()s added for effect)

0.9.1 java.class.path =
/usr/lib/hadoop-0.20/conf:/usr/java/default/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/cloudera-desktop-plugins-0.3.0.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u0-SNAPSHOT.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.9.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/etc/hbase:/usr/lib/pig-0.9.1/bin/../conf:/usr/java/default/lib/tools.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar:/usr/lib/pig-0.9.1/bin/../lib/automaton.jar:/usr/lib/pig-0.9.1/bin/../lib/jython-2.5.0.jar:/usr/lib/pig-0.9.1/bin/../pig-withouthadoop.jar::/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf:/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: null
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

The 2nd parameter "process_glob" isn't parsed correctly and needs to be escaped now like this:

/usr/lib/pig-0.9.1/bin/pig -d INFO -p in_file='/chukwa/repos/Insight-Demo/' -p process_glob='20111226/\*/\*/\*.evt' -p out_file='dashboard-daily-2011-12-26' -p in_file1='dashboard-daily-2011-12-26' -p out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p timeperiod='1' ap.pig

On Tue, Dec 27, 2011 at 6:15 PM, Aniket Mokashi <[email protected]> wrote:

> I tried
> pig --param "input=s3n://bucket_path/*/" test.pig
>
> It worked for me. I am on EMR Pig 0.9.1.
>
> Thanks,
> Aniket
>
> On Tue, Dec 27, 2011 at 3:35 PM, Corbin Hoenes <[email protected]> wrote:
>
> > I am not sure Ayon doesn't have something here. I am seeing a similar
> > problem with the 0.9.1 build of pig. But when I run with 0.9.0 it doesn't
> > have that problem.
> >
> > Did something with pattern substitution change from 0.9.0 --> 0.9.1?
> > Haven't run it through a debugger yet but that is the next step tomorrow
> > if someone doesn't know of some patch I'm missing?
> >
> > On Dec 15, 2011, at 12:25 PM, <[email protected]> <[email protected]> wrote:
> >
> > > If
> > > -param input=s3n://foo/bar/baz/*/ blah.pig
> > > is part of a command line, you'd have to add quotes:
> > > -param 'input=s3n://foo/bar/baz/*/' blah.pig
> > > to inhibit your shell from trying to interpret the *.
> > >
> > >
> > > William F Dowling
> > > Senior Technologist
> > > Thomson Reuters
> > > 0 +1 215 823 3853
> > >
> > >
> > > -----Original Message-----
> > > From: Ayon Sinha [mailto:[email protected]]
> > > Sent: Thursday, December 15, 2011 2:18 PM
> > > To: Pig Mailinglist
> > > Subject: Possible Pig 9.1 globing bug in parameter substitution
> > >
> > > when using -param input=s3n://foo/bar/baz/*/ blah.pig
> > > it throws
> > >
> > > java.lang.NullPointerException
> > >     at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:79)
> > >     at org.apache.pig.Main.runParamPreprocessor(Main.java:710)
> > >     at org.apache.pig.Main.run(Main.java:517)
> > >     at org.apache.pig.Main.main(Main.java:108)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > It works when my load statement is changed from:
> > > a = load '$input' using PigStorage();
> > >
> > > to
> > >
> > > a = load 's3n://foo/bar/baz/*/' using PigStorage();
> > >
> > > (I'm under a deadline so can't file a JIRA bug rightaway)
> > >
> > > -Ayon
> > > See My Photos on Flickr
> > > Also check out my Blog for answers to commonly asked questions.
> > >
> >
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>
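
On the quoting point raised in the thread, a small sketch may help show what the calling shell actually hands over. The show_args helper is made up for illustration and is not part of Pig. It prints each argument it receives in brackets, so the single-quoted glob and the backslash-escaped variant can be compared.

```shell
# Hypothetical helper: print every argument exactly as received, one per line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Single quotes stop the calling shell from expanding the asterisks.
show_args -p process_glob='20111226/*/*/*.evt'
# prints [-p] and then [process_glob=20111226/*/*/*.evt]

# Inside single quotes the backslashes are passed through literally too.
show_args -p process_glob='20111226/\*/\*/\*.evt'
# prints [-p] and then [process_glob=20111226/\*/\*/\*.evt]
```

In both cases the calling shell delivers the string untouched, so whatever turns the unescaped glob into null on 0.9.1 has to happen after the arguments leave that shell.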
