I've located my problem.  I believe it comes down to a classpath difference
between 0.9.0 and 0.9.1.  It may be somewhat machine dependent, since a lot of
these jars are probably found dynamically by the /bin/pig script, which has
changed quite a bit since 0.9.0.  While debugging, GenericOptionsParser looked
like the culprit, so maybe the classpath differences cause a different
version of that class to get loaded.
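
For anyone who wants to compare the two java.class.path values entry by entry, here is a minimal sketch. split_classpath is a hypothetical helper of mine, not part of Pig, and the sample value is just the tail of the 0.9.1 classpath on my machine; yours will differ.

```shell
# split_classpath (hypothetical helper): print each entry of a
# colon-separated classpath on its own line, sorted and de-duplicated.
# Empty entries (produced by "::") are dropped.
split_classpath() {
    printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' | sort -u
}

# Tail of the 0.9.1 classpath; note the "::" and the repeated hbase jar.
cp_091='/usr/lib/pig-0.9.1/bin/../pig-withouthadoop.jar::/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf:/usr/local/hbase/hbase-0.90.4.jar'

split_classpath "$cp_091"
```

Redirecting the output for each version to a file and running diff on the two files should show which jars only one version picks up.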

Anyway, the short of it is that I have to escape the asterisk (*) character
in my globbing pattern.

=== Let's do it with 0.9.0 ===

$ /usr/lib/pig-0.9.0/bin/pig -d INFO -p
in_file='/chukwa/repos/Insight-Demo/' -p process_glob='20111226/*/*/*.evt'
-p out_file='dashboard-daily-2011-12-26' -p
in_file1='dashboard-daily-2011-12-26' -p
out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p
timeperiod='1' ap.pig

*(System.out.println()s added for effect)*
0.9.0 java.class.path =
 
/etc/hbase:/usr/lib/pig-0.9.0/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig-0.9.0/bin/../build/classes:/usr/lib/pig-0.9.0/bin/../build/test/classes:/usr/lib/pig-0.9.0/bin/../pig-0.9.0-core.jar:/usr/lib/pig-0.9.0/bin/../build/pig-0.9.1-SNAPSHOT.jar:/usr/lib/pig-0.9.0/bin/../lib/automaton.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: process_glob=20111226/*/*/*.evt
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

=== Now with 0.9.1 ===

$ /usr/lib/pig-0.9.1/bin/pig -d INFO -p
in_file='/chukwa/repos/Insight-Demo/' -p process_glob='20111226/*/*/*.evt'
-p out_file='dashboard-daily-2011-12-26' -p
in_file1='dashboard-daily-2011-12-26' -p
out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p
timeperiod='1' ap.pig

*(System.out.println()s added for effect)*
0.9.1 java.class.path =
/usr/lib/hadoop-0.20/conf:/usr/java/default/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/cloudera-desktop-plugins-0.3.0.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u0-SNAPSHOT.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.9.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/etc/hbase:/usr/lib/pig-0.9.1/bin/../conf:/usr/java/default/lib/tools.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar:/usr/lib/pig-0.9.1/bin/../lib/automaton.jar:/usr/lib/pig-0.9.1/bin/../lib/jython-2.5.0.jar:/usr/lib/pig-0.9.1/bin/../pig-withouthadoop.jar::/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf:/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: null
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

The second parameter, process_glob, comes back as null under 0.9.1, so the
asterisks in it now need to be escaped like this:

$ /usr/lib/pig-0.9.1/bin/pig -d INFO -p in_file='/chukwa/repos/Insight-Demo/'
-p process_glob='20111226/\*/\*/\*.evt' -p
out_file='dashboard-daily-2011-12-26' -p
in_file1='dashboard-daily-2011-12-26' -p
out_file1='dashboard-daily-2011-12-26' -p current_date_num='20111226' -p
timeperiod='1' ap.pig
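
To rule out the shell as the thing eating the asterisks, here is a small sketch that prints the arguments exactly as they would be handed to the pig script. show_args is a hypothetical stand-in I made up for illustration; it only shows what the shell does, not how Pig or GenericOptionsParser treats the backslashes afterwards.

```shell
# show_args (hypothetical helper): print every argument it receives,
# one per line in brackets, exactly as the shell passes it on.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Single quotes: the shell expands nothing, so each * arrives untouched.
show_args -p process_glob='20111226/*/*/*.evt'

# Backslashes inside single quotes are literal as well, so the \ characters
# are passed through and whatever parses the arguments next sees them.
show_args -p process_glob='20111226/\*/\*/\*.evt'
```

Both calls hand over the pattern unchanged, which suggests the difference between 0.9.0 and 0.9.1 is on the Java side rather than in shell quoting.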




On Tue, Dec 27, 2011 at 6:15 PM, Aniket Mokashi <[email protected]> wrote:

> I tried
> pig --param "input=s3n://bucket_path/*/" test.pig
>
> It worked for me. I am on EMR Pig 0.9.1.
>
> Thanks,
> Aniket
>
> On Tue, Dec 27, 2011 at 3:35 PM, Corbin Hoenes <[email protected]> wrote:
>
> > I am not sure Ayon doesn't have something here.  I am seeing a similar
> > problem with the 0.9.1 build of pig.  But when I run with 0.9.0 it
> doesn't
> > have that problem.
> >
> > Did something with pattern substitution change from 0.9.0 --> 0.9.1?
> >  Haven't run it through a debugger yet but that is the next step tomorrow
> > if someone doesn't know of some patch I'm missing?
> >
> > On Dec 15, 2011, at 12:25 PM, <[email protected]> <
> > [email protected]> wrote:
> >
> > > If
> > >  -param input=s3n://foo/bar/baz/*/ blah.pig
> > > is part of a command line, you'd have to add quotes:
> > >  -param 'input=s3n://foo/bar/baz/*/' blah.pig
> > > to inhibit your shell from trying to interpret the *.
> > >
> > >
> > > William F Dowling
> > > Senior Technologist
> > > Thomson Reuters
> > > 0 +1 215 823 3853
> > >
> > >
> > > -----Original Message-----
> > > From: Ayon Sinha [mailto:[email protected]]
> > > Sent: Thursday, December 15, 2011 2:18 PM
> > > To: Pig Mailinglist
> > > Subject: Possible Pig 9.1 globing bug in parameter substitution
> > >
> > > when using -param input=s3n://foo/bar/baz/*/ blah.pig
> > > it throws
> > >
> > > java.lang.NullPointerException
> > > at
> >
> org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:79)
> > > at org.apache.pig.Main.runParamPreprocessor(Main.java:710)
> > > at org.apache.pig.Main.run(Main.java:517)
> > > at org.apache.pig.Main.main(Main.java:108)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > at java.lang.reflect.Method.invoke(Method.java:597)
> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > It works when my load statement is changed from:
> > > a = load '$input' using PigStorage();
> > >
> > > to
> > >
> > > a = load 's3n://foo/bar/baz/*/' using PigStorage();
> > >
> > > (I'm under a deadline so can't file a JIRA bug rightaway)
> > >
> > > -Ayon
> > > See My Photos on Flickr
> > > Also check out my Blog for answers to commonly asked questions.
> >
> >
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>
