Nice job figuring out a fix! You should seriously file a bug with EMR for that. That's kind of ridiculous.
D

On Wed, Aug 17, 2011 at 6:03 PM, Dexin Wang <[email protected]> wrote:

> I solved my own problem and just want to share with whoever might encounter
> the same issue.
>
> I pass a colon-separated list, then convert it to a comma-separated list
> inside the pig script using the declare command.
>
> Submit pig job like this:
>
>     -p SOURCE_DIRS="2011-08:2011-07:2011-06"
>
> and in Pig script
>
>     %declare SOURCE_DIRS_CONVERTED `echo $SOURCE_DIRS | tr ':' ','`;
>     LOAD '/root_dir/{$SOURCE_DIRS_CONVERTED}' ...
>
>
> On Wed, Aug 17, 2011 at 4:21 PM, Dexin Wang <[email protected]> wrote:
>
> > Hi,
> >
> > I'm running pig jobs using Amazon pig support, where you submit jobs with
> > comma concatenated parameters like this:
> >
> >     elastic-mapreduce --pig-script --args myscript.pig --args
> >     -p,PARAM1=value1,-p,PARAM2=value2,-p,PARAM3=value3
> >
> > In my script, I need to pass multiple directories for the pig script to
> > load like this:
> >
> >     raw = LOAD '/root_dir/{$SOURCE_DIRS}'
> >
> > and SOURCE_DIRS is computed. For example, it can be
> > "2011-08,2011-07,2011-06", meaning my pig script needs to load data for
> > the past 3 months. This works fine when I run my job using local or direct
> > hadoop mode. But with Amazon pig, I have to do something like this:
> >
> >     elastic-mapreduce --pig-script --args myscript.pig
> >     -p,SOURCE_DIRS="2011-08,2011-07,2011-06"
> >
> > but emr will just replace commas with spaces so it breaks the parameter
> > passing syntax. I've tried adding backslashes before commas, but I simply
> > end up with a backslash with a space in between.
> >
> > So the question becomes:
> >
> > 1. can I do something differently than what I'm doing to pass multiple
> >    folders to pig script (without commas), or
> > 2. does anyone know how to properly pass commas to elastic-mapreduce?
> >
> > Thanks!
> >
> > Dexin
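
P.S. For anyone who finds this thread later and wants to sanity-check the conversion before submitting a job, here is a minimal sketch in plain shell. It only reproduces the `echo ... | tr ':' ','` pipeline from the declare line quoted above and shows the glob path that results. The sample value and the LOAD_PATH variable name are mine, for illustration only. It does not call Pig or elastic-mapreduce.

```shell
#!/bin/sh
# Illustrative value only: the colon-separated list that would be passed
# with -p SOURCE_DIRS=... on the command line.
SOURCE_DIRS="2011-08:2011-07:2011-06"

# Same pipeline as the backtick command in the quoted declare line:
# turn every ':' into ','.
SOURCE_DIRS_CONVERTED=$(echo "$SOURCE_DIRS" | tr ':' ',')

# LOAD_PATH is a hypothetical name used here to show the path string
# after substitution; Pig itself does not define it.
LOAD_PATH="/root_dir/{$SOURCE_DIRS_CONVERTED}"

echo "$LOAD_PATH"
```

If the printed path has commas inside the braces and no stray spaces or backslashes, the same `tr` call should behave the same way inside the script.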
