[ 
https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-1586:
----------------------------

    Description: 
I have a Pig script as a template:

{code}
register Countwords.jar;
A = $INPUT;
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);
STORE D INTO $OUTPUT;
{code}


I attempt to do Parameter substitutions using the following:

Using Shell script:

{code}
#!/bin/bash
java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r -file sub.pig \
             -param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by (word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
             -param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
{code}

The dry run (-r) produces the following substituted script:

{code}
register Countwords.jar;

A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS 
(word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING 
PigStorage() AS (word:chararray,num:int)) by (word)) generate 
flatten(examples.udf.CountWords(runsub.sh,,)));
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);

STORE D INTO /user/viraj/output;
{code}

The shell expands $0, $1 and $2 before passing the argument to java, so $0 becomes the script name (runsub.sh) and $1 and $2 become empty.
a) Is there a workaround for this?
b) Is this a Pig param problem?
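
For reference, a minimal sketch of how bash treats the three quoting styles involved. It only demonstrates the shell side; the helper names are hypothetical, and whether Pig's own parameter preprocessor then leaves a literal $0 untouched is an assumption that this report does not confirm.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; none of these names come from the report.
# Each one prints the text that bash would pass on as part of the -param value.

# As in the report: inside double quotes, \\ collapses to a single backslash,
# and the $0 that follows is still expanded by the shell.
as_reported() { printf '%s' "CountWords(\\$0,\\$1,\\$2)"; }

# A single backslash inside double quotes keeps the dollar sign literal.
single_backslash() { printf '%s' "CountWords(\$0,\$1,\$2)"; }

# Single quotes suppress all expansion.
single_quoted() { printf '%s' 'CountWords($0,$1,$2)'; }

# Print what each quoting style yields.
for f in as_reported single_backslash single_quoted; do
    printf '%-17s %s\n' "$f:" "$("$f")"
done
```

If the shell is the only layer rewriting the value, passing the statement in single quotes (and switching the inner path quotes to an escaped form) or using a single backslash before each dollar sign would be the first things to try.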


Viraj

  was:
I have a Pig script as a template:

{code}
register Countwords.jar;
A = $INPUT;
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);
STORE D INTO $OUTPUT;
{code}


I attempt to do Parameter substitutions using the following:

Using Shell script:

{code}
#!/bin/bash
java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r -file 
sub.pig \
             -param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' USING 
PigStorage() AS (word:chararray,num:int)) by (word),(load 
'/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by 
(word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
             -param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
{code}

register Countwords.jar;

A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS 
(word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING 
PigStorage() AS (word:chararray,num:int)) by (word)) generate 
flatten(examples.udf.CountWords(runsub.sh,,)));
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);

STORE D INTO /user/viraj/output;
{code}

The shell substitutes the $0 before passing it to java. 
a) Is there a workaround for this?  
b) Is this a Pig param problem?


Viraj




> Parameter substitution using -param option runs into problems when substituting 
> entire pig statements in a shell script (maybe this is a bash problem)
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1586
>                 URL: https://issues.apache.org/jira/browse/PIG-1586
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Viraj Bhat
>
> I have a Pig script as a template:
> {code}
> register Countwords.jar;
> A = $INPUT;
> B = FOREACH A GENERATE
> examples.udf.SubString($0,0,1),
> $1 as num;
> C = GROUP B BY $0;
> D = FOREACH C GENERATE group, SUM(B.num);
> STORE D INTO $OUTPUT;
> {code}
> I attempt to do Parameter substitutions using the following:
> Using Shell script:
> {code}
> #!/bin/bash
> java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r 
> -file sub.pig \
>              -param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' 
> USING PigStorage() AS (word:chararray,num:int)) by (word),(load 
> '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by 
> (word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
>              -param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
> {code}
> {code}
> register Countwords.jar;
> A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS 
> (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING 
> PigStorage() AS (word:chararray,num:int)) by (word)) generate 
> flatten(examples.udf.CountWords(runsub.sh,,)));
> B = FOREACH A GENERATE
> examples.udf.SubString($0,0,1),
> $1 as num;
> C = GROUP B BY $0;
> D = FOREACH C GENERATE group, SUM(B.num);
> STORE D INTO /user/viraj/output;
> {code}
> The shell substitutes the $0 before passing it to java. 
> a) Is there a workaround for this?  
> b) Is this a Pig param problem?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
