[ https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-1586:
----------------------------
Description:
I have a Pig script as a template:
{code}
register Countwords.jar;
A = $INPUT;
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);
STORE D INTO $OUTPUT;
{code}
I attempt the parameter substitution with the following shell script:
{code}
#!/bin/bash
java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r -file sub.pig \
-param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' USING
PigStorage() AS (word:chararray,num:int)) by (word),(load
'/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by
(word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
-param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
{code}
Running it with -r (dry run) produces the following substituted script:
{code}
register Countwords.jar;
A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS
(word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING
PigStorage() AS (word:chararray,num:int)) by (word)) generate
flatten(examples.udf.CountWords(runsub.sh,,)));
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);
STORE D INTO /user/viraj/output;
{code}
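The shell expands $0 (the name of the wrapper script, runsub.sh) and the empty $1 and $2 before passing the arguments to java, so CountWords receives runsub.sh and two empty arguments instead of the field references $0, $1 and $2.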
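A minimal bash sketch of that expansion, assuming the wrapper is saved as runsub.sh and run without arguments (which is what the dry-run output suggests). expand_demo is a hypothetical helper, not part of Pig; it only prints what bash would hand to java for three quoting variants and says nothing about how Pig's preprocessor treats the result:

{code}
#!/bin/bash
# Hypothetical helper (not part of Pig): writes a throwaway script named
# runsub.sh, runs it with no arguments so that $0 is "runsub.sh" and $1/$2
# are unset, prints what each quoting style expands to, then cleans up.
expand_demo() {
  local tmpdir
  tmpdir=$(mktemp -d)
  # The quoted 'EOF' delimiter writes the lines below verbatim, unexpanded.
  cat > "$tmpdir/runsub.sh" <<'EOF'
# \\$0 in double quotes: \\ becomes a literal \, and $0 is still expanded.
printf '%s\n' "CountWords(\\$0,\\$1,\\$2)"
# \$0 in double quotes: the dollar sign itself is escaped, so $0 stays literal.
printf '%s\n' "CountWords(\$0,\$1,\$2)"
# Single quotes: no expansion and no backslash processing at all.
printf '%s\n' 'CountWords($0,$1,$2)'
EOF
  # Run from inside tmpdir so that $0 is exactly "runsub.sh".
  (cd "$tmpdir" && bash runsub.sh)
  rm -rf "$tmpdir"
}

expand_demo
{code}

With \\$0 inside double quotes, bash turns \\ into a single backslash and still expands $0, $1 and $2, giving CountWords(\runsub.sh,\,\). Writing \$0 inside double quotes, or single-quoting the value, passes CountWords($0,$1,$2) through unchanged; whether Pig's -param handling then leaves those $ references alone is not checked here.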
a) Is there a workaround for this?
b) Is this a Pig parameter-substitution problem?
Viraj
> Parameter substitution using -param option runs into problems when substituting
> entire pig statements in a shell script (maybe this is a bash problem)
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1586
> URL: https://issues.apache.org/jira/browse/PIG-1586
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Viraj Bhat
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.