[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

Viraj Bhat (JIRA) Mon, 26 Apr 2010 14:01:01 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861106#action_12861106
 ]


Viraj Bhat commented on PIG-1211:
---------------------------------

Ashutosh, yes: as more and more people adopt Pig, they expect some kind of 
guarantee, since Pig is designed to help people with no experience in writing 
M/R programs.

If I am a novice user and I have a small typo, do I wait 3-4 hours to discover 
that there is a syntax error? I have wasted not only CPU cycles but also the 
user's productivity.

The problem here is that dump and hadoop shell commands (such as rmf) are 
treated differently in Pig scripts: they force the statements before them to 
execute, so Multi-query optimizations are ignored.

I have listed what Milind and Dmitry are suggesting. Maybe this is how a 
future Pig language will compile, giving you hadoop jar jobs to run in 
sequence or as a DAG.

Pigcc -L myScript.pig -> parses pig script, generates logical plan, and stores 
it in myScript.pig.l

Pigcc -P myScript.pig.l -> produces physical plan from the logical plan, and 
stores it in myScript.pig.p

Pigcc -M myScript.pig.p -> produces map-reduce plan, myScript.pig.m

Pig myScript.pig.m -> interprets the MR plan. This can also be split into 
multiple sequential MR job plans, myScript.pig.m.{1,2,3..}, so that one way to 
execute the pig script is to run

Hadoop jar pigRT.jar myScript.pig.m.1
Hadoop jar pigRT.jar myScript.pig.m.2
Hadoop jar pigRT.jar myScript.pig.m.3
Hadoop jar pigRT.jar myScript.pig.m.4
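
To make the ordering explicit, here is a rough Python sketch that only builds 
the command sequence listed above. Pigcc, its -L/-P/-M flags and pigRT.jar are 
names from this proposal and do not exist today; staged_commands is a 
hypothetical helper used purely for illustration. The point is that the parse 
step comes first, so a syntax error would show up before any MR job starts.

```python
from typing import List


def staged_commands(script: str, n_jobs: int) -> List[str]:
    """Build the command sequence for the proposed compile-then-run flow.

    NOTE: Pigcc, the -L/-P/-M flags and pigRT.jar are hypothetical names
    taken from the proposal; nothing here invokes a real tool.
    """
    logical = f"{script}.l"    # logical plan file
    physical = f"{script}.p"   # physical plan file
    mr_plan = f"{script}.m"    # map-reduce plan file

    commands = [
        f"Pigcc -L {script}",    # parse step: a typo would be reported here
        f"Pigcc -P {logical}",   # logical plan -> physical plan
        f"Pigcc -M {physical}",  # physical plan -> map-reduce plan
    ]
    # One hadoop invocation per sequential MR job plan, numbered from 1.
    commands += [
        f"Hadoop jar pigRT.jar {mr_plan}.{i}" for i in range(1, n_jobs + 1)
    ]
    return commands


if __name__ == "__main__":
    for cmd in staged_commands("myScript.pig", 4):
        print(cmd)
```

A driver that runs these in order and stops at the first failing step would 
never reach the Hadoop jobs if the script does not parse.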

Thanks Viraj


> Pig script runs half way after which it reports syntax error
> ------------------------------------------------------------
>
>                 Key: PIG-1211
>                 URL: https://issues.apache.org/jira/browse/PIG-1211
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.8.0
>
>
> I have a Pig script which is structured in the following way
> {code}
> register cp.jar
> dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, 
> col3, col4, col5);
> filtered_dataset = filter dataset by (col1 == 1);
> proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
> rmf $output1;
> store proj_filtered_dataset into '$output1' using PigStorage();
> second_stream = foreach filtered_dataset  generate col2, col4, col5;
> group_second_stream = group second_stream by col4;
> output2 = foreach group_second_stream {
>  a =  second_stream.col2;
>  b =   distinct second_stream.col5;
>  c = order b by $0;
>  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
> }
> rmf  $output2;
> --syntax error here
> store output2 to '$output2' using PigStorage();
> {code}
> When I run this script with the Multi-query option, it runs successfully up 
> to the first store but later fails with a syntax error. 
> The use of the HDFS command "rmf" causes the first store to execute. 
> The only options I have are to run an explain before running this script 
> grunt> explain -script myscript.pig -out explain.out
> or to move the rmf statements to the top of the script.
> Here are some questions:
> a) Can we have an option to do something like "checkscript" instead of 
> explain to get the same syntax error? That way I can ensure that I do not 
> run for 3-4 hours before encountering a syntax error.
> b) Can Pig not figure out a way to re-order the rmf statements, since all the 
> store directories are variables?
> Thanks
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
