[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1211:
    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Incompatible change, Reviewed]
    Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script
    Resolution: Fixed

Patch committed to trunk

Pig script runs half way after which it reports syntax error
    Key: PIG-1211
    URL: https://issues.apache.org/jira/browse/PIG-1211
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.6.0
    Reporter: Viraj Bhat
    Fix For: 0.8.0
    Attachments: PIG-1211.patch

I have a Pig script which is structured in the following way

{code}
register cp.jar
dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
filtered_dataset = filter dataset by (col1 == 1);
proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
rmf $output1;
store proj_filtered_dataset into '$output1' using PigStorage();
second_stream = foreach filtered_dataset generate col2, col4, col5;
group_second_stream = group second_stream by col4;
output2 = foreach group_second_stream {
    a = second_stream.col2
    b = distinct second_stream.col5;
    c = order b by $0;
    generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
}
rmf $output2;
--syntax error here
store output2 to '$output2' using PigStorage();
{code}

I run this script using the multi-query option. It runs successfully up to the first store but later fails with a syntax error. Using the HDFS command rmf causes the first store to execute.
The only option I have is to run an explain before running this script

grunt> explain -script myscript.pig -out explain.out

or to move the rmf statements to the top of the script.

Here are some questions:

a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? That way I can ensure that I do not run for 3-4 hours before encountering a syntax error.
b) Can Pig not figure out a way to re-order the rmf statements, since all the store directories are variables?

Thanks
Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1211:
    Status: Patch Available (was: Open)
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1211:
    Attachment: PIG-1211.patch

Attached patch addresses the issue by adding support for a check script option. For this purpose, the -c command line option is reused thus fixing https://issues.apache.org/jira/browse/PIG-1382 (Command line option -c doesn't work ...Currently this option is not used...). The implementation of this check option piggybacks on explain -script and just modifies the GruntParser code to not output the explain output.
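
As an illustration of how the -c check option from this issue might be used to avoid the "runs for hours, then fails" problem, here is a minimal shell sketch. It is not part of the patch. The function name check_then_run and the PIG_BIN variable are hypothetical, and the sketch assumes that pig exits with a nonzero status when the syntax check fails and that -param values are accepted alongside -c.

```shell
#!/bin/sh
# Sketch only: syntax-check a Pig script before starting the real run.
# PIG_BIN is a hypothetical override for the pig launcher on your PATH.
PIG_BIN="${PIG_BIN:-pig}"

# check_then_run SCRIPT [pig options...]
# First runs "pig <options> -c SCRIPT" (syntax check only).
# Only if that succeeds does it run "pig <options> SCRIPT".
# Assumption: a failed check makes pig return a nonzero exit status.
check_then_run() {
    script="$1"
    shift
    if "$PIG_BIN" "$@" -c "$script"; then
        "$PIG_BIN" "$@" "$script"
    else
        echo "syntax check failed for $script; not running it" >&2
        return 1
    fi
}
```

A call could look like `check_then_run myscript.pig -param output1=/tmp/out1 -param output2=/tmp/out2`, where the two output paths are placeholders. Passing the same parameters to both invocations keeps the checked script identical to the one that is eventually run.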