[jira] Commented: (PIG-1076) Make PigOutputCommitter conform with new FileOutputCommitter in hadoop trunk
[ https://issues.apache.org/jira/browse/PIG-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913242#action_12913242 ] Pradeep Kamath commented on PIG-1076: - The patch will need new hadoop sources which have not yet been released on Apache - so until then the patch can be used against hadoop trunk, but since the pig build picks up released hadoop this would not be seamless. Make PigOutputCommitter conform with new FileOutputCommitter in hadoop trunk --- Key: PIG-1076 URL: https://issues.apache.org/jira/browse/PIG-1076 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1076.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] Pig to become a top level Apache project
+1 -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Wednesday, August 18, 2010 10:34 AM To: pig-dev@hadoop.apache.org Subject: [VOTE] Pig to become a top level Apache project Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's have a formal vote. I propose we move Pig to a top level Apache project. I propose that the initial PMC of this project be the list of all currently active Pig committers (http://hadoop.apache.org/pig/whoweare.html ) as of 18 August 2010. I nominate Olga Natkovich as the chair of the PMC. (PMC chairs have no more power than other PMC members, but they are responsible for writing regular reports for the Apache board, assigning rights to new committers, etc.) I propose that as part of the resolution that will be forwarded to the Apache board we include that one of the first tasks of the new Pig PMC will be to adopt bylaws for the governance of the project. Alan. P.S. If this vote passes, the next step is that the proposal will be forwarded to the Hadoop PMC for discussion and vote. If the Hadoop PMC vote passes, a formal resolution is then drafted (see http://bit.ly/bvOTRq for an example resolution) and sent to the Apache board. The Apache board will then vote on whether to make Pig a TLP.
[jira] Commented: (PIG-1546) Incorrect assert statements in operator evaluation
[ https://issues.apache.org/jira/browse/PIG-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899566#action_12899566 ] Pradeep Kamath commented on PIG-1546: - Results from running the test-patch ant target [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Incorrect assert statements in operator evaluation -- Key: PIG-1546 URL: https://issues.apache.org/jira/browse/PIG-1546 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ajay Kidave Assignee: Ajay Kidave Priority: Minor Fix For: 0.8.0 Attachments: pig_1546.patch The physical operator evaluation code paths for <, <=, > and >= have incorrect assert statements. These asserts fail if the JVM has asserts enabled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1546) Incorrect assert statements in operator evaluation
[ https://issues.apache.org/jira/browse/PIG-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1546: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed - thanks Ajay! Incorrect assert statements in operator evaluation -- Key: PIG-1546 URL: https://issues.apache.org/jira/browse/PIG-1546 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ajay Kidave Assignee: Ajay Kidave Priority: Minor Fix For: 0.8.0 Attachments: pig_1546.patch The physical operator evaluation code paths for <, <=, > and >= have incorrect assert statements. These asserts fail if the JVM has asserts enabled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
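The failure mode behind PIG-1546 - asserts that only bite when the JVM runs with -ea - can be illustrated with a minimal sketch. The class and its null handling below are hypothetical and are not the actual Pig operator code:

```java
// Hypothetical sketch: Java asserts are disabled by default and only
// execute when the JVM is started with -ea, so an incorrect assert can
// lurk unnoticed until someone enables assertions.
public class AssertDemo {
    static int compare(Integer left, Integer right) {
        // An overly strict assert such as
        //   assert left != null && right != null;
        // would pass in normal runs but abort under -ea whenever a
        // null operand is actually legal - the class of bug fixed here.
        if (left == null || right == null) {
            return Integer.MIN_VALUE; // sentinel for "incomparable"
        }
        return Integer.compare(left, right);
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 2));    // negative
        System.out.println(compare(null, 2)); // sentinel, no crash
    }
}
```

Running `java AssertDemo` and `java -ea AssertDemo` would behave identically here precisely because the bad assert is removed; with the assert left in, only the -ea run would fail.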
[jira] Commented: (PIG-1520) Remove Owl from Pig contrib
[ https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896977#action_12896977 ] Pradeep Kamath commented on PIG-1520: - Some of the files in the patch seem to retain the Apache header after deletion (or it might just be vim showing me different colors for the header and throwing me off) - either way, since eventually this will just be a svn rm contrib/owl followed by svn commit, it should be fine. +1 for commit. Remove Owl from Pig contrib --- Key: PIG-1520 URL: https://issues.apache.org/jira/browse/PIG-1520 Project: Pig Issue Type: Task Components: impl Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1520.patch Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib project). Since no one else is working on Owl and there will be no one to support it, we should remove it from our contrib before releasing 0.8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895769#action_12895769 ] Pradeep Kamath commented on PIG-1534: - Thanks for the review Daniel, patch committed to trunk. Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Status: Patch Available (was: Open) Assignee: Pradeep Kamath Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Attachment: PIG-1534.patch Patch fixes SampleOptimizer to add the loadFunc funcspecs into the Mapreduce operators after optimization - this fixes the above order by error. Here are results from running the test-patch target locally [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] The javadoc warning is present on trunk and not related to this patch: {noformat} ... [javadoc] Standard Doclet version 1.6.0_01 [javadoc] Building tree for all the packages and classes... [javadoc] /tmp/svncheckout/src/org/apache/pig/newplan/logical/expression/ProjectExpression.java:192: warning - @param argument currentOp is not a parameter name. [javadoc] Building index for all the packages and classes... ... {noformat} Will run unit tests locally and update with results. 
Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895522#action_12895522 ] Pradeep Kamath commented on PIG-1534: - Ran all unit tests - TestScriptUDF fails but the failure is unrelated to the change in this patch and the failure occurs even with a fresh svn checkout. Patch is ready for review. Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
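The "could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]'" error boils down to reflective construction from a funcspec string on the backend, where the wrapped loader class must be on the classpath. A rough, hypothetical sketch of that mechanism, using a JDK class in place of a real loader:

```java
import java.lang.reflect.Constructor;

public class FuncSpecDemo {
    // Hypothetical sketch of funcspec-style instantiation: resolve a
    // class by name and invoke its (String) constructor. Class.forName
    // throws ClassNotFoundException when the named class (e.g.
    // udf.MyPigStorage on the backend) is missing from the classpath,
    // which surfaces as a "could not instantiate" failure like the one
    // quoted above.
    static Object instantiate(String className, String arg) throws Exception {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getConstructor(String.class);
        return ctor.newInstance(arg);
    }

    public static void main(String[] args) throws Exception {
        // StringBuilder stands in for a loader class here.
        Object obj = instantiate("java.lang.StringBuilder", "hello");
        System.out.println(obj);
    }
}
```

This is why the fix has to carry the loadFunc funcspecs into the MapReduce operators: the string alone is useless unless the class it names ships with the job.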
[jira] Commented: (PIG-1457) Pig will run complete zebra test even we give -Dtestcase=xxx
[ https://issues.apache.org/jira/browse/PIG-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879922#action_12879922 ] Pradeep Kamath commented on PIG-1457: - +1 Pig will run complete zebra test even we give -Dtestcase=xxx Key: PIG-1457 URL: https://issues.apache.org/jira/browse/PIG-1457 Project: Pig Issue Type: Test Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1457.patch After [PIG-1302|https://issues.apache.org/jira/browse/PIG-1302], even if we want to run an individual test using -Dtestcase=, pig will still invoke the complete zebra tests. We shall pass -Dtestcase to zebra pigtest to suppress running unwanted tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877229#action_12877229 ] Pradeep Kamath commented on PIG-1302: - +1 Include zebra's pigtest ant target as a part of pig's ant test target --- Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Giridharan Kesavan Attachments: PIG-1302.patch There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Attachment: PIG-1433-for-branch-0.7.patch The original patch was committed to trunk. It did not apply for branch-0.7 - so I have attached a new patch with minor modifications for branch-0.7. This latter patch was committed to branch-0.7 pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0, 0.8.0 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: 0.7.0 Resolution: Fixed pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0, 0.7.0 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875292#action_12875292 ] Pradeep Kamath commented on PIG-1433: - Hudson seems to be unresponsive - I ran unit tests locally and they completed successfully. The test-patch ant target also came back successfully except for a html page change in the release audit warnings which can be ignored. Patch is ready for review. pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Status: Patch Available (was: Open) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Attachment: PIG-1433.patch Attached patch addresses the issue in MapReduceLauncher by creating a _SUCCESS file for stores which are part of successful jobs if the property is set in the job. pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
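As a rough sketch of the behavior the patch adds - not the actual MapReduceLauncher code, and using java.io plus a plain Properties object in place of Hadoop's FileSystem and JobConf APIs - marking a successful store might look like:

```java
import java.io.File;
import java.io.IOException;
import java.util.Properties;

public class SuccessMarker {
    static final String MARK_PROP =
        "mapreduce.fileoutputcommitter.marksuccessfuljobs";

    // Create an empty _SUCCESS marker in a store's output directory
    // when the property is set to true in the job configuration.
    // Returns true if the marker file was newly created.
    static boolean markIfConfigured(Properties conf, File outputDir)
            throws IOException {
        if (!Boolean.parseBoolean(conf.getProperty(MARK_PROP, "false"))) {
            return false; // property unset or false: no marker
        }
        return new File(outputDir, "_SUCCESS").createNewFile();
    }
}
```

Downstream jobs can then poll for the _SUCCESS marker instead of guessing whether an output directory is complete.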
[jira] Commented: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871845#action_12871845 ] Pradeep Kamath commented on PIG-1419: - +1 Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch, PIG-1419-2.patch In hadoop security, hadoop will use kerberos id instead of unix id. Pig should not set user.name entry in jobconf. This should be decided by hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871389#action_12871389 ] Pradeep Kamath commented on PIG-1419: - +1 Minor observation in GruntParser.java: {noformat} 565 if (path == null) { 566 if (mDfs instanceof HDataStorage) { 567 container = mDfs.asContainer(((HDataStorage)mDfs). 568 getHFS().getHomeDirectory().toString()); 569 } else 570 container = mDfs.asContainer("/user/" + System.getProperty("user.name")); {noformat} Would the else ever get executed? (I think currently mDfs is always an instance of HDataStorage right?) If this is just to make it future proof, then I am fine keeping it. Minor style comment - would be good to enclose the else in {} even though it is a single statement - there is another statement right below the container = ... statement - so it would be more readable with a {} block. Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch In hadoop security, hadoop will use kerberos id instead of unix id. Pig should not set user.name entry in jobconf. This should be decided by hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
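The reviewer's two points - the /user/&lt;user.name&gt; fallback branch and bracing the single-statement else - can be sketched in isolation. HDataStorage and mDfs from the quoted snippet are replaced by a plain boolean here, since this is only an illustration:

```java
public class HomeDirSketch {
    // Mirrors the quoted GruntParser logic: use the filesystem's own
    // home directory when available, else fall back to
    // /user/<user.name>. The instanceof HDataStorage check from the
    // snippet is reduced to a boolean parameter for this sketch.
    static String homeDir(boolean isHdfs, String hdfsHome) {
        if (isHdfs) {
            return hdfsHome;
        } else {
            // braced even though it is a single statement, per the
            // style comment above
            return "/user/" + System.getProperty("user.name");
        }
    }
}
```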
[jira] Commented: (PIG-1403) Make Pig work with remote HDFS in secure mode
[ https://issues.apache.org/jira/browse/PIG-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867610#action_12867610 ] Pradeep Kamath commented on PIG-1403: - In QueryParser.jjt: the if (uri.getHost() != null) check seems redundant since there is already a check before in the code. Why is port not considered? Is it not needed to be added into the hadoop conf? If I have a url of the form hdfs://differentnamenode:<non standard port>, will it work with hadoop security? Why is the following change in HExecutionEngine required? -jc.addResource("pig-cluster-hadoop-site.xml"); Am wondering if this will be a backward incompatible change if users have been using pig-cluster-hadoop-site.xml for site specific properties. Otherwise +1 Make Pig work with remote HDFS in secure mode - Key: PIG-1403 URL: https://issues.apache.org/jira/browse/PIG-1403 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Daniel Dai Fix For: 0.7.0, 0.8.0 Attachments: PIG-1403-1.patch Access to remote HDFS is currently broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1403) Make Pig work with remote HDFS in secure mode
[ https://issues.apache.org/jira/browse/PIG-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867626#action_12867626 ] Pradeep Kamath commented on PIG-1403: - +1 Make Pig work with remote HDFS in secure mode - Key: PIG-1403 URL: https://issues.apache.org/jira/browse/PIG-1403 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Daniel Dai Fix For: 0.7.0, 0.8.0 Attachments: PIG-1403-1.patch, PIG-1403-2.patch Access to remote HDFS is currently broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
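The port question raised in the review can be made concrete with java.net.URI: a non-standard port only reaches the configuration if it is carried along with the host. This is an illustrative sketch, not the QueryParser.jjt code:

```java
import java.net.URI;

public class RemoteHdfsSketch {
    // Extract the namenode authority from an hdfs:// location so it
    // could be registered with a job configuration. Returns null for
    // locations like hdfs:///path that carry no host at all.
    static String namenodeOf(String location) {
        URI uri = URI.create(location);
        if (uri.getHost() == null) {
            return null;
        }
        // Keep a non-default port attached to the host; silently
        // dropping it is exactly the concern the review raises for
        // urls like hdfs://differentnamenode:<non standard port>.
        return uri.getPort() == -1
                ? uri.getHost()
                : uri.getHost() + ":" + uri.getPort();
    }
}
```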
[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864386#action_12864386 ] Pradeep Kamath commented on PIG-1211: - core unit tests pass on my local machine - the errors reported above seem to be related to the environment. The release audit warning is due to a html file change and can be ignored - the patch is ready for review. Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar; dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2; b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option; it runs successfully till the first store but later fails with a syntax error. The usage of the HDFS option rmf causes the first store to execute. The only option that I have is to run an explain before running this script: grunt> explain -script myscript.pig -out explain.out or moving the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? 
In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Summary: har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) (was: har url not usable in Pig scripts) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) --- Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits
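The difference between the two URL forms comes down to how the har authority encodes the underlying filesystem: in har://hdfs-namenode:port/path the authority names both the wrapped scheme (hdfs) and its namenode, while har:///path leaves the authority empty - hence the "No FileSystem for scheme" failure above when the authority is malformed. A hypothetical parsing sketch with java.net.URI (not the HarFileSystem code itself):

```java
import java.net.URI;

public class HarUriSketch {
    // Pull the underlying filesystem scheme out of a har URL's
    // authority, e.g. "hdfs" from har://hdfs-namenode:8020/path.
    // Returns null when no underlying fs is encoded, which is the
    // har:///path case reported above.
    static String underlyingScheme(String harLocation) {
        URI uri = URI.create(harLocation);
        String authority = uri.getAuthority();
        if (authority == null || authority.indexOf('-') < 0) {
            return null;
        }
        // The wrapped scheme is everything before the first '-'.
        return authority.substring(0, authority.indexOf('-'));
    }
}
```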
[jira] Resolved: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1378. - Release Note: The fix described in this jira depends on an issue with Hadoop code which was fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue Resolution: Fixed Am closing this bug since the pig changes are in and hadoop changes are in trunk - this should work once we use the appropriate hadoop release. har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) --- Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
        at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
        ...
13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description
{noformat}
grunt a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
        ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175
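The two failures in PIG-1378 hinge on the shape of the har URI: as the subject line notes, har:///path fails while har://hdfs-namenode:port/path works, because Hadoop's HarFileSystem expects the underlying filesystem to be encoded in the URI's authority. The following Python sketch is only an illustration of what a generic URI parser sees for the two forms from the bug report; it is not Pig or Hadoop code, and the host/port are made up.

```python
from urllib.parse import urlparse

def describe(uri):
    """Break a URI into the pieces relevant to the har:// discussion."""
    p = urlparse(uri)
    return {"scheme": p.scheme, "authority": p.netloc, "path": p.path}

# Authority-less form that failed in PIG-1378: there is nothing in the
# authority to tell HarFileSystem which underlying filesystem to use.
no_authority = describe("har:///user/viraj/project/subproject/files/size/data")

# Form with the underlying fs encoded in the authority, which worked
# (hdfs-namenode:8020 is a hypothetical host:port for illustration).
with_authority = describe("har://hdfs-namenode:8020/user/viraj/project/subproject/files/size/data")
```

Running this shows `no_authority["authority"]` is an empty string while `with_authority["authority"]` carries the `scheme-host:port` triple that HarFileSystem needs to resolve the underlying filesystem.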
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Resolved (was: Patch Available) Hadoop Flags: [Incompatible change, Reviewed] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script Resolution: Fixed Patch committed to trunk Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. 
The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863859#action_12863859 ] Pradeep Kamath commented on PIG-1401: - The release audit warning is due to the new test script file added in the patch and can be ignored - the patch is ready for review. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863997#action_12863997 ] Pradeep Kamath commented on PIG-1378: - Spoke with a developer on the hadoop team to confirm that this is an issue with Hadoop code fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue - not sure if we should keep this jira open until that point - am fine if we should. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
        at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
        ...
13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description
{noformat}
grunt a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
        ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Patch Available (was: Open) Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Attachment: PIG-1211.patch Attached patch addresses the issue by adding support for a check script option. For this purpose, the -c command line option is reused thus fixing https://issues.apache.org/jira/browse/PIG-1382 (Command line option -c doesn't work ...Currently this option is not used...). The implementation of this check option piggybacks on explain -script and just modifies the GruntParser code to not output the explain output. Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. 
The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1382) Command line option -c doesn't work
[ https://issues.apache.org/jira/browse/PIG-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1382. - Hadoop Flags: [Incompatible change] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script Assignee: Pradeep Kamath Resolution: Fixed Fixed through https://issues.apache.org/jira/browse/PIG-1211?focusedCommentId=12864002page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12864002 Command line option -c doesn't work --- Key: PIG-1382 URL: https://issues.apache.org/jira/browse/PIG-1382 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Pradeep Kamath Fix For: 0.8.0 Currently this option is not used, but it's documented: -c, -cluster clustername, kryptonite is default We should either remove it from documentation or find someway to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed The javac warning is due to generated javacc code and cannot be avoided. I ran all unit tests on my local machine and they passed - patch committed to trunk. Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch Consider the Pig script with the error that a String with double quotes {code}www\\.{code} is used instead of a single quote {code}'www\\.'{code} in the UDF string.REPLACEALL() {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,^www\\.,''); generate domain; }; dump B; {code} I get the following error message where Line 11 points to the end of file. The error message should point to Line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: EOF after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. 
Encountered: EOF after :
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: EOF after :
        at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
        at org.apache.pig.Main.main(Main.java:352)
=== -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Attachment: PIG-1401.patch Attached patch addresses the issue by checking internal state in GruntParser to check if the current execution is in explain -script mode and if so, ignores grunt commands like run, copy etc. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1398) Marking Pig interfaces for org.apache.pig.data package
[ https://issues.apache.org/jira/browse/PIG-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863502#action_12863502 ] Pradeep Kamath commented on PIG-1398: - +1 Marking Pig interfaces for org.apache.pig.data package -- Key: PIG-1398 URL: https://issues.apache.org/jira/browse/PIG-1398 Project: Pig Issue Type: Sub-task Components: documentation Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Fix For: 0.8.0 Attachments: PIG-1398.patch Marking Pig interfaces for stability and audience, as well as javadoc cleanup, for the data package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Open (was: Patch Available) The patch did not contain the test script file- will attach new patch shortly explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Attachment: PIG-1401-2.patch New patch includes test script file needed for the unit test. It also has some changes in code to not call executeBatch() in explain -script mode. Also fs .. commands also invoke executeBatch() now - this was missing but is required since the fs command could be a delete/move/copy command which should result in an execution of the current batch just like the rm, mv and cp grunt statements do. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
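The mechanism discussed across PIG-1211 and PIG-1401 - grunt commands like rm, mv, cp and fs flushing the pending multi-query batch via executeBatch(), while explain -script must not execute anything - can be modeled with a small sketch. This is hypothetical illustrative Python, not Pig's actual GruntParser; the class and method names are invented for the illustration.

```python
# Commands that mutate the filesystem and therefore must flush the batch:
# a delete/move/copy could touch a directory that a pending store writes to.
FS_MUTATING = {"rm", "rmf", "mv", "cp", "fs"}

class BatchModel:
    """Toy model of multi-query batching in a Grunt-like parser."""

    def __init__(self, explain_only=False):
        self.explain_only = explain_only  # explain -script mode: never execute
        self.pending = []                 # deferred Pig statements (multi-query)
        self.executed = []                # batches that actually ran

    def statement(self, stmt):
        # Pig Latin statements are accumulated, not run immediately.
        self.pending.append(stmt)

    def grunt_command(self, cmd):
        # Filesystem-mutating grunt commands force the current batch to run
        # first -- unless we are only explaining the script.
        if cmd in FS_MUTATING:
            self.execute_batch()

    def execute_batch(self):
        if self.explain_only or not self.pending:
            return
        self.executed.append(list(self.pending))
        self.pending.clear()

# Normal run: rmf flushes the batch, so the pending store executes early --
# this is why the PIG-1211 script ran "half way" before the syntax error.
run = BatchModel()
run.statement("store proj_filtered_dataset into '$output1'")
run.grunt_command("rmf")

# explain -script mode: the same rmf is ignored and nothing executes.
explain = BatchModel(explain_only=True)
explain.statement("store proj_filtered_dataset into '$output1'")
explain.grunt_command("rmf")
```

The sketch shows why PIG-1401's fix needed both pieces: fs commands must call the batch-flush path in normal runs, and the same path must become a no-op in explain -script mode.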
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed The Hudson test failures seem to be due to a temporary environment issue - I ran all unit tests locally and the run was successful. Patch committed to trunk. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell: {noformat} $ hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw------- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields: {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 13 more {noformat} According to http://issues.apache.org/jira/browse/PIG-1234 I tried the following, as stated in that issue's original description: {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj
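The second failure above ("No FileSystem for scheme: namenode-location") follows from where the hostname lands during URI parsing: in a har:// URL the authority slot is expected to describe the underlying filesystem, so a bare "namenode-location" in that slot ends up being resolved as if it named a filesystem scheme. A minimal illustration with Python's urllib.parse - this shows standard URI parsing only and is an assumption-laden sketch, not Hadoop's HarFileSystem code:

```python
# Where "namenode-location" lands when a har:// URL is parsed: in the
# authority (netloc) component, which Hadoop's har scheme interprets as a
# description of the underlying filesystem - hence the attempt to resolve
# "namenode-location" as a filesystem scheme. Standard URI parsing only.
from urllib.parse import urlparse

def har_parts(url):
    """Split a har:// URL into (scheme, authority, path)."""
    p = urlparse(url)
    return p.scheme, p.netloc, p.path

scheme, authority, path = har_parts(
    "har://namenode-location/user/viraj/project/subproject/files/size/data")
# authority carries "namenode-location"; the path component no longer does
```

With the `har:///...` form (empty authority), the path survives intact but the underlying filesystem is left unspecified, which matches the first error ("Incompatible file URI scheme: har : hdfs") rather than the second.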
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Assignee: Pradeep Kamath Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch Consider a Pig script with the error that a string in double quotes {code}"www\\."{code} is used instead of one in single quotes {code}'www\\.'{code} in the UDF string.REPLACEALL(): {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,"^www\\.",''); generate domain; }; dump B; {code} I get the following error message, where line 11 points to the end of the file. The error message should point to line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: <EOF> after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents: === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: <EOF> after : org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: <EOF> after : at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) ===
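The misleading "line 11" in the error above is characteristic of an unterminated quoted string: the tokenizer opens a literal (here on line 5) but, never recognizing its terminator, scans to end of input and reports the EOF position instead of where the literal began. A sketch of the effect - illustrative only, not PigScriptParser's actual token manager:

```python
# Sketch of the line-number bug: a tokenizer that opens a quoted string but
# never sees a terminator it recognizes runs to end of input and reports the
# EOF line, losing the line where the literal started. Tracking the start
# line (as a fix would) preserves the useful location.
def find_unterminated_quote(text, quote="'"):
    line, start_line, in_string = 1, None, False
    for ch in text:
        if ch == "\n":
            line += 1
        elif ch == quote:
            if in_string:
                in_string = False          # literal closed normally
                start_line = None
            else:
                in_string = True           # literal opened here
                start_line = line
    # At EOF: `line` is what the buggy parser reports;
    # `start_line` is the line a helpful message should point to.
    return (line, start_line) if in_string else (line, None)

script = "a = load 'in';\nb = foreach a generate 'oops;\n\ndump b;\n"
eof_line, open_line = find_unterminated_quote(script)
# eof_line is the last line of input; open_line is where the quote opened
```

In the toy script the quote opens on line 2 but the scan only stops at line 5 (EOF), mirroring how PIG-740's error points at line 11 instead of line 5.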
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Status: Patch Available (was: Open) Fix Version/s: 0.8.0 Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-3.patch The golden-file change in the last patch was not correct - this updated patch contains just that fix. All unit tests ran successfully locally with the new patch, so it is ready for review. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-4.patch Realized that a stray change in TestMRCompiler got into the previous patch - attaching a new patch with just that change removed. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Attachment: PIG-740.patch GruntParser was mishandling double quotes within foreach blocks: it treated an opening double quote the same way as a single quote and never handled the closing double quote. The patch addresses the bug by handling double quotes correctly. Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Priority: Minor Attachments: PIG-740.patch Consider a Pig script with the error that a string with double quotes {code}"www\\."{code} is used instead of a single-quoted one {code}'www\\.'{code} in the UDF string.REPLACEALL() {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,"^www\\.",''); generate domain; }; dump B; {code} I get the following error message, where Line 11 points to the end of the file. The error message should point to Line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: EOF after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. 
Encountered: EOF after : org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: EOF after : at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) === -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
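For reference, the pattern the script intends to pass to string.REPLACEALL behaves like Java's String.replaceAll below. That the UDF jar delegates to replaceAll is an assumption; the quoting fix itself belongs in the Pig script.

```java
// The Pig script's string.REPLACEALL(curr_searchQuery, '^www\\.', '') is,
// at bottom, Java's String.replaceAll. Shown here in plain Java to make the
// correctly quoted pattern concrete; that string-2.0.jar delegates to
// replaceAll is an assumption, not confirmed by the issue.
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String query = "www.example.com";
        // "^www\\." in Java source is the regex ^www\. : a literal "www."
        // anchored at the start of the string.
        String domain = query.replaceAll("^www\\.", "");
        System.out.println(domain); // example.com
    }
}
```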
[jira] Created: (PIG-1397) GruntParser should invoke executeBatch() first in processFsCommand()
GruntParser should invoke executeBatch() first in processFsCommand() Key: PIG-1397 URL: https://issues.apache.org/jira/browse/PIG-1397 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Fix For: 0.8.0 If a script has multiple stores that can be combined by multiquery optimization, and the script also has file-system-modifying commands like cp, mv, or rm, GruntParser currently executes the pending plan up to the file system command, so that the multiquery-optimized portion runs against the file system only after it has been modified (for example, some portion of the multiquery-optimized script might depend on the cp/mv/rm command having run first). This is not done for "fs ..." commands - GruntParser should do the same for "fs ..." commands in processFsCommand()
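The requested behavior amounts to flushing the batched multiquery plan before any command that mutates the filesystem. A toy simulation of that ordering (class and method bodies here are hypothetical; only the executeBatch()-before-fs ordering mirrors the issue):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: a parser that batches store jobs (multiquery optimization)
// and must flush them before any filesystem-mutating command, mirroring
// the fix requested for processFsCommand(). Not the real GruntParser.
public class BatchingParser {
    private final List<String> pendingStores = new ArrayList<>();
    final List<String> executionLog = new ArrayList<>();

    void registerStore(String alias) {
        pendingStores.add(alias); // multiquery: defer execution
    }

    void executeBatch() {
        if (!pendingStores.isEmpty()) {
            executionLog.add("run-batch:" + String.join(",", pendingStores));
            pendingStores.clear();
        }
    }

    // The bug: "fs ..." commands ran without flushing the batch first.
    // The fix: call executeBatch() before touching the filesystem.
    void processFsCommand(String cmd) {
        executeBatch();
        executionLog.add("fs:" + cmd);
    }

    public static void main(String[] args) {
        BatchingParser p = new BatchingParser();
        p.registerStore("B");
        p.processFsCommand("mv /tmp/out /tmp/final");
        System.out.println(p.executionLog);
    }
}
```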
[jira] Commented: (PIG-1394) POCombinerPackage hold too much memory for InternalCachedBag
[ https://issues.apache.org/jira/browse/PIG-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862430#action_12862430 ] Pradeep Kamath commented on PIG-1394: - +1 POCombinerPackage hold too much memory for InternalCachedBag Key: PIG-1394 URL: https://issues.apache.org/jira/browse/PIG-1394 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1394-1.patch, PIG-1394-2.patch In POCombinerPackage, we create a bunch of InternalCachedBags, one per algebraic UDF in use. However, when we create an InternalCachedBag, we use the default constructor, which assumes only one InternalCachedBag exists in the system. It turns out we reserve way too much memory for each InternalCachedBag.
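The underlying fix idea is proportional sizing: when N cached bags share one memory budget, each should claim budget/N rather than the whole budget. A toy sketch of that arithmetic (names are illustrative, not Pig's actual InternalCachedBag API):

```java
// Sketch of the idea behind the fix: when N in-memory bags share a single
// cache budget, each bag must size itself to budget/N instead of assuming
// it is the only bag in the system. Names are illustrative, not Pig's API.
public class CachedBagBudget {
    static long perBagLimitBytes(long totalCacheBytes, int bagCount) {
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be >= 1");
        }
        return totalCacheBytes / bagCount;
    }

    public static void main(String[] args) {
        // e.g. a 100 MB cache shared by 4 combiner bags -> 25 MB each
        System.out.println(perBagLimitBytes(100L * 1024 * 1024, 4));
    }
}
```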
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-2.patch Attached new patch addressing unit test failures - mostly due to the fact that the new patch no longer converts locations which are already absolute like '/foo/bar' har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378.patch Attached patch addresses the issue in the description by changing the LoadFunc.relativeToAbsolutePath() implementation to only convert input locations if the location does not have a scheme or the path in the location is not absolute. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1378.patch
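The relativeToAbsolutePath() change described in the attachment comment reduces to a guard: leave a load location untouched when it already carries a URI scheme (like har://) or an absolute path. A rough sketch of that guard (illustrative names, not the actual Pig implementation):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the guard described in the patch comment: only rewrite a load
// location relative to the current working directory when it has no URI
// scheme and is not already absolute. Illustrative, not Pig's actual code.
public class LocationGuard {
    static String toAbsolute(String location, String cwd) {
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            return location; // unparseable: leave it for later validation
        }
        if (uri.getScheme() != null || location.startsWith("/")) {
            // har://..., hdfs://..., or /absolute/path: leave untouched
            return location;
        }
        return cwd.endsWith("/") ? cwd + location : cwd + "/" + location;
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("har:///user/viraj/data", "/user/me"));
        System.out.println(toAbsolute("inputdata", "/user/me"));
    }
}
```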
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) Assignee: Pradeep Kamath har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378.patch
[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory
[ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861611#action_12861611 ] Pradeep Kamath commented on PIG-1395: - +1, the comment can be updated to reflect the nature of the comparison in the code - currently the comment and code seem to differ. Otherwise the change looks good. Mapside cogroup runs out of memory -- Key: PIG-1395 URL: https://issues.apache.org/jira/browse/PIG-1395 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: cogrp_mem.patch In the particular scenario where a relation does not have many tuples with the same key (i.e. there are few repeating keys), map tasks doing cogroup fail with a GC overhead exception.
[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types
[ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1371: Attachment: PIG-1371-partial.patch Partial patch - attaching here for future reference Pig should handle deep casting of complex types Key: PIG-1371 URL: https://issues.apache.org/jira/browse/PIG-1371 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1371-partial.patch Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. If the schema specified in the load statement gives this field the type bg:{t:(c:chararray)}, the current behavior is that Pig treats the field as having the type specified in the load statement (bg:{t:(c:chararray)}), but no deep cast from bag of int (the real data) to bag of chararray (the user-specified schema) is made. There are two issues currently: 1) The TypeCastInserter only considers the byte 'type' between the loader-presented schema and the user-specified schema to decide whether to introduce a cast. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed. 2) POCast should be changed to handle casting a complex type to the type specified in the user-supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be: * A bag with two fields - the int field and a null? In this approach Pig assumes the lone field in the data is the first field, which might be incorrect if it is in fact the second field. * A null bag, to indicate that the bag is of unknown value - this is the one I personally prefer * The cast throws an IncompatibleCastException
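Issue (1) above - checking the full FieldSchema rather than just the outer byte type - can be sketched as a recursive comparison. The FS class and type constants below are simplified stand-ins, not Pig's real Schema/FieldSchema API:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the deeper check issue (1) calls for: compare full field
// schemas, including inner sub-schemas, instead of only the outer byte
// type. FS and the type constants are simplified stand-ins for Pig's
// Schema.FieldSchema and DataType; values are illustrative.
public class DeepSchemaCheck {
    static final byte INT = 10, CHARARRAY = 55, TUPLE = 110, BAG = 120;

    static class FS {
        final byte type;
        final List<FS> inner; // null for atomic types like int/chararray
        FS(byte type, FS... inner) {
            this.type = type;
            this.inner = inner.length == 0 ? null : Arrays.asList(inner);
        }
    }

    // A cast is needed whenever the declared schema differs anywhere,
    // even if the outer types (e.g. BAG vs BAG) match.
    static boolean needsCast(FS data, FS declared) {
        if (data.type != declared.type) return true;
        if (data.inner == null && declared.inner == null) return false;
        if (data.inner == null || declared.inner == null) return true;
        if (data.inner.size() != declared.inner.size()) return true;
        for (int i = 0; i < data.inner.size(); i++) {
            if (needsCast(data.inner.get(i), declared.inner.get(i))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FS data = new FS(BAG, new FS(TUPLE, new FS(INT)));              // bg:{t:(i:int)}
        FS declared = new FS(BAG, new FS(TUPLE, new FS(CHARARRAY)));    // bg:{t:(c:chararray)}
        System.out.println(needsCast(data, declared)); // deep mismatch
        System.out.println(needsCast(data, data));     // identical schemas
    }
}
```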
[jira] Updated: (PIG-1392) Parser fails to recognize valid field
[ https://issues.apache.org/jira/browse/PIG-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1392: Fix Version/s: 0.8.0 (was: 0.7.0) Unlinking this from the 0.7 release and moving it to 0.8 since there is a workaround. Parser fails to recognize valid field - Key: PIG-1392 URL: https://issues.apache.org/jira/browse/PIG-1392 Project: Pig Issue Type: Bug Reporter: Ankur Fix For: 0.8.0 With the script below, the parser fails to recognize a valid field in the relation and throws an error A = LOAD '/tmp' as (a:int, b:chararray, c:int); B = GROUP A BY (a, b); C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ; The error thrown is 2010-04-23 10:16:20,610 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: c in {group: (a: int,b: chararray),A: {a: int,b: chararray,c: int}}
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859449#action_12859449 ] Pradeep Kamath commented on PIG-1378: - Adding to previous comment the har url has to be of the form (note the hdfs- prefix in the authority part): har://hdfs-namenodehost:namenodeport/datalocation har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.8.0 I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
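Pradeep's comment above pins down the authority format that makes a har URL resolvable: the underlying filesystem scheme is prefixed to the namenode authority. A minimal Java sketch of that rule follows; the helper name and the example host/port are hypothetical, not Pig or Hadoop API:

```java
// Sketch only: building a har:// URL of the form described above,
// har://hdfs-namenodehost:namenodeport/datalocation. The "hdfs-" prefix on
// the authority is what lets HarFileSystem locate the real namenode instead
// of treating the host as an unknown filesystem scheme.
public class HarUrl {
    // Hypothetical helper: format a har URL for data archived on HDFS.
    public static String toHarUrl(String namenodeHost, int namenodePort, String dataLocation) {
        return "har://hdfs-" + namenodeHost + ":" + namenodePort + dataLocation;
    }
}
```

The failing case in the stack trace above (scheme "namenode-location") corresponds to omitting exactly this prefix.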
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed All tests pass on my local machine - patch committed to 0.7 and trunk Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857139#action_12857139 ] Pradeep Kamath commented on PIG-1363: - +1 Unnecessary loadFunc instantiations --- Key: PIG-1363 URL: https://issues.apache.org/jira/browse/PIG-1363 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: pig-1363.patch In MRCompiler loadfuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated at only one place, ideally in LogToPhyTanslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372-2.patch Regenerated patch against latest trunk (same changes). Here are the results of running test-patch ant target: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Open (was: Patch Available) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856496#action_12856496 ] Pradeep Kamath commented on PIG-1370: - bq. But it is accepted as an arg to one of ResourceSchema's constructors. I think that makes it public, unless we want to say that constructor isn't intended for public use (in which case, why is it public?). This constructor is called from internal Pig code and we should not expose this to users - if we don't make the constructor public we cannot call this constructor since the callers are in different packages - I really think we need an annotation to say internal-use so we can annotate some of the public methods which we don't want users to use. bq. I did mark ComparisonFunc as deprecated. Are you saying we should just remove it instead of deprecate it? I think for now deprecated is fine. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch, PIG-1370_2.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: 0.7.0 Resolution: Fixed Patch committed to trunk and branch-0.7 POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (PIG-1371) Pig should handle deep casting of complex types
Pig should handle deep casting of complex types Key: PIG-1371 URL: https://issues.apache.org/jira/browse/PIG-1371 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Fix For: 0.8.0 Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. In the load statement, if the schema specified has the type for this field specified as bg:{t:(c:chararray)}, the current behavior is that Pig thinks of the field to be of the type specified in the load statement (bg:{t:(c:chararray)}) but no deep cast from bag of int (the real data) to bag of chararray (the user specified schema) is made. There are two issues currently: 1) The TypeCastInserter only considers the byte 'type' between the loader presented schema and user specified schema to decide whether to introduce a cast or not. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed. 2) POCast should be changed to handle casting a complex type to the type specified in the user supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be: * A bag with two fields - the int field and a null? - In this approach pig is assuming the lone field in the data is the first field, which might be incorrect if it in fact is the second field. * A null bag to indicate that the bag is of unknown value - this is the one I personally prefer * The cast throws an IncompatibleCastException -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Patch Available (was: Open) POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Open (was: Patch Available) The unit tests all run successfully on my local machine - the Hudson QA failure was due to a transient port conflict issue - will resubmit - in the meantime the patch is ready for review. POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Status: Resolved (was: Patch Available) Resolution: Invalid There is already a hadoop property mapred.task.id which is set to the map/reduce task id in the backend and is not set in the front end which can be used to figure this out. Hence it is best not to introduce new properties in the configuration for this purpose. Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372.patch Attached patch restores PigInputFormat.sJob - however it is deprecated (and so also PigMapReduce.sJobConf for user code) and the javadoc comment indicates to use UDFContext.getUDFContext().getJobConf() instead. No tests are included since this simply restores a static variable for backward compatibility and is not used in pig code. Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
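The backward-compatibility arrangement described above (keep the old static field, deprecate it, and point users at the accessor) can be sketched with stand-in classes. This is illustrative only, with hypothetical names; it is not the real Pig PigInputFormat/UDFContext code:

```java
// Sketch of the deprecate-but-retain pattern from PIG-1372, using stand-ins.
public class CompatExample {
    public static class Conf { }          // stand-in for the job Configuration

    public static class Context {         // stand-in for UDFContext
        private static final Context INSTANCE = new Context();
        private final Conf conf = new Conf();
        public static Context getUDFContext() { return INSTANCE; }
        public Conf getJobConf() { return this.conf; }   // preferred accessor
    }

    public static class InputFormat {     // stand-in for PigInputFormat
        /** @deprecated use Context.getUDFContext().getJobConf() instead. */
        @Deprecated
        public static Conf sJob = Context.getUDFContext().getJobConf();
    }
}
```

Old user code keeps compiling against the deprecated field (with a warning steering it toward the accessor), which is the point of restoring it for one more release.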
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException is caused when PigStorage is used: If in the script, only the 2nd and 3rd column of the data (say) are used, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data contains a row with only one column (malformed data due to missing cols in certain rows), PigStorage returns a Tuple backed by a null ArrayList. Subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854970#action_12854970 ] Pradeep Kamath commented on PIG-1365: - No unit tests have been added since this is just restoring an old class for backward compatibility for users and is no longer used in the pig code. The release audit warning is about an HTML file and can be ignored. WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1368) Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases
Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases Key: PIG-1368 URL: https://issues.apache.org/jira/browse/PIG-1368 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Consider the following data: 1\t ( hello , bye ) \n 1\t( hello , bye )a\n 2 \t (good , bye)\n The following script gives the results below: a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a; (1,( hello , bye )) (1,( hello , bye )) (2,(good , bye)) The current bytesToTuple implementation discards leading and trailing characters before the tuple delimiters and parses the tuple out - I think instead it should treat any leading and trailing characters (including space) near the delimiters as an indication of a malformed tuple and return null. Also in the code, consumeBag() should handle the special case of {} and not delegate the handling to consumeTuple(). In consumeBag() null tuples should not be skipped. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
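The stricter parsing behavior proposed above can be sketched in plain Java. This is a hypothetical stand-in (method and class names are made up), not the real Utf8StorageConverter; it only demonstrates the proposed contract of returning null when characters appear outside the tuple delimiters:

```java
// Sketch of the proposed strict bytesToTuple contract from PIG-1368:
// any leading or trailing characters (including spaces) around the
// '(' and ')' delimiters mark the tuple as malformed, yielding null.
import java.util.Arrays;
import java.util.List;

public class StrictTupleParser {
    // Returns the tuple's fields, or null for malformed input.
    public static List<String> bytesToTuple(String s) {
        if (s == null || !s.startsWith("(") || !s.endsWith(")")) {
            return null;                       // junk outside delimiters => malformed
        }
        String body = s.substring(1, s.length() - 1);
        String[] fields = body.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();      // whitespace inside the tuple is tolerated
        }
        return Arrays.asList(fields);
    }
}
```

Under this contract the second and third sample rows above ("1\t( hello , bye )a" and "2 \t (good , bye)") would load the tuple field as null rather than silently discarding the stray characters.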
[jira] Created: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855157#action_12855157 ] Pradeep Kamath commented on PIG-1299: - +1 Implement Pig counter to track number of output rows for each output files Key: PIG-1299 URL: https://issues.apache.org/jira/browse/PIG-1299 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1299.patch, PIG-1299.patch When running a multi-store query, the Hadoop job tracker often displays only 0 for the Reduce output records or Map output records counters. This is incorrect and misleading. Pig should implement an output records counter for each output file in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Attachment: PIG-1369.patch Attached patch addresses the issues mentioned in the description by catching NullPointerException and IndexOutOfBoundsException at appropriate places. POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
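The return-null-on-missing-field contract that the PIG-1369 patch extends can be sketched with a small stand-in helper (hypothetical class, not the real POProject), modeling a tuple as a List:

```java
// Sketch of the PIG-1369 behavior: projecting from a null tuple or past the
// end of a short tuple yields null instead of an uncaught exception, matching
// the existing handling for a non-existent field in the input tuple.
import java.util.List;

public class SafeProject {
    // Project column i from a tuple (modeled as a List), or null if absent.
    public static Object project(List<Object> tuple, int i) {
        try {
            return tuple.get(i);   // tuple may be null, or i may be out of range
        } catch (NullPointerException | IndexOutOfBoundsException e) {
            return null;           // degrade to null, per POProject's contract
        }
    }
}
```

In the real operator the catch sites are spread across the tuple and bag projection paths; the contract is the same in each.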
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Patch Available (was: Open) POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1362) Provide udf context signature in ensureAllKeysInSameSplit() method of loader
[ https://issues.apache.org/jira/browse/PIG-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1362: Resolution: Fixed Status: Resolved (was: Patch Available) +1 Provide udf context signature in ensureAllKeysInSameSplit() method of loader Key: PIG-1362 URL: https://issues.apache.org/jira/browse/PIG-1362 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Critical Fix For: 0.7.0 Attachments: backport.patch As a part of PIG-1292 a check was introduced to make sure loader used in collected group-by implements CollectableLoader (new interface in that patch). In its method, loader may use udf context to store some info. We need to make sure that udf context signature is setup correctly in such cases. This is already the case in trunk, need to backport it to 0.7 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Under the following conditions, a NullPointerException occurs when PigStorage is used: if only, say, the 2nd and 3rd columns of the data are used in the script, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data then contains a row with only one column (malformed data due to missing columns in certain rows), PigStorage returns a Tuple backed by a null ArrayList, and subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Attachment: PIG-1366.patch Currently in PigStorage, the ArrayList backing the Tuple returned in getNext() is created in readField(). Under the data conditions explained in the description, readField() never gets called and the ArrayList (mProtoTuple) remains null, causing the eventual NPE. The patch fixes the issue by initializing mProtoTuple to a new ArrayList at the beginning of getNext(). PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException occurs when PigStorage is used: if only, say, the 2nd and 3rd columns of the data are used in the script, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data then contains a row with only one column (malformed data due to missing columns in certain rows), PigStorage returns a Tuple backed by a null ArrayList, and subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
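The fix described in the comment above can be sketched in isolation. The class below is a hypothetical, heavily simplified stand-in for PigStorage (the names mirror the real mProtoTuple/getNext()/readField(), but this is not actual Pig code): allocating the backing list at the start of getNext() means a row with fewer columns than the projected indices yields an empty tuple instead of one backed by null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the PigStorage fix: mProtoTuple is
// allocated eagerly in getNext() rather than lazily in readField().
class ProtoTupleSketch {
    private List<String> mProtoTuple;        // backs the "tuple" we return
    private final int[] requiredColumns;     // indices pushed via pushProjection()

    ProtoTupleSketch(int[] requiredColumns) {
        this.requiredColumns = requiredColumns;
    }

    // Before the fix, the allocation below lived in the readField()
    // equivalent; a short row never reached a projected column, so the
    // list stayed null and downstream projections hit an NPE.
    List<String> getNext(String line) {
        mProtoTuple = new ArrayList<String>(); // the fix: eager initialization
        String[] fields = line.split("\t");
        for (int col : requiredColumns) {
            if (col < fields.length) {
                mProtoTuple.add(fields[col]);  // stands in for readField()
            }
        }
        return mProtoTuple;                    // never null, possibly empty
    }
}
```

With columns 1 and 2 projected, a malformed one-column row now produces an empty (non-null) tuple rather than triggering the NPE.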
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Status: Patch Available (was: Open) PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException is caused when PigStorage is used: If in the script, only the 2nd and 3rd column of the data (say) are used, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data contains a row with only one column (malformed data due to missing cols in certain rows), PigStorage returns a Tuple backed by a null ArrayList. Subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Attachment: PIG-1365.patch The attached patch restores WrappedIOException - the class is not used in Pig code and is provided only for use by UDFs, to maintain backward compatibility. I have marked the class as deprecated so that it can be removed from the Pig code base in a later release. No unit tests have been added since this just restores an old class which is no longer used in the Pig code. WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Status: Patch Available (was: Open) WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854166#action_12854166 ] Pradeep Kamath commented on PIG-1338: - +1 - changes look good. A minor comment: can the following error message be changed from:
{noformat}
Cannot find hadoop configurations in classpath.
{noformat}
to
{noformat}
Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).
{noformat}
Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch, PIG-1338-6.patch Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will break
We should change this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854188#action_12854188 ] Pradeep Kamath commented on PIG-1299: - Changes are mostly good - a few comments:
1) Instead of creating a wrapper RecordWriter in MapReducePOStoreImpl, the incrementing of the counter should be done in POStore.getNext() - POStore holds a reference to MapReducePOStoreImpl, so the counter is available for incrementing. This way, we will still keep our contract to StoreFunc that the RecordWriter instance provided in prepareToWrite() is the same as the one given by StoreFunc.getOutputFormat().getRecordWriter(). With this change, the change to BinStorage should be reverted.
2) Is the check for store.isMultiStore() required in MapReducePOStoreImpl - I think MapReducePOStoreImpl is used only with multi-store POStore(s) - so the check seems redundant.
3) If javac warnings can be addressed, please address them - also unit tests along the lines of those in TestCounters would be good.
Implement Pig counter to track number of output rows for each output files Key: PIG-1299 URL: https://issues.apache.org/jira/browse/PIG-1299 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1299.patch When running a multi-store query, the Hadoop job tracker often displays only 0 for the Reduce output records or Map output records counters. This is incorrect and misleading. Pig should implement an output records counter for each output file in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
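Suggestion 1) above can be illustrated with a minimal, hypothetical sketch (not the real POStore/MapReducePOStoreImpl classes): the store operator itself bumps the output-record counter as tuples flow through getNext(), so the RecordWriter handed to StoreFunc.prepareToWrite() remains exactly the one produced by StoreFunc.getOutputFormat().

```java
// Hypothetical sketch of counting output records inside the store operator
// instead of wrapping the RecordWriter.
class StoreSketch {
    private long outputRecords = 0;   // stands in for the Hadoop counter

    // Each non-null tuple forwarded by getNext() increments the counter;
    // no wrapper RecordWriter is needed, preserving the StoreFunc contract.
    Object getNext(Object tuple) {
        if (tuple != null) {
            outputRecords++;
        }
        return tuple;
    }

    long getOutputRecords() { return outputRecords; }
}
```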
RE: Begin a discussion about Pig as a top level project
I agree with Ashutosh and Santhosh. Just based on the current direction of the project I think we are more closely tied with Hadoop now (with Pig 0.7, our load/store interfaces are very closely tied with Hadoop) - hence for now my vote would be a -1 to become a TLP - if there is a change in that direction/philosophy to be really backend agnostic I think we should revisit this question. Pradeep -Original Message- From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com] Sent: Sunday, April 04, 2010 11:11 PM To: pig-dev@hadoop.apache.org Subject: Re: Begin a discussion about Pig as a top level project I concur with Santhosh here. I think the main question we need to answer is how close our ties with Hadoop are currently and how close they will be in the future. When Pig was originally designed the intent was to keep it backend neutral, so much so that there was a reference backend implementation (also known as the local engine) which had nothing to do with Hadoop. But things have changed since then. Hadoop's local mode was adopted in favor of Pig's own local mode. We have moved from being backend agnostic to Hadoop favoring. And while this was happening, it seems we tried to keep the Pig Latin language independent of the hadoop backend while the Pig runtime started to make use of hadoop concepts. Apart from design decisions, this move also has a practical impact on our codebase. Since we adopted Hadoop more closely, we got rid of an extra layer of abstraction and instead started using similar abstractions already existing in Hadoop. This had the positive impact of simplifying the codebase and providing tighter integration with Hadoop. So, if we are continuing in a direction where Hadoop is our only backend (or at least a favored one), close ties to Hadoop are useful for the reasons Alan and Dmitriy pointed out; if not, then I think moving out to a TLP makes sense. 
Since there is no effort I am aware of to plug in a different backend for Pig, I think maintaining close ties with Hadoop is useful for Pig. In the future, if a different distributed computing platform comes up that we want to use as a backend, we can revisit our decision. So, as things stand today, I am -1 to move out of Hadoop. And I would also like to reiterate my point that though the Pig runtime may continue to get closer to Hadoop, we should keep Pig Latin completely backend agnostic. Ashutosh On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan s...@yahoo-inc.com wrote: I see this as a multi-part question. Looking back at some of the significant roadmap/existential questions asked in the last 12 months, I see the following: 1. With the introduction of SQL, what is the philosophy of Pig (I sent an email about this approximately 9 months ago) 2. What is the approach to support backward compatibility in Pig (Alan sent an email about this 3 months ago) 3. Should Pig be a TLP (the current email thread). Here is my take on answering the aforementioned questions. The initial philosophy of Pig was to be backend agnostic. It was designed as a data flow language. Whenever a new language is designed, the syntax and semantics of the language have to be laid out. The syntax is usually captured in the form of a BNF grammar. The semantics are defined by the language creators. Backward compatibility is then a question of holding true to the syntax and semantics. With Pig, in addition to the language, the Java APIs were exposed to customers to implement UDFs (load/store/filter/grouping/row transformation etc), provision looping since the language does not support looping constructs, and also support a programmatic mode of access. Backward compatibility in this context is to support API versioning. Do we still intend to position Pig as a data flow language that is backend agnostic? If the answer is yes, then there is a strong case for making Pig a TLP. 
Are we influenced by Hadoop? A big YES! The reason Pig chose to become a Hadoop sub-project was to ride the Hadoop popularity wave. As a consequence, we chose to be heavily influenced by the Hadoop roadmap. Like a good lawyer, I also have rebuttals to Alan's questions :) 1. Search engine popularity - We can discuss this with the Hadoop team and still retain links to TLPs that are coupled (loosely or tightly). 2. Explicit connection to Hadoop - I see this as a logical connection v/s a physical connection. Today, we are physically connected as a sub-project. Becoming a TLP will not increase/decrease our influence on the Hadoop community (think Logical, Physical and MR Layers :) 3. Philosophy - I have already talked about this. The tight coupling is by choice. If Pig continues to be a data flow language with clear syntax and semantics then someone can implement Pig on top of a different backend. Do we intend to take this approach? I just wanted to offer a different opinion to this thread. I strongly believe that we should think about the original
[jira] Commented: (PIG-1330) Move pruned schema tracking logic from LoadFunc to core code
[ https://issues.apache.org/jira/browse/PIG-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853600#action_12853600 ] Pradeep Kamath commented on PIG-1330: - +1 Move pruned schema tracking logic from LoadFunc to core code Key: PIG-1330 URL: https://issues.apache.org/jira/browse/PIG-1330 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1330-1.patch Currently, LoadFunc.getSchema requires a schema after column pruning. The good side of this is that LoadFunc.getSchema matches the data it actually loads, which gives a sense of consistency. However, by doing this, every LoadFunc needs to keep track of the columns pruned. This is an unnecessary burden on the LoadFunc writer and is very error prone. This issue is to move this logic from LoadFunc to the Pig core. LoadFunc.getSchema then only needs to return the original schema, even after pruning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852485#action_12852485 ] Pradeep Kamath commented on PIG-1337: - We may need to add a new method - addToDistributedCache() - on LoadFunc. Notice this is an adder, not a setter, since there is only one key for the distributed cache in hadoop's Job (the Configuration in the Job). So implementations of LoadFunc will have to use the DistributedCache.add*() methods. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0 The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs. To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc). It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is serialized to the map/reduce backend. As such, the distributed cache is not set up properly. To work around this problem, Pig needs to provide a way in its LoadFunc to set up distributed cache information in a conf object that is actually used by the map/reduce backend. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
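The adder-vs-setter distinction in the comment above comes down to Hadoop keeping all cache files under a single comma-separated configuration key. A minimal model (plain Java, not Hadoop's actual DistributedCache or Configuration classes; the key name is illustrative) shows why each loader must append rather than overwrite:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the single-key constraint: every cache file shares
// one configuration entry, so an adder appends while a setter would clobber
// files registered by other loaders.
class CacheConfSketch {
    static final String KEY = "mapred.cache.files"; // single shared key
    private final Map<String, String> conf = new HashMap<String, String>();

    void addCacheFile(String uri) {
        String existing = conf.get(KEY);
        conf.put(KEY, existing == null ? uri : existing + "," + uri);
    }

    String cacheFiles() { return conf.get(KEY); }
}
```

Two loaders each calling addCacheFile() end up with both files registered; a setter-style API would leave only the last caller's file in place.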
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Open (was: Patch Available) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Attachment: PIG-1346-2.patch The earlier patch was using System.getProperty("java.home") - apparently ant sometimes appends jre to $JAVA_HOME as the value of the java.home property - this causes failures since $JAVA_HOME/jre/bin/ does not contain javac. I have changed this code to use System.getenv("JAVA_HOME") instead. In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java-related binaries like java, javac and jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try to execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
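The resolution logic described in the update above might look like the following sketch (the class and method names are hypothetical, not the actual Util code): prefer the JAVA_HOME environment variable, since ant can point the java.home system property at $JAVA_HOME/jre, which has no javac, and fall back to a bare command looked up on the PATH when JAVA_HOME is unset.

```java
// Hypothetical helper mirroring the fix: resolve java/javac/jar against
// $JAVA_HOME/bin when JAVA_HOME is set, else rely on the PATH.
class JavaCmdSketch {
    static String resolve(String cmd, String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return cmd;                    // fall back to PATH lookup
        }
        return javaHome + "/bin/" + cmd;   // e.g. $JAVA_HOME/bin/javac
    }
}
```

In the real method the second argument would come from System.getenv("JAVA_HOME"); it is a parameter here only to keep the sketch testable.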
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Patch Available (was: Open) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Attachment: PIG-1346.patch In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Patch Available (was: Open) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851479#action_12851479 ] Pradeep Kamath commented on PIG-1337: - My worry in doing these kinds of job-related updates in the Job in getSchema() is that currently getSchema has been designed to be a pure getter without any indirect set side effects - this is noted in the javadoc:
{noformat}
/**
 * Get a schema for the data to be loaded.
 * @param location Location as returned by
 * {@link LoadFunc#relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)}
 * @param job The {@link Job} object - this should be used only to obtain
 * cluster properties through {@link Job#getConfiguration()} and not to set/query
 * any runtime job information.
...
{noformat}
We should be careful in opening this up to allow set capability - something to consider before designing a fix for this issue. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0 The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs. To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc). It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is serialized to the map/reduce backend. As such, the distributed cache is not set up properly. To work around this problem, Pig needs to provide a way in its LoadFunc to set up distributed cache information in a conf object that is actually used by the map/reduce backend. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.