[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917246#action_12917246 ]

Santhosh Srinivasan commented on PIG-1661:
------------------------------------------

Sure, worth a try.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1661) Add alternative search-provider to Pig site
Add alternative search-provider to Pig site
-------------------------------------------

        Key: PIG-1661
        URL: https://issues.apache.org/jira/browse/PIG-1661
    Project: Pig
 Issue Type: Improvement
 Components: documentation
   Reporter: Alex Baranau
   Priority: Minor

Use the search-hadoop.com service to make search of Pig sources, mailing lists, wiki, etc. available. This was initially proposed on the user mailing list. The search service was already added to the site's skin (common to all Hadoop-related projects) via AVRO-626, so this issue is about enabling it for Pig.
[jira] Updated: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Baranau updated PIG-1661:
------------------------------
    Attachment: PIG-1661.patch
[jira] Updated: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Baranau updated PIG-1661:
------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (PIG-1531) Pig gobbles up error messages
[ https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

niraj rai updated PIG-1531:
---------------------------
    Attachment: PIG-1531_5.patch

Reviewed the patch and made the required changes after discussion with Ashutosh. Ran test-patch and the unit tests, and everything looks fine. Ashutosh, please commit the patch if you don't have any further comments. Thanks, Niraj

Pig gobbles up error messages
-----------------------------

             Key: PIG-1531
             URL: https://issues.apache.org/jira/browse/PIG-1531
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.7.0
        Reporter: Ashutosh Chauhan
        Assignee: niraj rai
         Fix For: 0.8.0
     Attachments: pig-1531_3.patch, pig-1531_4.patch, PIG-1531_5.patch, PIG_1531.patch, PIG_1531_2.patch

Consider the following. I have my own Storer implementing StoreFunc, and I throw FrontEndException (and other exceptions derived from PigException) in its various methods. I expect those error messages to be shown in error scenarios. Instead, Pig gobbles up my error messages and shows its own generic error message, like:
{code}
2010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116: Unexpected error. Could not validate the output specification for: default.partitoned
Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
{code}
Instead, I expect it to display my error messages, which it stores away in that log file.
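The failure mode Ashutosh describes — a wrapper exception whose generic message hides the cause's message — can be sketched in plain Java (the class and error text below are illustrative, not Pig's actual code):

```java
// Illustrative sketch, not Pig's actual error handling: contrast a wrapper
// that drops the cause's message with one that surfaces it to the user.
public class ErrorWrapping {
    static RuntimeException wrapBadly(Exception cause) {
        // The user's message is only reachable via getCause(); any caller
        // that prints just getMessage() shows only the generic text.
        return new RuntimeException("ERROR 2116: Unexpected error.", cause);
    }

    static RuntimeException wrapWell(Exception cause) {
        // Including the cause's message keeps it visible on screen,
        // not just buried in the log file.
        return new RuntimeException("ERROR 2116: " + cause.getMessage(), cause);
    }
}
```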
[jira] Commented: (PIG-1542) log level not propogated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916974#action_12916974 ]

niraj rai commented on PIG-1542:
--------------------------------

All the tests passed: end-to-end, unit, and test-patch. If Thejas does not have any feedback, please commit the patch. Thanks, Niraj

log level not propogated to MR task loggers
-------------------------------------------

            Key: PIG-1542
            URL: https://issues.apache.org/jira/browse/PIG-1542
        Project: Pig
     Issue Type: Bug
       Reporter: Thejas M Nair
       Assignee: niraj rai
        Fix For: 0.8.0
    Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch

Specifying -d DEBUG does not affect the logging of the MR tasks. This was fixed earlier in PIG-882.
[jira] Created: (PIG-1662) Need better error message for MalFormedProbVecException
Need better error message for MalFormedProbVecException
-------------------------------------------------------

             Key: PIG-1662
             URL: https://issues.apache.org/jira/browse/PIG-1662
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.7.0
        Reporter: Richard Ding
        Assignee: Richard Ding
         Fix For: 0.8.0

Instead of the generic error message:
{code}
Backend error message - Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.MalFormedProbVecException: ERROR 2122: Sum of probabilities should be one
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.DiscreteProbabilitySampleGenerator.init(DiscreteProbabilitySampleGenerator.java:56)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:128)
	... 10 more
{code}
it could easily print out the contents of the malformed probability vector.
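A minimal sketch of the kind of message the report asks for — including the offending vector's contents in the error text (the class and method names here are illustrative, not the actual patch):

```java
import java.util.Arrays;

// Illustrative sketch: validate a probability vector and, on failure,
// report the vector itself rather than only the generic error text.
public class ProbVecCheck {
    // Returns an error message containing the malformed vector's contents,
    // or null if the probabilities sum (approximately) to one.
    static String checkSumsToOne(float[] probVec) {
        float sum = 0.0f;
        for (float p : probVec) sum += p;
        if (Math.abs(sum - 1.0f) > 1e-4f) {
            return "Sum of probabilities should be one, but is " + sum
                    + " for vector " + Arrays.toString(probVec);
        }
        return null;
    }
}
```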
[jira] Updated: (PIG-1662) Need better error message for MalFormedProbVecException
[ https://issues.apache.org/jira/browse/PIG-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1662:
------------------------------
    Attachment: PIG-1662.patch
[jira] Updated: (PIG-1662) Need better error message for MalFormedProbVecException
[ https://issues.apache.org/jira/browse/PIG-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1662:
------------------------------
    Status: Patch Available  (was: Open)
[jira] Commented: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY
[ https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916998#action_12916998 ]

Daniel Dai commented on PIG-1659:
---------------------------------

We should set sortInfo after optimization, so we should add SetSortInfo after the optimization of the new logical plan. This code is missing.

sortinfo is not set for store if there is a filter after ORDER BY
-----------------------------------------------------------------

             Key: PIG-1659
             URL: https://issues.apache.org/jira/browse/PIG-1659
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.8.0
        Reporter: Yan Zhou
        Assignee: Daniel Dai
         Fix For: 0.8.0

This has caused 6 (of 7) failures in the Zebra test TestOrderPreserveVariableTable.
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1658:
--------------------------
    Attachment: PIG-1658.patch

Add Zebra test TestMergeJoinPartial to the pigtest target.

ORDER BY does not work properly on integer/short keys that are -1
-----------------------------------------------------------------

             Key: PIG-1658
             URL: https://issues.apache.org/jira/browse/PIG-1658
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.8.0
        Reporter: Yan Zhou
        Assignee: Yan Zhou
         Fix For: 0.8.0
     Attachments: PIG-1658.patch, PIG-1658.patch

In fact, all keys of these types with values that are negative but within the byte or short range have this problem. Basically, a byte value of -1 (0xff) will be read back as 255, not -1.
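The byte-level issue the description refers to can be seen in plain Java: reading a stored 0xff byte with an unsigned mask yields 255, while an ordinary widening conversion sign-extends it back to -1 (a generic illustration of the pitfall, not the actual Pig comparator code):

```java
// Illustrative sketch of the sign-extension pitfall behind PIG-1658.
public class SignExtension {
    // Interpreting the raw byte as unsigned: 0xff becomes 255,
    // which breaks a signed ORDER BY comparison.
    static int asUnsigned(byte b) {
        return b & 0xff;
    }

    // Plain widening conversion sign-extends: 0xff becomes -1,
    // the correct signed key value.
    static int asSigned(byte b) {
        return b;
    }
}
```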
[jira] Commented: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY
[ https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917012#action_12917012 ]

Yan Zhou commented on PIG-1659:
-------------------------------

Need to make sure it is invoked after optimization in both old and new logical plans.
[jira] Commented: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917014#action_12917014 ]

Thejas M Nair commented on PIG-1658:
------------------------------------

Looks good. +1
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1658:
--------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Fixed

Committed to both trunk and the 0.8 branch.
[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1656:
-------------------------------
    Attachment: PIG-1656.1.patch

TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
-------------------------------------------------------------------------------------------------

             Key: PIG-1656
             URL: https://issues.apache.org/jira/browse/PIG-1656
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.8.0
        Reporter: Thejas M Nair
        Assignee: Thejas M Nair
         Fix For: 0.8.0
     Attachments: PIG-1656.1.patch

TOBAG udf ignores columns with null value:
{code}
R4 = foreach B generate $0, TOBAG( id, null, id, null );
grunt> dump R4;
1000	{(1),(1)}
1000	{(2),(2)}
1000	{(3),(3)}
1000	{(4),(4)}
{code}
TOBAG does not use input type to determine output schema:
{code}
grunt> B1 = foreach B generate TOBAG( 1, 2, 3);
grunt> describe B1;
B1: {{null}}
{code}
[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1656:
-------------------------------
    Status: Patch Available  (was: Open)

Patch passes unit tests and test-patch.
[jira] Created: (PIG-1663) Order by only allows ordering on columns, not expressions
Order by only allows ordering on columns, not expressions
---------------------------------------------------------

             Key: PIG-1663
             URL: https://issues.apache.org/jira/browse/PIG-1663
         Project: Pig
      Issue Type: Bug
Affects Versions: 0.7.0
        Reporter: Alan Gates
        Assignee: Xuefu Zhang
        Priority: Minor
         Fix For: 0.9.0

Currently the following Pig Latin will fail:
{code}
A = LOAD '/Users/gates/test/data/studenttab10' as (name, age, gpa);
B = order A by (int)age;
dump B;
{code}
with an error message:
{code}
ERROR 1000: Error during parsing. Encountered int int at line 2, column 17. Was expecting one of: IDENTIFIER ... DOLLARVAR ...
{code}
The issue is that Pig expects a column, not an expression, for ORDER BY. If the cast is removed, the script passes. Order by should take an expression for its key, just as group, join, etc. do.
[jira] Updated: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY
[ https://issues.apache.org/jira/browse/PIG-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1659:
----------------------------
    Attachment: PIG-1659-1.patch
[jira] Commented: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917050#action_12917050 ]

Richard Ding commented on PIG-1656:
-----------------------------------

We need to make it clear how the output schema of TOBAG is generated. For example, in the first case, the type is preserved in the inner schema:
{code}
grunt> a = load 'input' as (a0:int, a1:int);
grunt> b = foreach a generate TOBAG(a0, a1);
grunt> describe b;
b: {{int}}
{code}
but not in the second case:
{code}
grunt> a = load 'input' as (a0:int, a1:int);
grunt> c = group a by a0;
grunt> b = foreach c generate TOBAG(a.a0, a.a1);
grunt> describe b;
b: {{NULL}}
{code}
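The schema rule under discussion — an element type only when every input field has the same known type, otherwise an unknown (null) type — can be sketched generically (this illustrates the rule being debated, not Pig's actual Schema code):

```java
// Illustrative sketch of the TOBAG output-schema rule: types are modeled
// as plain strings here purely for demonstration.
public class BagElementType {
    // Returns the common element type if every input field has the same
    // known type; otherwise returns null, meaning an unknown element type.
    static String commonElementType(String... fieldTypes) {
        String common = null;
        for (String t : fieldTypes) {
            if (t == null) return null;              // unknown input type
            if (common == null) common = t;          // first field sets the type
            else if (!common.equals(t)) return null; // mixed types: give up
        }
        return common;
    }
}
```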
[jira] Commented: (PIG-1542) log level not propogated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917058#action_12917058 ]

Thejas M Nair commented on PIG-1542:
------------------------------------

Comment on the patch: in the case of the log level setting, it is not possible to override the config-file setting using command-line options. In other cases, command-line values usually override what is specified in the configuration file; for example, that is what happens with hadoop properties. This is also very convenient, because you can easily change the setting for a particular invocation of pig without having to change a config file that you might share with other users.
[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917064#action_12917064 ]

Alan Gates commented on PIG-1661:
---------------------------------

I have a few questions about this:
# Can you tell me what's better about using search-hadoop than google?
# What all are you indexing? How does this differ from what google is indexing?
# How often do you refresh your index? That is, how long will it take for changes to show up in your search?
[jira] Updated: (PIG-1531) Pig gobbles up error messages
[ https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1531:
----------------------------------
    Status: Resolved  (was: Patch Available)
    Resolution: Fixed

Committed to both trunk and 0.8. Thanks, Niraj!
[jira] Commented: (PIG-1542) log level not propogated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917079#action_12917079 ]

Daniel Dai commented on PIG-1542:
---------------------------------

Yes, -d xxx should be treated as -Ddebug=xxx. And system properties already have higher priority in the current code. (In my mind, we should deprecate -d in favor of -Ddebug.)
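The precedence Daniel describes — command-line system properties winning over config-file values — can be sketched with java.util.Properties defaults (an illustration of the precedence rule, not Pig's actual property-handling code):

```java
import java.util.Properties;

// Illustrative sketch: config-file values act as defaults, and anything set
// on the command line (e.g. via -Ddebug=DEBUG) shadows them.
public class LogLevelPrecedence {
    static String effectiveLogLevel(Properties configFile, Properties commandLine) {
        Properties effective = new Properties(configFile); // file values as fallback
        effective.putAll(commandLine);                     // command line wins
        return effective.getProperty("debug", "INFO");     // built-in last resort
    }
}
```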
[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917081#action_12917081 ]

Alan Gates commented on PIG-1565:
---------------------------------

{code}
     [exec] -1 overall.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included. The patch appears to include 8 new or modified tests.
     [exec]
     [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{code}
The javadoc warning is:
{code}
  [javadoc] /home/gates/src/pig/PIG-1565/trunk/src/org/apache/pig/builtin/INDEXOF.java:78: warning - Tag @link: can't find INDEX_OF(int, int) in java.lang.String
{code}
Building Piggybank now fails as well, since some of the ErrorCatchingBase class was moved into main Pig. Also, the patch fails a couple of unit tests in TestStringUDFs: it fails testIndexOf and testLastIndexOf because it doesn't properly handle the null case. I'll attach the output from running the tests.

additional piggybank datetime and string UDFs
---------------------------------------------

            Key: PIG-1565
            URL: https://issues.apache.org/jira/browse/PIG-1565
        Project: Pig
     Issue Type: Improvement
       Reporter: Andrew Hitchcock
       Assignee: Andrew Hitchcock
        Fix For: 0.8.0
    Attachments: PIG-1565-1.patch, PIG-1565-2.patch

Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1565:
----------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1656:
-------------------------------
    Attachment: PIG-1656.2.patch

Updated patch to include documentation of the details of output schema generation.
[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917103#action_12917103 ]

Otis Gospodnetic commented on PIG-1661:
---------------------------------------

search-hadoop.com (SH) is indexing:
* user and dev ML messages
* JIRA issues
* Wiki
* Web site
* Source code
* Javadocs

I think Google is only indexing the web site and probably the Wiki. SH has nice auto-complete/suggest-as-you-type functionality, facets (project, data type, author), different sorting options, and source code with syntax highlighting. The SH index is continuously refreshed (new documents added, deleted ones removed, etc.), and new changes become visible in search results every 10 minutes.
[jira] Commented: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917108#action_12917108 ]

Richard Ding commented on PIG-1656:
-----------------------------------

+1
[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1656:
-------------------------------
    Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

Patch committed to 0.8 branch and trunk.
[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site
[ https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917117#action_12917117 ]

Alan Gates commented on PIG-1661:
---------------------------------

I like that JIRA, source code, javadocs, etc. get added in. So I'm willing to switch. Anyone else have an opinion?
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Status: Resolved (was: Patch Available) Resolution: Fixed unit tests passed. PIG-1649.5.patch committed to trunk and 0.8 branch. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, PIG-1649.4.patch, PIG-1649.5.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.<init>(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files doesn't get calculated, the optimizations added in PIG-1458 to reduce load on the name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1629) Need ability to limit bags produced during GROUP + LIMIT
[ https://issues.apache.org/jira/browse/PIG-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916538#action_12916538 ] Thejas M Nair commented on PIG-1629: A similar optimization can be done for an inner filter as well - C = foreach B{ D = filter A by x > 0; generate group, MyUDF(D);} Changes required: - group physical/MR plan implementation to have an inner limit/filter. - logical optimizer rules to make the limit/filter an inner plan of group. Need ability to limit bags produced during GROUP + LIMIT Key: PIG-1629 URL: https://issues.apache.org/jira/browse/PIG-1629 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Thejas M Nair Fix For: 0.9.0 Currently, the code below will construct the full group in memory and then trim it. This results in the use of more memory than needed. A = load 'data' as (x, y, z); B = group A by x; C = foreach B{ D = limit A 100; generate group, MyUDF(D);} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1650: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Niraj confirmed that unit tests and test-patch have succeeded. Patch looks good. +1. Committed to trunk and 0.8 branch. pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch, PIG-1650_1.patch, PIG-1650_2.patch grunt shell breaks for many unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1657) reduce the ivy verbosity during build.
reduce the ivy verbosity during build. -- Key: PIG-1657 URL: https://issues.apache.org/jira/browse/PIG-1657 Project: Pig Issue Type: Improvement Reporter: Giridharan Kesavan Ivy is very verbose during the build; making it less verbose would let us see what the build actually does. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Reporter: Yan Zhou In fact, all keys of these types whose values are negative but within the byte or short range would have the problem. Basically, a byte value of -1 (0xff) will be returned as 255, not -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
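The 0xff behavior described above is ordinary Java sign-extension arithmetic. This toy class (not Pig's actual key-serialization code) contrasts the buggy unsigned read of a stored byte with the correct signed read:

```java
// Toy illustration (not Pig's key-handling code): a negative byte key read
// back without sign extension compares as a large positive value, so -1
// sorts after every non-negative key.
public class ByteKeyDemo {
    static int unsignedRead(byte b) { return b & 0xFF; } // buggy read: 0xff -> 255
    static int signedRead(byte b)   { return b; }        // correct read: 0xff -> -1

    public static void main(String[] args) {
        byte key = -1; // stored as 0xff
        System.out.println(unsignedRead(key)); // 255: breaks the sort order
        System.out.println(signedRead(key));   // -1: preserves the sort order
    }
}
```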
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Fix Version/s: 0.8.0 Affects Version/s: 0.8.0 ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 In fact, all keys of these types whose values are negative but within the byte or short range would have the problem. Basically, a byte value of -1 (0xff) will be returned as 255, not -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1658: - Assignee: Yan Zhou ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 In fact, all keys of these types whose values are negative but within the byte or short range would have the problem. Basically, a byte value of -1 (0xff) will be returned as 255, not -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916594#action_12916594 ] Giridharan Kesavan commented on PIG-1607: - Looks good, +1. Able to do mvn-install and mvn-deploy to install/deploy the javadoc jar to the fs and the apache mvn repo. pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At this moment, javadoc is part of the source.jar but pig should have separate javadoc.jar in the maven repository. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1659) sortinfo is not set for store if there is a filter after ORDER BY
sortinfo is not set for store if there is a filter after ORDER BY - Key: PIG-1659 URL: https://issues.apache.org/jira/browse/PIG-1659 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Daniel Dai Fix For: 0.8.0 This has caused 6 (of 7) failures in the Zebra test TestOrderPreserveVariableTable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1651: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1651.patch If zebra.jar is only registered in a PIG script but not present in the CLASSPATH, a query using zebra fails since there appear to be multiple copies of the class loaded into the JVM, causing a static variable set previously to not be seen after an instance of the class is created through reflection. (After the zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is as follows: Backend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at 
org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
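The "multiple classes loaded" symptom above can be reproduced outside Pig/Zebra. In this sketch (all names are made up for illustration; this is not Zebra's code), the same class defined by two class loaders yields two distinct Class objects, each with its own static state, so a value set through one copy is invisible through the other:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Sketch: a class defined again by a second loader gets a fresh copy of its
// static fields -- the failure mode described in PIG-1651.
public class DupLoaderDemo {
    public static class Holder {
        public static String setting; // stands in for the static configuration
    }

    public static Class<?> loadFreshCopy(String className) throws Exception {
        String resource = className.replace('.', '/') + ".class";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = DupLoaderDemo.class.getClassLoader().getResourceAsStream(resource)) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) > 0; ) out.write(buf, 0, n);
        }
        byte[] bytes = out.toByteArray();
        return new ClassLoader(null) { // no parent: forces a second definition
            @Override protected Class<?> findClass(String n) {
                return defineClass(n, bytes, 0, bytes.length);
            }
        }.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Holder.setting = "configured";                         // set via loader A
        Class<?> copy = loadFreshCopy(Holder.class.getName()); // defined again by loader B
        System.out.println(copy.getField("setting").get(null)); // loader B's copy was never set
    }
}
```

This is consistent with why adding zebra.jar to the CLASSPATH makes the problem disappear: with a single loader resolving the class, there is only one copy of the static state.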
[jira] Assigned: (PIG-1655) code duplicated for udfs that were moved from piggybank to builtin
[ https://issues.apache.org/jira/browse/PIG-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai reassigned PIG-1655: -- Assignee: niraj rai (was: Thejas M Nair) code duplicated for udfs that were moved from piggybank to builtin -- Key: PIG-1655 URL: https://issues.apache.org/jira/browse/PIG-1655 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: niraj rai Fix For: 0.8.0 As part of PIG-1405, some udfs from piggybank were made standard udfs. But now the code is duplicated in piggybank and org.apache.pig.builtin. This can cause confusion. I am planning to make these udfs in piggybank subclasses of those in org.apache.pig.builtin so that users don't have to change their scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1607: --- Fix Version/s: 0.8.0 Affects Version/s: 0.8.0 pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At this moment, javadoc is part of the source.jar but pig should have separate javadoc.jar in the maven repository. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1607: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to 0.8 branch and trunk. pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At this moment, javadoc is part of the source.jar but pig should have separate javadoc.jar in the maven repository. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1638) sh output gets mixed up with the grunt prompt
[ https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916725#action_12916725 ] Daniel Dai commented on PIG-1638: - +1 sh output gets mixed up with the grunt prompt - Key: PIG-1638 URL: https://issues.apache.org/jira/browse/PIG-1638 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Priority: Minor Fix For: 0.8.0 Attachments: PIG-1638_0.patch Many times, the grunt prompt gets mixed up with the sh output, e.g.: grunt> sh ls 000 autocomplete bin build build.xml grunt CHANGES.txt conf contrib In the above case, grunt is mixed up with the output. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group
[ https://issues.apache.org/jira/browse/PIG-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1297. Resolution: Duplicate algebraic interface of udf does not get used if the foreach with udf projects column within group - Key: PIG-1297 URL: https://issues.apache.org/jira/browse/PIG-1297 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.9.0 grunt> l = load 'file' as (a,b,c); grunt> g = group l by (a,b); grunt> f = foreach g generate SUM(l.c), group.a; grunt> explain f; ... ... #-- # Map Reduce Plan #-- MapReduce node 1-752 Map Plan Local Rearrange[tuple]{tuple}(false) - 1-742 | | | Project[bytearray][0] - 1-743 | | | Project[bytearray][1] - 1-744 | |---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage) - 1-739 Reduce Plan Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751 | |---New For Each(false,false)[bag] - 1-750 | | | POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747 | | | |---Project[bag][2] - 1-746 | | | |---Project[bag][1] - 1-745 | | | Project[bytearray][0] - 1-749 | | | |---Project[tuple][0] - 1-748 | |---Package[tuple]{tuple} - 1-741 Global sort: false -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1638) sh output gets mixed up with the grunt prompt
[ https://issues.apache.org/jira/browse/PIG-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1638: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. sh output gets mixed up with the grunt prompt - Key: PIG-1638 URL: https://issues.apache.org/jira/browse/PIG-1638 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: niraj rai Assignee: niraj rai Priority: Minor Fix For: 0.8.0 Attachments: PIG-1638_0.patch Many times, the grunt prompt gets mixed up with the sh output, e.g.: grunt> sh ls 000 autocomplete bin build build.xml grunt CHANGES.txt conf contrib In the above case, grunt is mixed up with the output. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT
Consider passing result of COUNT/COUNT_STAR to LIMIT - Key: PIG-1660 URL: https://issues.apache.org/jira/browse/PIG-1660 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.9.0 In realistic scenarios we need to split a dataset into segments by using LIMIT, and would like to achieve that goal within the same pig script. Here is a case: {code} A = load '$DATA' using PigStorage(',') as (id, pvs); B = group A by ALL; C = foreach B generate COUNT_STAR(A) as row_cnt; -- get the low 20% segment D = order A by pvs; E = limit D (C.row_cnt * 0.2); store E into '$Eoutput'; -- get the high 20% segment F = order A by pvs DESC; G = limit F (C.row_cnt * 0.2); store G into '$Goutput'; {code} Since LIMIT only accepts constants, we have to split the operation into two steps in order to pass in the constants for the LIMIT statements. Please consider bringing this feature in so the processing can be more efficient. Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Status: Patch Available (was: Open) ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1658.patch In fact, all keys of these types whose values are negative but within the byte or short range would have the problem. Basically, a byte value of -1 (0xff) will be returned as 255, not -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1658) ORDER BY does not work properly on integer/short keys that are -1
[ https://issues.apache.org/jira/browse/PIG-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1658: -- Attachment: PIG-1658.patch This problem is caused by the PIG-1295 patch. test-core passes. Zebra's nightly tests pass too. test-patch output: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Zebra's TestMergeJoinPartial is used to verify the fix. ORDER BY does not work properly on integer/short keys that are -1 - Key: PIG-1658 URL: https://issues.apache.org/jira/browse/PIG-1658 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1658.patch In fact, all keys of these types whose values are negative but within the byte or short range would have the problem. Basically, a byte value of -1 (0xff) will be returned as 255, not -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, PIG-1649.4.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.<init>(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files doesn't get calculated, the optimizations added in PIG-1458 to reduce load on the name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Status: Open (was: Patch Available) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch grunt shell breaks for many unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Attachment: PIG-1650_1.patch pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch, PIG-1650_1.patch grunt shell breaks for many unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Status: Patch Available (was: Open) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch, PIG-1650_1.patch grunt shell breaks for many unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1656) TOBAG udfs ignores columns with null value; it does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1656: --- Summary: TOBAG udfs ignores columns with null value; it does not use input type to determine output schema (was: TOBAG TOTUPLE udfs ignores columns with null value; TOBAG does not use input type to determine output schema) Description: TOBAG udf ignores columns with null value {code} R4 = foreach B generate $0, TOBAG( id, null, id, null ); grunt> dump R4; 1000 {(1),(1)} 1000 {(2),(2)} 1000 {(3),(3)} 1000 {(4),(4)} {code} TOBAG does not use input type to determine output schema {code} grunt> B1 = foreach B generate TOBAG( 1, 2, 3); grunt> describe B1; B1: {{null}} {code} was: TOBAG TOTUPLE udfs ignores columns with null value {code} R4 = foreach B generate $0, TOTUPLE(null, id, null), TOBAG( id, null, id, null ); grunt> dump R4; 1000 (,1,) {(1),(1)} 1000 (,2,) {(2),(2)} 1000 (,3,) {(3),(3)} 1000 (,4,) {(4),(4)} {code} TOBAG does not use input type to determine output schema {code} grunt> B1 = foreach B generate TOBAG( 1, 2, 3); grunt> describe B1; B1: {{null}} {code} TOBAG udfs ignores columns with null value; it does not use input type to determine output schema --- Key: PIG-1656 URL: https://issues.apache.org/jira/browse/PIG-1656 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 TOBAG udf ignores columns with null value {code} R4 = foreach B generate $0, TOBAG( id, null, id, null ); grunt> dump R4; 1000 {(1),(1)} 1000 {(2),(2)} 1000 {(3),(3)} 1000 {(4),(4)} {code} TOBAG does not use input type to determine output schema {code} grunt> B1 = foreach B generate TOBAG( 1, 2, 3); grunt> describe B1; B1: {{null}} {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Attachment: PIG-1650_2.patch pig grunt shell breaks for many commands like perl , awk , pipe , 'ls -l' etc - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch, PIG-1650_1.patch, PIG-1650_2.patch grunt shell breaks for many unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1607) pig should have separate javadoc.jar in the maven repository
[ https://issues.apache.org/jira/browse/PIG-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1607: --- Status: Patch Available (was: Open) pig should have separate javadoc.jar in the maven repository Key: PIG-1607 URL: https://issues.apache.org/jira/browse/PIG-1607 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1607_0.patch, PIG-1607_1.patch, PIG-1607_2.patch, PIG-1607_3.patch, PIG-1607_4.patch At this moment, javadoc is part of the source.jar but pig should have separate javadoc.jar in the maven repository. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework
[ https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915815#action_12915815 ] Yan Zhou commented on PIG-1648: --- Top 5 locations with most data will be used. This has been agreed upon by the M/R dev. Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 For instance, if a small split has block locations h1, h2 and h3; another small split has h1, h3, h4. After combination, the composite split contains 4 block locations. If the number of component splits is big, then the number of block locations could be big too. In fact, the number of block locations serves as a hint to M/R as the best hosts this composite split should be run on so the list should contain a short list, say 5, of the hosts that contain the most data in this composite split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
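The "top 5 locations with most data" heuristic agreed above can be sketched as a simple top-k selection over per-host byte counts. The class and host names below are hypothetical and this is not the committed patch, just an illustration of the idea:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed hint: instead of reporting the union of all component
// splits' block locations, report only the k hosts holding the most data.
public class SplitHostHint {
    public static List<String> topHosts(Map<String, Long> bytesPerHost, int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        // Sort hosts by descending byte count, then keep at most k of them.
        entries.sort(Comparator.comparingLong((Map.Entry<String, Long> e) -> e.getValue()).reversed());
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) hosts.add(entries.get(i).getKey());
        return hosts;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("h1", 300L); sizes.put("h2", 100L); sizes.put("h3", 250L);
        sizes.put("h4", 50L);  sizes.put("h5", 400L); sizes.put("h6", 10L);
        System.out.println(topHosts(sizes, 5)); // [h5, h1, h3, h2, h4]
    }
}
```

This keeps the location list short regardless of how many component splits were combined, while still steering the task toward the hosts with the most local data.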
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Attachment: PIG-1649.1.patch Patch passes unit tests and test-patch . FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
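The failure mode above is reproducible with java.net.URI alone: '{' and '}' are legal in Hadoop glob patterns but are not legal URI path characters, so constructing a URI from the raw path throws exactly the URISyntaxException seen in the log. A minimal demonstration (not Pig's actual hasTooManyInputFiles code):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch: java.net.URI rejects glob braces that Hadoop paths allow.
class GlobUriDemo {
    static boolean isValidUri(String path) {
        try {
            new URI(path);   // throws on '{' / '}', e.g. "/user/tejas/{std*txt}"
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

This is why a path with curly braces defeats the input-file count: the count is computed via URI parsing, which fails before the glob is ever expanded.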
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Status: Patch Available (was: Open) FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) 
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework
[ https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915852#action_12915852 ] Yan Zhou commented on PIG-1648: --- test-patch results: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. test-core tests pass too. Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1648.patch For instance, if a small split has block locations h1, h2 and h3; another small split has h1, h3, h4. After combination, the composite split contains 4 block locations. If the number of component splits is big, then the number of block locations could be big too. In fact, the number of block locations serves as a hint to M/R as the best hosts this composite split should be run on so the list should contain a short list, say 5, of the hosts that contain the most data in this composite split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug
TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug Key: PIG-1652 URL: https://issues.apache.org/jira/browse/PIG-1652 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias records3 at org.apache.pig.PigServer.storeEx(PigServer.java:877) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution. at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file: at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at 
org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71) at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file: at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parse(URI.java:3009) at java.net.URI.init(URI.java:736) at org.apache.hadoop.fs.Path.initialize(Path.java:137) The reason is we are trying
[jira] Commented: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug
[ https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915860#action_12915860 ] Olga Natkovich commented on PIG-1652: - I think the code needs to be modified to default to 1 if we can't perform the computation TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug Key: PIG-1652 URL: https://issues.apache.org/jira/browse/PIG-1652 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias records3 at org.apache.pig.PigServer.storeEx(PigServer.java:877) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution. 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file: at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71) at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) Caused by: java.net.URISyntaxException: Illegal
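Olga's suggestion above can be sketched as a defensive fallback: if the input-size computation fails, default the reducer count to 1 instead of propagating the error. The method name, field name, and the one-GB-per-reducer constant are illustrative assumptions, not the actual JobControlCompiler code:

```java
// Sketch of a defensive reducer estimate: null (or negative) input size
// means "estimation failed", so fall back to a single reducer.
class ReducerEstimate {
    static final long BYTES_PER_REDUCER = 1_000_000_000L; // assumed default

    static int estimateNumberOfReducers(Long totalInputBytes) {
        if (totalInputBytes == null || totalInputBytes < 0) {
            return 1; // cannot perform the computation: default to 1
        }
        // ceiling division, with a floor of one reducer
        return (int) Math.max(1,
                (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER);
    }
}
```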
[jira] Assigned: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug
[ https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-1652: --- Assignee: Thejas M Nair TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug Key: PIG-1652 URL: https://issues.apache.org/jira/browse/PIG-1652 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Thejas M Nair Fix For: 0.8.0 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias records3 at org.apache.pig.PigServer.storeEx(PigServer.java:877) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution. 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file: at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71) at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 69
[jira] Created: (PIG-1653) Scripting UDF fails if the path to script is an absolute path
Scripting UDF fails if the path to script is an absolute path - Key: PIG-1653 URL: https://issues.apache.org/jira/browse/PIG-1653 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 The following script fails: {code} register '/homes/jianyong/pig/aaa/scriptingudf.py' using jython as myfuncs; a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() as (name, age, gpa:double); b = foreach a generate myfuncs.square(gpa); dump b; {code} If we change the register to use a relative path (such as aaa/scriptingudf.py), it succeeds. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1637) Combiner not used because optimizer inserts a foreach between group and algebraic function
[ https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915880#action_12915880 ] Daniel Dai commented on PIG-1637: - test-patch result for PIG-1637-2.patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Combiner not used because optimizer inserts a foreach between group and algebraic function Key: PIG-1637 URL: https://issues.apache.org/jira/browse/PIG-1637 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1637-1.patch, PIG-1637-2.patch The following script does not use the combiner after the new optimization change. {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} This is because, after the group, the optimizer detects that the group key is not used afterward and adds a foreach statement after C.
This is how it looks after optimization: {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; C1 = foreach C generate B; D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} That cancels the combiner optimization for D. The way to solve the issue is to merge the inserted C1 with D. Currently, we do not merge these two foreach statements. The reason is that one output of the first foreach (B) is referred to twice in D, and the current rule assumes that after merging we would need to calculate B twice in D. Actually, C1 only does projection, with no calculation of B, so merging C1 and D will not result in calculating B twice. Therefore C1 and D should be merged. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1648) Split combination may return too many block locations to map/reduce framework
[ https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915889#action_12915889 ] Richard Ding commented on PIG-1648: --- +1 Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1648.patch For instance, if a small split has block locations h1, h2 and h3; another small split has h1, h3, h4. After combination, the composite split contains 4 block locations. If the number of component splits is big, then the number of block locations could be big too. In fact, the number of block locations serves as a hint to M/R as the best hosts this composite split should be run on so the list should contain a short list, say 5, of the hosts that contain the most data in this composite split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1648) Split combination may return too many block locations to map/reduce framework
[ https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1648: -- Status: Resolved (was: Patch Available) Resolution: Fixed Patch committed to both trunk and the 0.8 branch. Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1648.patch For instance, if a small split has block locations h1, h2 and h3; another small split has h1, h3, h4. After combination, the composite split contains 4 block locations. If the number of component splits is big, then the number of block locations could be big too. In fact, the number of block locations serves as a hint to M/R as the best hosts this composite split should be run on so the list should contain a short list, say 5, of the hosts that contain the most data in this composite split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1648) Split combination may return too many block locations to map/reduce framework
[ https://issues.apache.org/jira/browse/PIG-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1648: -- Status: Patch Available (was: Open) Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1648.patch For instance, if a small split has block locations h1, h2 and h3; another small split has h1, h3, h4. After combination, the composite split contains 4 block locations. If the number of component splits is big, then the number of block locations could be big too. In fact, the number of block locations serves as a hint to M/R as the best hosts this composite split should be run on so the list should contain a short list, say 5, of the hosts that contain the most data in this composite split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput
[ https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915941#action_12915941 ] Daniel Dai commented on PIG-1579: - Rollback the change and run test many times, all tests pass. Seems some change between r990721 and now (r1002348) fix this issue. Will rollback the change and close the Jira. Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput --- Key: PIG-1579 URL: https://issues.apache.org/jira/browse/PIG-1579 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1579-1.patch Error message: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing function: Traceback (most recent call last): File iostream, line 5, in multStr TypeError: can't multiply sequence by non-int of type 'NoneType' at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at 
org.apache.hadoop.mapred.Child.main(Child.java:170) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915945#action_12915945 ] Richard Ding commented on PIG-1651: --- The problem here is that PigContext uses LogicalPlanBuilder.classloader to instantiate the LoadFuncs, but the context ClassLoader for the Thread uses a different class loader, and hence a static variable set for the class loaded by one loader is not visible to the class loaded by the other loader. The solution is to use the same LogicalPlanBuilder.classloader as the context ClassLoader for the Thread. PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 If zebra.jar is only registered in the Pig script but not in the CLASSPATH, a query using zebra fails, since the same class appears to be loaded into the JVM more than once: a static variable set previously is not seen after an instance of the class is created through reflection. (After zebra.jar is specified in the CLASSPATH, it works fine.)
The exception stack is as follows: ackend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 
7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
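Richard's diagnosis above can be illustrated with a minimal, hedged sketch (the class names here are generic stand-ins, not Pig's actual internals): the loader used to instantiate classes via reflection and the thread's context loader must be the same object, otherwise each loader produces its own copy of the class, each with its own statics.

```java
public class ContextLoaderDemo {
    public static void main(String[] args) {
        // The loader that defined our classes (a stand-in for
        // LogicalPlanBuilder.classloader in Pig).
        ClassLoader pigLoader = ContextLoaderDemo.class.getClassLoader();

        // The fix: make the thread's context loader the same instance, so a
        // class resolved via reflection and one resolved through the context
        // loader are the identical Class object, sharing static state.
        Thread.currentThread().setContextClassLoader(pigLoader);

        System.out.println(
            Thread.currentThread().getContextClassLoader() == pigLoader); // true
    }
}
```

If the two loaders differ, `Class` objects with the same name are distinct, which is exactly why the zebra static set in one copy was invisible to the other.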
[jira] Commented: (PIG-1637) Combiner not used because optimizer inserts a foreach between group and algebraic function
[ https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915950#action_12915950 ] Daniel Dai commented on PIG-1637: - Yes, it could be improved as per Xuefu's suggestion. In any case, the current patch solves the combiner-not-used issue, so I will commit this part first and open another Jira for the improvement. Also, MergeForEach is a good example to exercise the cloning framework [PIG-1587|https://issues.apache.org/jira/browse/PIG-1587], so it is better to improve it once PIG-1587 is available. Combiner not used because optimizer inserts a foreach between group and algebraic function Key: PIG-1637 URL: https://issues.apache.org/jira/browse/PIG-1637 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1637-1.patch, PIG-1637-2.patch The following script does not use the combiner after the new optimization change. {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} This is because, after the group, the optimizer detects that the group key is not used afterward and adds a foreach statement after C. 
This is how it looks after optimization: {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; C1 = foreach C generate B; D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} That cancels the combiner optimization for D. The way to solve the issue is to merge the inserted C1 with D. Currently we do not merge these two foreach statements, because one output of the first foreach (B) is referred to twice in D, and the current rule assumes that after the merge B would have to be computed twice in D. In fact, C1 only does projection and performs no computation of B, so merging C1 and D does not result in computing B twice. Therefore C1 and D should be merged. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
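The merge-safety argument above can be sketched in plain Java (this is an illustration of the reasoning, not Pig's optimizer code): because C1 only projects B, the two references to B in D can share a single materialization after the merge.

```java
public class MergeSafetyDemo {
    static int evals = 0;

    // Stand-in for materializing bag B; counts how often it is computed.
    static int[] computeB() {
        evals++;
        return new int[]{1, 2, 3};
    }

    public static void main(String[] args) {
        // C1 only projects B (no computation), so after merging C1 into D the
        // two references to B (SUM and AVG) still share one materialization.
        int[] b = computeB();
        int sum = 0;
        for (int v : b) sum += v;
        double avg = (double) sum / b.length;
        System.out.println("SUM=" + sum + " AVG=" + avg + " evaluationsOfB=" + evals);
    }
}
```

The counter stays at 1 despite B being referenced twice, which is why the assumption "merging forces B to be computed twice" does not hold for a projection-only foreach.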
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Attachment: PIG-1649.2.patch PIG-1649.2.patch addresses review comments from Richard - he pointed out that the hdfs Path class constructor can fail on a valid URI such as the format used for jdbc, so this patch checks that the input location URI has an hdfs scheme before using the hdfs Path constructor. - The code here can run into the same problem as the one in PIG-1652. The patch also includes changes to handle comma-separated file names. A better long-term solution would be support in LoadFunc or related interfaces to check the input size and to decide whether parts of the file should be consolidated. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch In FRJoin, if the input path has curly braces, it fails to compute the number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
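The failure mode and the fix can be reproduced in isolation (the `isHDFSFile` body below is a simplified assumption modeled on the `UriUtil.isHDFSFile` helper mentioned in the review comments, not the actual patch code): glob characters like `{` are illegal in a `java.net.URI` path, and jdbc-style locations must never reach the hdfs Path constructor.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class GlobUriDemo {
    // Simplified sketch of the guard the patch adds: only treat a location as
    // an hdfs file when it is non-null and carries the hdfs scheme.
    static boolean isHDFSFile(String uri) {
        return uri != null && uri.startsWith("hdfs:");
    }

    public static void main(String[] args) {
        String glob = "/user/tejas/{std*txt}";
        boolean illegal = false;
        try {
            new URI(glob); // '{' is not a legal URI path character
        } catch (URISyntaxException e) {
            illegal = true; // the failure MRCompiler.hasTooManyInputFiles logs
        }
        System.out.println(illegal);                         // true
        System.out.println(isHDFSFile("jdbc:mysql://h/db")); // false: skip Path
        System.out.println(isHDFSFile(null));                // false
    }
}
```

Note the null check mirrors the later review comment that `isHDFSFile(String uri)` should return false for a null uri.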
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Status: Open (was: Patch Available) The patch also includes changes to fix the issue in PIG-1652 , since FRJoin code path also faces similar issue. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1652) TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug
[ https://issues.apache.org/jira/browse/PIG-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1652. Resolution: Duplicate Marking as duplicate of PIG-1649 because the code path to consolidate input files in FRJoin also has the same issue. TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug Key: PIG-1652 URL: https://issues.apache.org/jira/browse/PIG-1652 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Thejas M Nair Fix For: 0.8.0 TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the input size estimation. Here is the stack of TestSortedTableUnionMergeJoin: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias records3 at org.apache.pig.PigServer.storeEx(PigServer.java:877) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution. 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file: at org.apache.hadoop.fs.Path.initialize(Path.java:140) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.init(Path.java:50) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71) at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) Caused
[jira] Updated: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1651: -- Status: Patch Available (was: Open) PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1651.patch If just having zebra.jar as being registered in a PIG script but not in the CLASSPATH, the query using zebra fails since there appear to be multiple classes loaded into JVM, causing static variable set previously not seen after one instance of the class is created through reflection. (After the zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is as follows: ackend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at 
org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1651: -- Attachment: PIG-1651.patch PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1651.patch If just having zebra.jar as being registered in a PIG script but not in the CLASSPATH, the query using zebra fails since there appear to be multiple classes loaded into JVM, causing static variable set previously not seen after one instance of the class is created through reflection. (After the zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is as follows: ackend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at 
org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1651) PIG class loading mishandled
[ https://issues.apache.org/jira/browse/PIG-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915959#action_12915959 ] Daniel Dai commented on PIG-1651: - +1 PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1651.patch If just having zebra.jar as being registered in a PIG script but not in the CLASSPATH, the query using zebra fails since there appear to be multiple classes loaded into JVM, causing static variable set previously not seen after one instance of the class is created through reflection. (After the zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is as follows: ackend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at 
org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1654) Pig should check schema alias duplication at any levels.
Pig should check schema alias duplication at any levels. Key: PIG-1654 URL: https://issues.apache.org/jira/browse/PIG-1654 Project: Pig Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.9.0 The following script appears valid to Pig but it shouldn't: A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray); dump A; Pig tries to launch map/reduce jobs for this. However, for the following script, Pig correctly reports error message: A = load 'file' as (a:int, b:long, c:bytearray); dump A; Error message is: 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: Duplicate schema alias: b in A Thus, Pig only checks alias duplication at the top level, which is confirmed by looking at the code. The right behavior is that the same check should be applied at all levels. This should be addressed in the new parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
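The proposed behavior, checking alias duplication at every nesting level rather than only the top level, amounts to a simple recursive walk. This is an illustrative sketch with a toy schema model, not Pig's actual Schema classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AliasCheck {
    static class Field {
        final String alias;
        final List<Field> inner;
        Field(String alias, Field... inner) {
            this.alias = alias;
            this.inner = new ArrayList<>(Arrays.asList(inner));
        }
    }

    // Detect alias duplication among siblings at every nesting level,
    // not just at the top level.
    static boolean hasDuplicateAlias(List<Field> fields) {
        Set<String> seen = new HashSet<>();
        for (Field f : fields) {
            if (f.alias != null && !seen.add(f.alias)) return true;
            if (hasDuplicateAlias(f.inner)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // a:tuple(u:int, u:bytearray, w:long), b:int, c:chararray
        List<Field> schema = Arrays.asList(
                new Field("a", new Field("u"), new Field("u"), new Field("w")),
                new Field("b"), new Field("c"));
        System.out.println(hasDuplicateAlias(schema)); // true: u repeats inside a
    }
}
```

With this walk, the first script in the report would be rejected for the duplicated `u` inside tuple `a`, matching how the flat `a, a` case is already rejected.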
[jira] Updated: (PIG-1654) Pig should check schema alias duplication at any levels.
[ https://issues.apache.org/jira/browse/PIG-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1654: - Description: The following script appears valid to Pig but it shouldn't: A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray); dump A; Pig tries to launch map/reduce jobs for this. However, for the following script, Pig correctly reports error message: A = load 'file' as (a:int, a:long, c:bytearray); dump A; Error message is: 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: Duplicate schema alias: b in A Thus, Pig only checks alias duplication at the top level, which is confirmed by looking at the code. The right behavior is that the same check should be applied at all levels. This should be addressed in the new parser. was: The following script appears valid to Pig but it shouldn't: A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray); dump A; Pig tries to launch map/reduce jobs for this. However, for the following script, Pig correctly reports error message: A = load 'file' as (a:int, b:long, c:bytearray); dump A; Error message is: 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: Duplicate schema alias: b in A Thus, Pig only checks alias duplication at the top level, which is confirmed by looking at the code. The right behavior is that the same check should be applied at all levels. This should be addressed in the new parser. Pig should check schema alias duplication at any levels. Key: PIG-1654 URL: https://issues.apache.org/jira/browse/PIG-1654 Project: Pig Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.9.0 The following script appears valid to Pig but it shouldn't: A = load 'file' as (a:tuple( u:int, u:bytearray, w:long), b:int, c:chararray); dump A; Pig tries to launch map/reduce jobs for this. 
However, for the following script, Pig correctly reports error message: A = load 'file' as (a:int, a:long, c:bytearray); dump A; Error message is: 2010-09-28 15:53:37,390 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: Duplicate schema alias: b in A Thus, Pig only checks alias duplication at the top level, which is confirmed by looking at the code. The right behavior is that the same check should be applied at all levels. This should be addressed in the new parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1655) code duplicated for udfs that were moved from piggybank to builtin
code duplicated for udfs that were moved from piggybank to builtin -- Key: PIG-1655 URL: https://issues.apache.org/jira/browse/PIG-1655 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 As part of PIG-1405, some udfs from piggybank were made standard udfs. But now the code is duplicated in piggybank and org.apache.pig.builtin, which can cause confusion. I am planning to make these udfs in piggybank subclasses of those in org.apache.pig.builtin, so that users don't have to change their scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
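The subclassing plan can be sketched as follows. The class names and the `exec` signature below are hypothetical stand-ins (real Pig UDFs extend `EvalFunc` and take a `Tuple`); the point is only the shape of the fix:

```java
// Hypothetical stand-in for a UDF that moved into org.apache.pig.builtin.
class BuiltinUpper {
    public String exec(String input) {
        return input == null ? null : input.toUpperCase();
    }
}

// Piggybank keeps its old class name as a thin subclass: no duplicated code,
// and existing scripts that reference the piggybank name keep working.
class PiggybankUpper extends BuiltinUpper { }

public class UdfAliasDemo {
    public static void main(String[] args) {
        System.out.println(new PiggybankUpper().exec("pig")); // PIG
    }
}
```

One implementation, two resolvable names: exactly the compatibility property the issue asks for.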
[jira] Updated: (PIG-1656) TOBAG TOTUPLE udfs ignore columns with null value; TOBAG does not use input type to determine output schema
[ https://issues.apache.org/jira/browse/PIG-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1656: --- Fix Version/s: 0.8.0 Affects Version/s: 0.8.0 TOBAG TOTUPLE udfs ignore columns with null value; TOBAG does not use input type to determine output schema --- Key: PIG-1656 URL: https://issues.apache.org/jira/browse/PIG-1656 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 The TOBAG and TOTUPLE udfs ignore columns with null value {code} R4= foreach B generate $0, TOTUPLE(null, id, null), TOBAG( id, null, id,null ); grunt dump R4; 1000(,1,) {(1),(1)} 1000(,2,) {(2),(2)} 1000(,3,) {(3),(3)} 1000(,4,) {(4),(4)} {code} TOBAG does not use the input type to determine the output schema {code} grunt B1 = foreach B generate TOBAG( 1, 2, 3); grunt describe B1; B1: {{null}} {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1656) TOBAG TOTUPLE udfs ignore columns with null value; TOBAG does not use input type to determine output schema
TOBAG TOTUPLE udfs ignore columns with null value; TOBAG does not use input type to determine output schema --- Key: PIG-1656 URL: https://issues.apache.org/jira/browse/PIG-1656 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair The TOBAG and TOTUPLE udfs ignore columns with null value {code} R4= foreach B generate $0, TOTUPLE(null, id, null), TOBAG( id, null, id,null ); grunt dump R4; 1000(,1,) {(1),(1)} 1000(,2,) {(2),(2)} 1000(,3,) {(3),(3)} 1000(,4,) {(4),(4)} {code} TOBAG does not use the input type to determine the output schema {code} grunt B1 = foreach B generate TOBAG( 1, 2, 3); grunt describe B1; B1: {{null}} {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
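The expected (fixed) behavior can be modeled with plain Java collections. This is a sketch of the desired semantics under the assumption that null columns should be kept, not Pig's actual DataBag/Tuple implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class ToBagDemo {
    // Desired semantics: a TOBAG-style constructor keeps null columns.
    // The buggy version effectively skipped them, so
    // TOBAG(id, null, id, null) produced only {(1),(1)}.
    static List<Object> toBag(Object... items) {
        List<Object> bag = new ArrayList<>();
        for (Object o : items) {
            bag.add(o); // keep nulls instead of dropping them
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Object> bag = toBag(1, null, 1, null);
        System.out.println(bag.size()); // 4, not the 2 the bug produces
    }
}
```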
[jira] Updated: (PIG-1542) log level not propagated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1542: --- Attachment: PIG-1542_2.patch log level not propagated to MR task loggers --- Key: PIG-1542 URL: https://issues.apache.org/jira/browse/PIG-1542 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch Specifying -d DEBUG does not affect the logging of the MR tasks. This was fixed earlier in PIG-882. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
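The shape of the problem: MR tasks run in separate JVMs, so a client-side `-d DEBUG` only takes effect if the level is shipped through the job configuration and read back when the task-side logger initializes. A minimal sketch; the property key below is illustrative, not Pig's actual configuration name:

```java
import java.util.HashMap;
import java.util.Map;

public class LogLevelDemo {
    // Task-side lookup: fall back to INFO when no level was propagated.
    // "pig.task.log.level" is a hypothetical key for illustration only.
    static String taskLogLevel(Map<String, String> jobConf) {
        return jobConf.getOrDefault("pig.task.log.level", "INFO");
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        System.out.println(taskLogLevel(jobConf)); // INFO: level never shipped
        jobConf.put("pig.task.log.level", "DEBUG"); // client propagates -d DEBUG
        System.out.println(taskLogLevel(jobConf)); // DEBUG
    }
}
```

The bug is the missing `put`: the client parsed `-d DEBUG` but never wrote it into the configuration the tasks read.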
[jira] Updated: (PIG-1542) log level not propagated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1542: --- Status: Open (was: Patch Available) log level not propagated to MR task loggers --- Key: PIG-1542 URL: https://issues.apache.org/jira/browse/PIG-1542 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch Specifying -d DEBUG does not affect the logging of the MR tasks. This was fixed earlier in PIG-882. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1542) log level not propagated to MR task loggers
[ https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1542: --- Status: Patch Available (was: Open) log level not propagated to MR task loggers --- Key: PIG-1542 URL: https://issues.apache.org/jira/browse/PIG-1542 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: niraj rai Fix For: 0.8.0 Attachments: PIG-1542.patch, PIG-1542_1.patch, PIG-1542_2.patch Specifying -d DEBUG does not affect the logging of the MR tasks. This was fixed earlier in PIG-882. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1649: --- Attachment: PIG-1649.4.patch New patch addressing comments from Richard - In UriUtil.isHDFSFile(String uri) return false if uri is null - Modified a test in TestFRJoin2 to use comma separated file name. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, PIG-1649.4.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) 
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
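The URISyntaxException in the trace above can be reproduced in isolation: the single-argument `java.net.URI` constructor rejects `{` and `}` (legal in Hadoop glob syntax such as `/user/tejas/{std*txt}`, but not in a URI), which is exactly what `MRCompiler.hasTooManyInputFiles` trips over. The multi-argument `URI` constructors percent-encode illegal characters instead of throwing; whether the committed PIG-1649 patch uses precisely this mechanism is an assumption here, but it illustrates the distinction:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Reproduces the failure mode from the stack trace and one common remedy.
public class GlobUriCheck {
    // Mirrors what the single-argument constructor does with a glob path:
    // returns false when parsing fails with "Illegal character in path".
    static boolean parsesAsSingleArg(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The multi-argument constructor quotes characters that are not legal
    // in a URI path ('{' becomes %7B, '}' becomes %7D), so glob paths
    // survive. Returns null if the path is rejected outright.
    static String quotedPath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

Note that `*` is a legal URI path character, so only the braces are encoded; the failure in the report comes solely from `{` and `}`.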
[jira] Commented: (PIG-1649) FRJoin fails to compute number of input files for replicated input
[ https://issues.apache.org/jira/browse/PIG-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915985#action_12915985 ] Richard Ding commented on PIG-1649: --- +1. Looks good. FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1649.1.patch, PIG-1649.2.patch, PIG-1649.3.patch, PIG-1649.4.patch In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1637) Combiner not used because optimizer inserts a foreach between group and algebraic function
[ https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-1637. - Hadoop Flags: [Reviewed] Resolution: Fixed All tests pass except for TestSortedTableUnion / TestSortedTableUnionMergeJoin for zebra, which already fail and will be addressed by [PIG-1649|https://issues.apache.org/jira/browse/PIG-1649]. Patch committed to both trunk and the 0.8 branch. Combiner not used because optimizer inserts a foreach between group and algebraic function Key: PIG-1637 URL: https://issues.apache.org/jira/browse/PIG-1637 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1637-1.patch, PIG-1637-2.patch The following script does not use the combiner after the new optimization change:
{code}
A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue;
C = group B all;
D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}
This happens because, after the group, the optimizer detects that the group key is not used afterward, so it adds a foreach statement after C. This is how the script looks after optimization:
{code}
A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue;
C = group B all;
C1 = foreach C generate B;
D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
store D into ':OUTPATH:';
{code}
That cancels the combiner optimization for D. The way to solve the issue is to merge the inserted C1 with D.
Currently, these two foreach statements are not merged. The reason is that one output of the first foreach (B) is referred to twice in D, and the current rule assumes that after a merge B would have to be calculated twice in D. In fact, C1 only performs a projection and does no calculation of B, so merging C1 and D would not result in calculating B twice. C1 and D should therefore be merged. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. Order by doesn't use estimation to determine the parallelism Key: PIG-1642 URL: https://issues.apache.org/jira/browse/PIG-1642 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch With PIG-1249, a simple heuristic is used to determine the number of reducers if it isn't specified (via PARALLEL or default_parallel). For order by statement, however, it still defaults to 1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1647) Logical simplifier throws a NPE
[ https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915365#action_12915365 ] Daniel Dai commented on PIG-1647: - +1. Please commit. Logical simplifier throws a NPE --- Key: PIG-1647 URL: https://issues.apache.org/jira/browse/PIG-1647 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1647.patch, PIG-1647.patch A query like: A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray); B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' and ((d is not null and d != '') or (e is not null and e != '')); will cause the logical expression simplifier to throw a NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1647) Logical simplifier throws a NPE
[ https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1647: -- Status: Resolved (was: Patch Available) Resolution: Fixed Patch committed to both trunk and the 0.8 branch. Logical simplifier throws a NPE --- Key: PIG-1647 URL: https://issues.apache.org/jira/browse/PIG-1647 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1647.patch, PIG-1647.patch A query like: A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray); B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' and ((d is not null and d != '') or (e is not null and e != '')); will cause the logical expression simplifier to throw a NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1641) Incorrect counters in local mode
[ https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915408#action_12915408 ] Ashutosh Chauhan commented on PIG-1641: --- Tested manually for local mode. Messages were same as proposed above. +1 for the commit. One minor suggestion is to put a line at the start saying something like: Detected Local mode. Stats reported below may be incomplete. This will reinforce the message to users that stats reporting is not transparent across different modes (local Vs map-reduce). Incorrect counters in local mode Key: PIG-1641 URL: https://issues.apache.org/jira/browse/PIG-1641 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1641.patch User report, not verified. email HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures 0.20.20.8.0-SNAPSHOTuser2010-09-21 19:25:582010-09-21 21:58:42ORDER_BY Success! Job Stats (time in seconds): JobIdMapsReducesMaxMapTimeMinMapTImeAvgMapTime MaxReduceTimeMinReduceTimeAvgReduceTimeAliasFeatureOutputs job_local_000100000000rawMAP_ONLY job_local_000200000000rank_sort SAMPLER job_local_000300000000rank_sort ORDER_BYProcessed/user_visits_table, Input(s): Successfully read 0 records from: Data/Raw/UserVisits.dat Output(s): Successfully stored 0 records in: Processed/user_visits_table However, when I look in the output: $ ls -lh Processed/user_visits_table/CG0/ total 15250760 -rwxrwxrwx 1 user _lpoperator 7.3G Sep 21 21:58 part-0* It read a 20G input file and generated some output... /email Is it that in local mode counters are not available? If so, instead of printing zeros we should print Information Unavailable or some such. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
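The report above prints zeros because local-mode Hadoop does not populate job counters. A minimal sketch of the suggestion in the issue, rendering a placeholder instead of a misleading 0 when a counter value is unknown; the class and method names here are illustrative, not Pig's actual stats classes:

```java
// Hypothetical stats-rendering helper: when counters are unavailable (as in
// local mode), show "Information Unavailable" rather than printing 0, so
// users are not misled into thinking no records were read or written.
public class StatsRow {
    static String render(Long recordCount, boolean countersAvailable) {
        if (!countersAvailable || recordCount == null) {
            return "Information Unavailable";
        }
        return String.valueOf(recordCount);
    }
}
```

A one-line banner such as "Detected local mode. Stats reported below may be incomplete." (as Ashutosh suggests) would pair naturally with this, telling users up front why placeholders appear.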
[jira] Updated: (PIG-1641) Incorrect counters in local mode
[ https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1641: -- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. Incorrect counters in local mode Key: PIG-1641 URL: https://issues.apache.org/jira/browse/PIG-1641 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1641.patch User report, not verified. email HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures 0.20.20.8.0-SNAPSHOTuser2010-09-21 19:25:582010-09-21 21:58:42ORDER_BY Success! Job Stats (time in seconds): JobIdMapsReducesMaxMapTimeMinMapTImeAvgMapTime MaxReduceTimeMinReduceTimeAvgReduceTimeAliasFeatureOutputs job_local_000100000000rawMAP_ONLY job_local_000200000000rank_sort SAMPLER job_local_000300000000rank_sort ORDER_BYProcessed/user_visits_table, Input(s): Successfully read 0 records from: Data/Raw/UserVisits.dat Output(s): Successfully stored 0 records in: Processed/user_visits_table However, when I look in the output: $ ls -lh Processed/user_visits_table/CG0/ total 15250760 -rwxrwxrwx 1 user _lpoperator 7.3G Sep 21 21:58 part-0* It read a 20G input file and generated some output... /email Is it that in local mode counters are not available? If so, instead of printing zeros we should print Information Unavailable or some such. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1649) FRJoin fails to compute number of input files for replicated input
FRJoin fails to compute number of input files for replicated input -- Key: PIG-1649 URL: https://issues.apache.org/jira/browse/PIG-1649 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 In FRJoin, if input path has curly braces, it fails to compute number of input files and logs the following exception in the log - 10/09/27 14:31:13 WARN mapReduceLayer.MRCompiler: failed to get number of input files java.net.URISyntaxException: Illegal character in path at index 12: /user/tejas/{std*txt} at java.net.URI$Parser.fail(URI.java:2809) at java.net.URI$Parser.checkChars(URI.java:2982) at java.net.URI$Parser.parseHierarchical(URI.java:3066) at java.net.URI$Parser.parse(URI.java:3024) at java.net.URI.init(URI.java:578) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.hasTooManyInputFiles(MRCompiler.java:1283) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:1203) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:188) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:475) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:454) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:336) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:468) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197) at org.apache.pig.PigServer.storeEx(PigServer.java:873) at org.apache.pig.PigServer.store(PigServer.java:815) at 
org.apache.pig.PigServer.openIterator(PigServer.java:727) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:301) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76) at org.apache.pig.Main.run(Main.java:453) at org.apache.pig.Main.main(Main.java:107) This does not cause a query to fail. But since the number of input files don't get calculated, the optimizations added in PIG-1458 to reduce load on name node will not get used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1637) Combiner not used because optimizer inserts a foreach between group and algebraic function
[ https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1637: Attachment: PIG-1637-1.patch Combiner not use because optimizor inserts a foreach between group and algebric function Key: PIG-1637 URL: https://issues.apache.org/jira/browse/PIG-1637 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1637-1.patch The following script does not use combiner after new optimization change. {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} This is because after group, optimizer detect group key is not used afterward, it add a foreach statement after C. This is how it looks like after optimization: {code} A = load ':INPATH:/pigmix/page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links); B = foreach A generate user, (int)timespent as timespent, (double)estimated_revenue as estimated_revenue; C = group B all; C1 = foreach C generate B; D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue); store D into ':OUTPATH:'; {code} That cancel the combiner optimization for D. The way to solve the issue is to merge the C1 we inserted and D. Currently, we do not merge these two foreach. The reason is that one output of the first foreach (B) is referred twice in D, and currently rule assume after merge, we need to calculate B twice in D. Actually, C1 is only doing projection, no calculation of B. 
Merging C1 and D will not result in calculating B twice, so C1 and D should be merged. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1650) pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc.
pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc. - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai grunt shell breaks for many Unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc.
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Attachment: PIG-1650_0.patch This patch fixes many broken commands inside the grunt shell. pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc. - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch grunt shell breaks for many Unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1650) pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc.
[ https://issues.apache.org/jira/browse/PIG-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1650: --- Status: Patch Available (was: Open) pig grunt shell breaks for many commands like perl, awk, pipe, 'ls -l', etc. - Key: PIG-1650 URL: https://issues.apache.org/jira/browse/PIG-1650 Project: Pig Issue Type: Bug Reporter: niraj rai Assignee: niraj rai Attachments: PIG-1650_0.patch grunt shell breaks for many Unix commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1651) PIG class loading mishandled
PIG class loading mishandled Key: PIG-1651 URL: https://issues.apache.org/jira/browse/PIG-1651 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Richard Ding Fix For: 0.8.0 If just having zebra.jar as being registered in a PIG script but not in the CLASSPATH, the query using zebra fails since there appear to be multiple classes loaded into JVM, causing static variable set previously not seen after one instance of the class is created through reflection. (After the zebra.jar is specified in CLASSPATH, it works fine.) The exception stack is as follows: ackend error message during job submission --- org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://hostname/pathto/zebra_dir :: null at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:284) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:907) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:801) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.hadoop.zebra.io.ColumnGroup.getNonDataFilePrefix(ColumnGroup.java:123) at org.apache.hadoop.zebra.io.ColumnGroup$CGPathFilter.accept(ColumnGroup.java:2413) at org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat$MultiPathFilter.accept(TableInputFormat.java:718) at org.apache.hadoop.fs.FileSystem$GlobFilter.accept(FileSystem.java:1084) at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:919) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866) at 
org.apache.hadoop.zebra.mapreduce.TableInputFormat$DummyFileInputFormat.listStatus(TableInputFormat.java:780) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getRowSplits(TableInputFormat.java:863) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:1017) at org.apache.hadoop.zebra.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:961) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269) ... 7 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1647) Logical simplifier throws a NPE
[ https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1647: -- Attachment: PIG-1647.patch passes test-core. test-patch results: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Logical simplifier throws a NPE --- Key: PIG-1647 URL: https://issues.apache.org/jira/browse/PIG-1647 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1647.patch, PIG-1647.patch A query like: A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray); B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' and ((d is not null and d != '') or (e is not null and e != '')); will cause the logical expression simplifier to throw a NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.