[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12799859#action_12799859 ] Ying He commented on PIG-1178: -- a couple questions on expression operators: 1. in ProjectExpression, is it better to change the object variable input from int to LogicalRelationalOperator to point to the operator that the project expression operates on directly? And I don't understand why this operator needs alias it references. But if we change input to operator object, the alias can be get from the operator. 2. I don't know the purpose of ColumnExpression. Is it to capture operands? It doesn't seem to have any special features. So I am not sure if it is necessary LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
Pig do not take values in pig-cluster-hadoop-site.xml --- Key: PIG-1186 URL: https://issues.apache.org/jira/browse/PIG-1186 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1186: Status: Patch Available (was: Open) Pig do not take values in pig-cluster-hadoop-site.xml --- Key: PIG-1186 URL: https://issues.apache.org/jira/browse/PIG-1186 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1186-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1186: Attachment: PIG-1186-1.patch Pig do not take values in pig-cluster-hadoop-site.xml --- Key: PIG-1186 URL: https://issues.apache.org/jira/browse/PIG-1186 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1186-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1185) Data bags do not close spill files after using iterator to read tuples
[ https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1185: Fix Version/s: 0.6.0 Data bags do not close spill files after using iterator to read tuples -- Key: PIG-1185 URL: https://issues.apache.org/jira/browse/PIG-1185 Project: Pig Issue Type: Bug Reporter: Ying He Fix For: 0.6.0 Attachments: PIG_1185.patch spill files are not closed after reading the tuples from iterator. When large number of spill files exists, this can exceed specified max number of open files on the system and therefore, cause application failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800054#action_12800054 ] Alan Gates commented on PIG-1178: - bq. in ProjectExpression, is it better to change the object variable input from int to LogicalRelationalOperator to point to the operator that the project expression operates on directly? I want to avoid references to the actual relational operators in the expressions because it makes patching up the plans after a transformation much easier. If each project keeps a reference to the relational operator, then when the plan is transformed we have to go to every project and change its reference. By keeping pointers only to which input number, we don't have to make any changes in the projects after a transformation in the plan. bq. And I don't understand why this operator needs alias it references. But if we change input to operator object, the alias can be get from the operator. You're right, we shouldn't double store aliases here. We should just use the uid and the project reference. I'll make the change. bq. I don't know the purpose of ColumnExpression. Is it to capture operands? It doesn't seem to have any special features. So I am not sure if it is necessary It's a super class for all expressions that represent a single value: projection, constants, and eventually, tuple and map dereferences. I think it's useful for understanding the categorization of expressions. I'm not sure it adds any functionality. LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1178: Attachment: expressions.patch Second take on expressions.patch, with fixes for the findbugs warning and Ying's comment on not having aliases in projections. LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1178: Status: Patch Available (was: Open) LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1178: Attachment: (was: expressions.patch) LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1178: Status: Open (was: Patch Available) LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified
UTF-8 (international code) breaks with loader when load with schema is specified Key: PIG-1187 URL: https://issues.apache.org/jira/browse/PIG-1187 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a set of Pig statements which dump an international dataset. {code} INPUT_OBJECT = load 'internationalcode'; describe INPUT_OBJECT; dump INPUT_OBJECT; {code} Sample output (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}) It works and dumps results but when I use a schema for loading it fails. {code} INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag {T: tuple(label:chararray)}); describe INPUT_OBJECT; {code} The error message is as follows:2010-01-14 02:23:27,320 FATAL org.apache.hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 21. at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620) at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569) at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651) at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152) at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100) at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382) at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42) at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68) at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified
[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800095#action_12800095 ] Jeff Zhang commented on PIG-1187: - Hi Viraj, I run your script, and it works OK on my machine. Maybe it relates with encoding of OS, I install Asia character on my machine. UTF-8 (international code) breaks with loader when load with schema is specified Key: PIG-1187 URL: https://issues.apache.org/jira/browse/PIG-1187 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.6.0 I have a set of Pig statements which dump an international dataset. {code} INPUT_OBJECT = load 'internationalcode'; describe INPUT_OBJECT; dump INPUT_OBJECT; {code} Sample output (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}) It works and dumps results but when I use a schema for loading it fails. {code} INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag {T: tuple(label:chararray)}); describe INPUT_OBJECT; {code} The error message is as follows:2010-01-14 02:23:27,320 FATAL org.apache.hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 21. at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620) at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569) at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651) at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152) at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100) at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382) at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42) at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68) at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800123#action_12800123 ] Hadoop QA commented on PIG-1186: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430194/PIG-1186-1.patch against trunk revision 898497. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/173/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/173/console This message is automatically generated. Pig do not take values in pig-cluster-hadoop-site.xml --- Key: PIG-1186 URL: https://issues.apache.org/jira/browse/PIG-1186 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1186-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800125#action_12800125 ] Hadoop QA commented on PIG-1178: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430202/expressions.patch against trunk revision 898497. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/174/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/174/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/174/console This message is automatically generated. LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions.patch, lp.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.