[jira] Commented: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800285#action_12800285 ]

Daniel Dai commented on PIG-1186:
---------------------------------

I didn't include a unit test because it is very hard to write one for this change. I tested it manually and it works.

> Pig do not take values in pig-cluster-hadoop-site.xml
> -----------------------------------------------------
>              Key: PIG-1186
>              URL: https://issues.apache.org/jira/browse/PIG-1186
>          Project: Pig
>       Issue Type: Bug
>       Components: impl
> Affects Versions: 0.6.0
>         Reporter: Daniel Dai
>         Assignee: Daniel Dai
>          Fix For: 0.6.0
>      Attachments: PIG-1186-1.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800289#action_12800289 ]

Ying He commented on PIG-1178:
------------------------------

+1

> LogicalPlan and Optimizer are too complex and hard to work with
> ---------------------------------------------------------------
>              Key: PIG-1178
>              URL: https://issues.apache.org/jira/browse/PIG-1178
>          Project: Pig
>       Issue Type: Improvement
>         Reporter: Alan Gates
>         Assignee: Ying He
>      Attachments: expressions.patch, lp.patch, PIG_1178.patch
>
> The current implementation of the logical plan and the logical optimizer in Pig has proven not to be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause of these issues is that a number of design decisions made as part of the 0.2 rewrite of the front end have since proven to be sub-optimal. The heart of this proposal is to revisit those decisions and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as to extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details.
[jira] Commented: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified
[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800315#action_12800315 ]

Viraj Bhat commented on PIG-1187:
---------------------------------

Hi Jeff,
This is specific to the data we are using; it looks like the parser fails when it tries to interpret some characters. We have tested this with Chinese characters and it works.
Viraj

> UTF-8 (international code) breaks with loader when load with schema is specified
> --------------------------------------------------------------------------------
>              Key: PIG-1187
>              URL: https://issues.apache.org/jira/browse/PIG-1187
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.6.0
>         Reporter: Viraj Bhat
>          Fix For: 0.6.0
>
> I have a set of Pig statements which dump an international dataset.
> {code}
> INPUT_OBJECT = load 'internationalcode';
> describe INPUT_OBJECT;
> dump INPUT_OBJECT;
> {code}
> Sample output:
> (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)})
> It works and dumps results, but when I use a schema for loading it fails.
> {code}
> INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag {T: tuple(label:chararray)});
> describe INPUT_OBJECT;
> {code}
> The error message is as follows:
> 2010-01-14 02:23:27,320 FATAL org.apache.hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 21.
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
> 	at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
> 	at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
> 	at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
> 	at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
> 	at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
> 	at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
> 	at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Viraj
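The TokenMgrError above is characteristic of a lexer that advances over single-byte character classes. A minimal Python sketch (not Pig's JavaCC lexer; the names and pattern are invented for illustration) of why byte-oriented matching with an ASCII-only character class stalls on multi-byte UTF-8 input:

```python
import re

# A bag field as it arrives at the loader: UTF-8 bytes.
raw = "(labelあいうえお1)".encode("utf-8")

# An ASCII-only token pattern, similar in spirit to a lexer whose
# character classes cover only single-byte characters.
ascii_token = re.compile(rb"[A-Za-z0-9]*")

# At the first byte of a multi-byte character the pattern can only
# produce an empty match, so a naive lexer loop never advances --
# the "repeated empty string matches" the TokenMgrError reports.
pos = raw.index("あ".encode("utf-8"))
assert ascii_token.match(raw, pos).group() == b""

# Decoding the bytes to text first lets a Unicode-aware pattern
# consume the whole label as one token.
text = raw.decode("utf-8")
label = re.compile(r"\w+", re.UNICODE).match(text, 1).group()
assert label == "labelあいうえお1"
```

The sketch suggests why the unschema'd dump succeeds (bytes pass through untokenized) while the schema'd load, which must parse the bag, fails.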
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1178:
----------------------------
    Attachment: expressions-2.patch

New patch that addresses the unit test failure and javadoc warnings.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1178:
----------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1178:
----------------------------
    Status: Patch Available  (was: Open)
[jira] Created: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register
StoreFunc UDF should ship to the backend automatically without register
-----------------------------------------------------------------------
             Key: PIG-1189
             URL: https://issues.apache.org/jira/browse/PIG-1189
         Project: Pig
      Issue Type: Bug
      Components: impl
Affects Versions: 0.6.0
        Reporter: Daniel Dai
        Assignee: Daniel Dai
         Fix For: 0.7.0

Pig should ship a store UDF to the backend even if the user does not register it. The prerequisite is that the UDF is on the classpath on the frontend. We made that work for load UDFs in PIG-881 (https://issues.apache.org/jira/browse/PIG-881); we should do the same for store UDFs.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800353#action_12800353 ]

Ying He commented on PIG-1178:
------------------------------

To answer Daniel's questions:

. In Rule.match, does PatternMatchOperatorPlan contain only leaf nodes but not edge information? If so, instead of saying "A list of all matched sub-plans", can we put more details in the comments?

The returned lists are plans. You can call getPredecessors() or getSuccessors() on any node in the plan. The implementation doesn't keep edge information itself; it calls the base plan for that information and returns the operators that are in the sub-plan. So, looking from the outside, it is a plan; it's just read-only, and any method that would update the plan throws an exception.
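The read-only sub-plan idea described in this comment can be sketched in a few lines. This is a toy illustration with invented names, not Pig's actual Rule/OperatorPlan classes: edge queries delegate to the base plan, filtered to the matched operators, and mutating methods throw.

```python
class BasePlan:
    """A tiny operator DAG: maps each operator to its successors."""
    def __init__(self, edges):
        self.edges = edges  # {op: [successor, ...]}

    def successors(self, op):
        return self.edges.get(op, [])

    def predecessors(self, op):
        return [o for o, succs in self.edges.items() if op in succs]


class MatchedSubPlan:
    """Read-only view over a subset of a base plan's operators."""
    def __init__(self, base, members):
        self.base = base
        self.members = set(members)

    def successors(self, op):
        # Edge information comes from the base plan, not from this view.
        return [s for s in self.base.successors(op) if s in self.members]

    def predecessors(self, op):
        return [p for p in self.base.predecessors(op) if p in self.members]

    def add(self, op):
        # Mutation is not supported on a matched sub-plan.
        raise TypeError("matched sub-plan is read-only")


base = BasePlan({"load": ["filter"], "filter": ["join"], "split": ["join"]})
sub = MatchedSubPlan(base, ["load", "filter"])
print(sub.successors("load"))    # ['filter']
print(sub.successors("filter"))  # [] -- 'join' lies outside the match
```

The view stores no edges of its own, which is the property Ying describes: from the outside it behaves like a plan, while the base plan remains the single source of truth for topology.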
[jira] Created: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands
Handling of quoted strings in pig-latin/grunt commands
------------------------------------------------------
             Key: PIG-1190
             URL: https://issues.apache.org/jira/browse/PIG-1190
         Project: Pig
      Issue Type: Bug
        Reporter: Thejas M Nair

There is some inconsistency in the way quoted strings are used and handled in pig-latin. In load/store and define-ship commands, files are specified as quoted strings, and the file name is the content within the quotes. But in the register, set, and file system commands, if a string is specified in quotes, the quotes are included as part of the string. This is not only inconsistent, it is also unintuitive, and it differs from the way the HDFS command line (or a bash shell) interprets file names.

For example, with the command
    set job.name 'job123'
the job name is currently set to 'job123' (including the quotes), not job123. This needs to be fixed, and the above command should be considered equivalent to
    set job.name job123
[jira] Commented: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands
[ https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800359#action_12800359 ]

Thejas M Nair commented on PIG-1190:
------------------------------------

This breaks backward compatibility, but I don't think the use of file names or job names that actually contain quotes is likely to be common. For the long run, I think this is the right thing to do.
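The normalization proposed here amounts to treating a quoted argument and its unquoted form as equivalent. A minimal sketch, assuming the fix is simply to strip one pair of matching outer quotes (an illustration with an invented helper, not Pig's grunt parser):

```python
def strip_quotes(arg):
    """Return arg without one pair of matching outer quotes, if any."""
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in ("'", '"'):
        return arg[1:-1]
    return arg

# With this rule, set job.name 'job123' and set job.name job123
# both yield the job name job123, matching load/store behavior.
print(strip_quotes("'job123'"))  # job123
print(strip_quotes("job123"))    # job123
```

Stripping only one layer preserves a (rare) escape hatch: a name that genuinely contains quotes can still be written by doubling up, e.g. "'job123'".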
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800413#action_12800413 ]

Daniel Dai commented on PIG-1090:
---------------------------------

PIG-1090-12.patch committed.

> Update sources to reflect recent changes in load-store interfaces
> -----------------------------------------------------------------
>              Key: PIG-1090
>              URL: https://issues.apache.org/jira/browse/PIG-1090
>          Project: Pig
>       Issue Type: Sub-task
>         Reporter: Pradeep Kamath
>         Assignee: Pradeep Kamath
>      Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
> There have been some changes in the load/store interfaces, as recorded in the Changes Section (Nov 2 2009 sub-section) of http://wiki.apache.org/pig/LoadStoreRedesignProposal. This jira tracks the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: PIG-1090-13.patch

The main updates in the latest patch 13 are the following:

* Remove these files:
{code}
D src/org/apache/pig/experimental/LoadMetadata.java
D src/org/apache/pig/experimental/ResourceStatistics.java
D src/org/apache/pig/experimental/ResourceSchema.java
D src/org/apache/pig/experimental/JsonMetadata.java
D src/org/apache/pig/experimental/StoreMetadata.java
{code}
* Move _JsonMetadata.java_ to the package _org.apache.pig.piggybank.storage_
* Move _StoreMetadata.java_ to the package _org.apache.pig_
* Modify the _PigStorageSchema_ class to use _PigOutputCommitter_ to store the metadata with the output file (PIG-760).

Dmitriy, can you review the PIG-760 related changes? Thanks.
[jira] Updated: (PIG-1185) Data bags do not close spill files after using iterator to read tuples
[ https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1185:
----------------------------
    Resolution: Fixed
      Assignee: Ying He
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch. No unit test is included because this is a fix to an existing feature and it is very hard to write a unit test for it.

> Data bags do not close spill files after using iterator to read tuples
> ----------------------------------------------------------------------
>              Key: PIG-1185
>              URL: https://issues.apache.org/jira/browse/PIG-1185
>          Project: Pig
>       Issue Type: Bug
>         Reporter: Ying He
>         Assignee: Ying He
>          Fix For: 0.6.0
>      Attachments: PIG_1185.patch
>
> Spill files are not closed after the tuples are read from the iterator. When a large number of spill files exists, this can exceed the system's configured maximum number of open files and therefore cause application failure.
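The failure mode fixed here can be illustrated in a few lines (illustrative Python only, not DataBag's actual Java code): an iterator over spill files should close each file as soon as it is exhausted, rather than leaving the handle open until garbage collection, since each open handle holds a file descriptor against the process limit.

```python
import os
import tempfile

def read_spills(paths):
    """Yield lines from each spill file, closing it before moving on."""
    for path in paths:
        f = open(path)
        try:
            for line in f:
                yield line.rstrip("\n")
        finally:
            f.close()  # close eagerly instead of waiting for GC

# Demo with two tiny stand-in "spill files".
paths = []
for i in range(2):
    fd, p = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write(f"tuple{i}\n")
    paths.append(p)

tuples = list(read_spills(paths))
print(tuples)  # ['tuple0', 'tuple1']

for p in paths:
    os.remove(p)
```

With thousands of spill files, the difference between closing in the iterator and relying on finalization is exactly the "too many open files" failure the issue describes.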
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800434#action_12800434 ]

Hadoop QA commented on PIG-1178:
--------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430285/expressions-2.patch
against trunk revision 898497.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/console

This message is automatically generated.
[jira] Commented: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800438#action_12800438 ]

Olga Natkovich commented on PIG-1186:
-------------------------------------

+1
[jira] Updated: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml
[ https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1186:
----------------------------
    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: PIG-1090-13.patch

Sync patch-13 with patch-12.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: (was: PIG-1090-13.patch)
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: (was: PIG-1090-13.patch)
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: (was: PIG-1090-13.patch)
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1090:
------------------------------
    Attachment: (was: PIG-1090-13.patch)
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800479#action_12800479 ]

Alan Gates commented on PIG-1178:
---------------------------------

I've checked in the expressions-2.patch. I'll flesh out LogicalSchema in a separate patch.
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800500#action_12800500 ] Dmitriy V. Ryaboy commented on PIG-1090: Richard, I'll check it out, thanks. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Resolved: (PIG-50) query optimization for Pig
[ https://issues.apache.org/jira/browse/PIG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-50. --- Resolution: Fixed Fix Version/s: 0.3.0 A rudimentary optimizer was added by 0.3, with ongoing work being done on it (see PIG-1178). query optimization for Pig -- Key: PIG-50 URL: https://issues.apache.org/jira/browse/PIG-50 Project: Pig Issue Type: Wish Components: impl Reporter: Christopher Olston Fix For: 0.3.0 add relational query optimization techniques, or similar, to Pig discussion so far: ** Amir Youssefi: Comparing two pig scripts of join+filter and filter+join I see that pig has an optimization opportunity of first doing filter by constraints then do the actual join. Do we have a JIRA open for this (or other optimization scenarios)? In my case, the first one resulted in OutOfMemory exception but the second one runs just fine. ** Chris Olston: Yup. It would be great to sprinkle a little relational query optimization technology onto Pig. Given that query optimization is a double-edged sword, we might want to consider some guidelines of the form: 1. Optimizations should always be easy to override by the user. (Sometimes the system is smarter than the user, but other times the reverse is true, and that can be incredibly frustrating.) 2. Only safe optimizations should be performed, where a safe optimization is one that with 95% probability doesn't make the program slower. (An example is pushing filters before joins, given that the filter is known to be cheap; if the filter has a user-defined function it is not guaranteed to be cheap.) Or perhaps there is a knob that controls worst-case versus expected-case minimization. We're at a severe disadvantage relative to relational query engines, because at the moment we have zero metadata. We don't even know the schema of our data sets, much less the distributions of data values (which in turn govern intermediate data sizes between operators). 
We have to think about how to approach this in a way that is compatible with the Pig philosophy of having metadata always be optional. It could be as simple as (fine, if the user doesn't want to register his data with Pig, then Pig won't be able to optimize programs over that data very well), or as sophisticated as on-line sampling and/or on-line operator reordering.
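The filter-pushdown opportunity described in the PIG-50 discussion (filter+join versus join+filter) can be modeled with a toy sketch. This is not Pig code; it is a hypothetical Python simulation showing that the two orderings return the same rows while the pushed-down plan feeds far less data into the join.

```python
# Toy model (not Pig) of the filter-pushdown rewrite from the PIG-50 thread:
# join-then-filter versus filter-then-join yield the same result, but pushing
# the cheap filter below the join shrinks the join's input dramatically.

def join(left, right, key):
    """Naive nested-loop equijoin over lists of dict rows sharing `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

users = [{"id": i, "age": 20 + (i % 50)} for i in range(100)]
clicks = [{"id": i % 100, "url": "u%d" % i} for i in range(1000)]

# Plan A: join first, then filter (materializes the full join result).
plan_a = [row for row in join(users, clicks, "id") if row["age"] > 60]

# Plan B: push the filter below the join (join sees only the filtered side).
old_users = [u for u in users if u["age"] > 60]
plan_b = join(old_users, clicks, "id")

def canon(rows):
    # Order-insensitive comparison of row multisets.
    return sorted(tuple(sorted(r.items())) for r in rows)

assert canon(plan_a) == canon(plan_b)   # same answer either way
assert len(old_users) < len(users)      # but a much smaller join input
```

Note the caveat from the thread still applies: if the filter contains an expensive user-defined function, pushing it below the join is no longer obviously a win.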
[jira] Resolved: (PIG-64) Formatter for Hadoop Job Config file
[ https://issues.apache.org/jira/browse/PIG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-64. --- Resolution: Incomplete This patch is way out of date. It also isn't clear to me that Pig wants to get into the business of interpreting JobConf since we don't control it. Formatter for Hadoop Job Config file Key: PIG-64 URL: https://issues.apache.org/jira/browse/PIG-64 Project: Pig Issue Type: Improvement Components: impl Reporter: Benjamin Reed Priority: Minor Attachments: printer.patch We serialize and encode a number of different Pig data structures that describe a part of a Pig job to run in Hadoop. Because of the encoding you cannot see what Pig was doing in a given Hadoop job using just the job XML config file. We need a simple program to make the Hadoop job structures human readable.
[jira] Resolved: (PIG-21) Show more details about the current execution context
[ https://issues.apache.org/jira/browse/PIG-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-21. --- Resolution: Won't Fix It looks like this patch got dropped without being finished. Show more details about the current execution context - Key: PIG-21 URL: https://issues.apache.org/jira/browse/PIG-21 Project: Pig Issue Type: Improvement Components: grunt Affects Versions: 0.1.0 Reporter: Andrzej Bialecki Priority: Minor Attachments: context.patch After a long interactive session with grunt I lost track of what kind of queries I defined, and then re-defined. It would be nice to have the ability to show all defined aliases, and other context variables, such as the filesystem, jobTracker, user jars and Configuration.
[jira] Resolved: (PIG-76) Unit tests for Grunt
[ https://issues.apache.org/jira/browse/PIG-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-76. --- Resolution: Fixed Tests for grunt were added some time ago. Unit tests for Grunt Key: PIG-76 URL: https://issues.apache.org/jira/browse/PIG-76 Project: Pig Issue Type: Bug Reporter: Antonio Magnaghi Currently there are no unit tests in place for Grunt. However Grunt is extensively used as part of the end-to-end tests. If some changes break Grunt, this will become evident only later on in the development process during E2E testing. Talked to Alan and Olga, probably the best way to address this is to put in place unit tests that integrate with the test harness used for regression.
[jira] Resolved: (PIG-79) Switch grunt shell to use hadoop FSShell for DFS commands
[ https://issues.apache.org/jira/browse/PIG-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-79. --- Resolution: Duplicate Switch grunt shell to use hadoop FSShell for DFS commands - Key: PIG-79 URL: https://issues.apache.org/jira/browse/PIG-79 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich This will provide us command semantics consistent with hadoop including allowing pig remove command to use trash.
[jira] Resolved: (PIG-82) Loose floating point precision
[ https://issues.apache.org/jira/browse/PIG-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-82. --- Resolution: Won't Fix Loss of precision is a known issue with floating point numbers. The correct solution here is to introduce a fixed point type, similar to SQL's decimal. Loose floating point precision -- Key: PIG-82 URL: https://issues.apache.org/jira/browse/PIG-82 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.1.0 Reporter: Daeho Baek Pig loses floating point precision during conversion between binary and string formats. Here is an example: words = LOAD '/user/daeho/words.txt' as (word); numWords = FOREACH (GROUP words ALL) GENERATE COUNT($1); weight = FOREACH numWords GENERATE 1.0 / $0; wordsWithWeight = CROSS words, weight; sumWeight = FOREACH (GROUP wordsWithWeight ALL) GENERATE SUM($1.$1); dump sumWeight; sumWeight is not 1 even though words.txt has 118 lines. Can we store floating point as binary format?
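The numeric issue behind PIG-82 can be reproduced outside Pig. A short Python sketch, with `fractions.Fraction` standing in for the fixed-point/decimal type the resolution suggests: summing 118 copies of the double 1.0/118 is only approximately 1, while exact rational arithmetic recovers 1 exactly.

```python
# Sketch of the PIG-82 precision issue: 1.0/118 is not representable exactly
# as a binary double, so summing 118 copies of it need not give exactly 1.
# Fraction here stands in for a fixed-point type like SQL's DECIMAL.
from fractions import Fraction

n = 118
float_sum = sum(1.0 / n for _ in range(n))       # accumulates rounding error
exact_sum = sum(Fraction(1, n) for _ in range(n))  # exact rational arithmetic

assert exact_sum == 1                  # the exact type gets 1 precisely
assert abs(float_sum - 1.0) < 1e-12   # the float result is merely close to 1
```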
[jira] Resolved: (PIG-119) test suite improvements
[ https://issues.apache.org/jira/browse/PIG-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-119. Resolution: Fixed Hudson ci build is done and there are tests for local mode. We aren't planning on moving the unit tests into the various different packages at the moment. test suite improvements --- Key: PIG-119 URL: https://issues.apache.org/jira/browse/PIG-119 Project: Pig Issue Type: Improvement Reporter: Stefan Groschupf Priority: Critical From my point of view a test suite is very important for an open source project. The better and easier to use it is, the more people can easily contribute and fix bugs. With this in mind I see some room for improvement in the test suite for Pig. Here are my suggestions; I would love to work on them in case we all agree on the points. Phase 1: + it should be possible to switch a test mode that defines if pig runs in local mode, mini cluster or big cluster. ++ ant test -Dtest.mode=local or -Dtest.mode=mapreduce or -Dtest.mode=mapreduce -Dcluster=myJobTracker ++ default should be local Phase 2: + setup a hudson ci build, run minicluster once a day, run local mode after each checkin. Phase 3: cleanup the test package, general standard is that each test should be in the same package as the class that is tested.
[jira] Resolved: (PIG-117) commons logging and log4j
[ https://issues.apache.org/jira/browse/PIG-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-117. Resolution: Won't Fix We're not pulling log4j out anytime soon. commons logging and log4j - Key: PIG-117 URL: https://issues.apache.org/jira/browse/PIG-117 Project: Pig Issue Type: Improvement Reporter: Stefan Groschupf On the one hand, Pig uses commons logging, which makes sense. On the other hand, the Pig Main class configures log4j in the code. This introduces a hard dependency on log4j. I suggest using only a log4j configuration file to configure log4j and removing the log4j configuration from the code. Any thoughts?
[jira] Resolved: (PIG-135) Ensure no temporary files are created in the top-level source directory during the build/test process
[ https://issues.apache.org/jira/browse/PIG-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-135. Resolution: Fixed Everything but src-gen now goes under the build directory. We aren't planning on moving src-gen there. Ensure no temporary files are created in the top-level source directory during the build/test process - Key: PIG-135 URL: https://issues.apache.org/jira/browse/PIG-135 Project: Pig Issue Type: Improvement Reporter: Arun C Murthy Let's assume SRC_TOP is the top-level src directory. Currently the build process creates a *src-gen* directory in SRC_TOP and the junit tests create *dfs* and *test* directories in SRC_TOP. This means that the 'ant clean' task now has to cleanup all of them. Interestingly, 'ant clean' doesn't remove the 'dfs' directory at all... a related bug. It would be nice to create a standalone _build_ directory in the top-level directory and then use that as the parent of _all_ generated files (source and non-source). This would mean 'ant clean' would just need to delete the build directory. It plays well when there are multiple sub-projects developed on top of Pig (e.g. contrib etc.) too.
[jira] Resolved: (PIG-145) LocalFile ignores active container and HDataStorage can't copy to other DataStrorage
[ https://issues.apache.org/jira/browse/PIG-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-145. Resolution: Won't Fix As of Pig 0.6 true local mode (i.e. Pig executing the code rather than through Map Reduce) has been removed. LocalFile ignores active container and HDataStorage can't copy to other DataStrorage Key: PIG-145 URL: https://issues.apache.org/jira/browse/PIG-145 Project: Pig Issue Type: Bug Components: impl Reporter: Charlie Groves Attachments: PIG-145-DataStorage_Bugs.patch As part of starting to rewrite the DataStorage APIs, I wrote some unit tests for the existing DataStorage implementations to make sure I wasn't breaking anything. In testing the open code, I found that LocalFile doesn't respect the active container you set on your LocalDataStorage, so if you open a relative file, it's relative to wherever you're running the code. Similarly, while testing the copy operations, I found that HFile doesn't allow copying to anything other than other HFiles and that HDirectory's copy operation was never used because it had the wrong signature. The attached patch fixes these issues and adds tests for much of the DataStorage API for both of the existing backends. There are no tests for the open code as I'm planning on changing that significantly in rewriting these.
[jira] Resolved: (PIG-147) Pig Jira Administrator: Please remove the Patch Available check box
[ https://issues.apache.org/jira/browse/PIG-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-147. Resolution: Fixed Pig Jira Administrator: Please remove the Patch Available check box --- Key: PIG-147 URL: https://issues.apache.org/jira/browse/PIG-147 Project: Pig Issue Type: Bug Reporter: Xu Zhang Priority: Minor We now have Patch Available as a status of a JIRA Pig bug, so the Patch Available checkbox needs to be removed from the Find pane and the Edit page of the Pig project.
[jira] Resolved: (PIG-163) Improve parsing for UDFs in QueryParser
[ https://issues.apache.org/jira/browse/PIG-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-163. Resolution: Fixed Fixed a long time ago. Improve parsing for UDFs in QueryParser --- Key: PIG-163 URL: https://issues.apache.org/jira/browse/PIG-163 Project: Pig Issue Type: Bug Reporter: Arun C Murthy Parsing of UDFs in QueryParser (used in LOAD/GROUP) could be more strict; currently it just assumes it is a list of quoted strings, so, for example, it doesn't handle UDFs which take other UDFs as arguments.
reading/writing HBase in Pig
Hi all, I was looking at the current Pig code in SVN, and it seems like HBase is supported for loading, but not for storing. If this is the case, I'd like to add support for writing to HBase to Pig. Is there anyone else working on this, and if not is this something that you'd like contributed? Based on a cursory evaluation of the StoreFunc interface, it looks like the APIs there are pretty file-centric and may need to be modified to accommodate HBase's table-based design. For example, you aren't going to be serializing your output to an OutputStream object in all likelihood. I haven't contributed to Pig before, and I wanted to see if this is something that would be beneficial to the rest of the Pig community, and if so what next steps I should take (like starting a JIRA) to get the ball rolling. Thanks! Best regards, Mike
[jira] Resolved: (PIG-175) Reading compressed files in local mode + MiniMRCluster
[ https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-175. Resolution: Won't Fix Pig local mode has been dropped in 0.6 in favor of Hadoop's LocalJobRunner. I'm not worried about being unable to mix compressed and uncompressed files in MiniMR mode. Reading compressed files in local mode + MiniMRCluster -- Key: PIG-175 URL: https://issues.apache.org/jira/browse/PIG-175 Project: Pig Issue Type: Bug Reporter: Craig Macdonald Attachments: testCompressed.sh I have written a small test script that tests if three simple compressed and uncompressed files can be loaded successfully. Essentially, it writes a file, compresses it using gzip and bzip2, and sees whether Pig can load it. I use both local execution mode and miniMR cluster. Here are my results: MiniMRCluster * uncompressed: OK * gzip: OK * bzip2: OK * All three at once: not OK Local Execution Mode * uncompressed: OK * gzip: not OK (garbled output) * bzip2: not OK (garbled output) * All three at once: not OK (expected) I'm not sure what the problem is with the miniMRcluster - there is a NPE in PigSplit.getLocations(). I suspect that getFileCacheHints() is returning null, which usually indicates a non-existent file. However, for the local execution mode, I'm fairly confident that this mode has no support for compressed files.
Craig {noformat} == Bashs good friend: cat == Normal A B C bz2 A B C gzip A B C == MiniMRCluster == test.all.pig 2008-03-29 12:07:22,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// 2008-03-29 12:07:22,241 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId= 2008-03-29 12:07:22,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - - MapReduce Job - 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.normal:org.apache.pig.builtin.PigStorage()] 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]] 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: /tmp/temp-1403805719/tmp1733057091:org.apache.pig.builtin.BinStorage 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null 2008-03-29 12:07:22,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: -1 2008-03-29 12:07:22,557 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce parallelism: -1 2008-03-29 12:07:23,427 [Thread-0] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 1 2008-03-29 12:07:23,544 [Thread-0] INFO org.apache.hadoop.mapred.LocalJobRunner - 2008-03-29 12:07:23,545 [Thread-0] INFO org.apache.hadoop.mapred.TaskRunner - Task 'map_' done. 
2008-03-29 12:07:23,581 [Thread-0] INFO org.apache.hadoop.mapred.TaskRunner - Saved output of task 'map_' to file:/tmp/temp-1403805719/tmp1733057091 2008-03-29 12:07:23,625 [Thread-0] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce reduce 2008-03-29 12:07:23,626 [Thread-0] INFO org.apache.hadoop.mapred.TaskRunner - Task 'reduce_cibps7' done. 2008-03-29 12:07:23,630 [Thread-0] INFO org.apache.hadoop.mapred.TaskRunner - Saved output of task 'reduce_cibps7' to file:/tmp/temp-1403805719/tmp1733057091 2008-03-29 12:07:24,383 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Pig progress = 100% (A) (B) (C) 2008-03-29 12:07:24,415 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - - MapReduce Job - 2008-03-29 12:07:24,415 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: [/user/craigm/test.gz:org.apache.pig.builtin.PigStorage()] 2008-03-29 12:07:24,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]] 2008-03-29 12:07:24,416 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce -
[jira] Resolved: (PIG-208) Keeping files internalized
[ https://issues.apache.org/jira/browse/PIG-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-208. Resolution: Incomplete I don't understand what this means. Keeping files internalized -- Key: PIG-208 URL: https://issues.apache.org/jira/browse/PIG-208 Project: Pig Issue Type: New Feature Components: data Reporter: John DeTreville Pig files are kept in externalized form between Pig programs, but (I believe) are held in internalized form while being used. It is expensive to internalize externalized files at the beginning of each program, and to externalize internalized files at the end of each program. Pig needs a way to keep its files internalized across programs. This will require a way to name and manage internalized files.
[jira] Resolved: (PIG-209) Indexes for accelerating joins
[ https://issues.apache.org/jira/browse/PIG-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-209. Resolution: Won't Fix At this point Pig is relying on storage formats such as Zebra to do indexing. We have no near term plans to provide indexing inside Pig itself. Indexes for accelerating joins -- Key: PIG-209 URL: https://issues.apache.org/jira/browse/PIG-209 Project: Pig Issue Type: New Feature Components: data Reporter: John DeTreville Computing the inner join of a very large table (i.e., bag or mapping) with a smaller table can take time proportional to the size of the very large table. This time required can be greatly reduced if the very large table is indexed, taking time proportional to the size of the smaller table. It should be possible for clients to index tables for use by future joins.
[jira] Resolved: (PIG-210) Column store
[ https://issues.apache.org/jira/browse/PIG-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-210. Resolution: Duplicate Column store Key: PIG-210 URL: https://issues.apache.org/jira/browse/PIG-210 Project: Pig Issue Type: New Feature Components: data Reporter: John DeTreville I believe that Pig stores its tables in row order, which is less efficient in space and time than column order in a data-mining system. Column stores can be more highly compressed, and can be read and written faster. It should be possible for clients to store their tables in column order.
[jira] Resolved: (PIG-221) Release updated builds on a regular basis
[ https://issues.apache.org/jira/browse/PIG-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-221. Resolution: Won't Fix We now have regular releases and a continuous integration process. We don't have, or plan to have, nightly builds. Release updated builds on a regular basis - Key: PIG-221 URL: https://issues.apache.org/jira/browse/PIG-221 Project: Pig Issue Type: Task Reporter: Amir Youssefi Release updated builds on a regular basis. For the starter we can use Hudson to release nightly builds.
[jira] Resolved: (PIG-241) Sharding and joins
[ https://issues.apache.org/jira/browse/PIG-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-241. Resolution: Won't Fix We have chosen a different approach to this. Our merge join does take advantage of sort order, but does not require that data be partitioned in the same way in order to do the join, as the suggested sharding approach does. Sharding and joins -- Key: PIG-241 URL: https://issues.apache.org/jira/browse/PIG-241 Project: Pig Issue Type: New Feature Components: data Reporter: John DeTreville Many large distributed systems for storage and computing over tables divide these tables into smaller _shards,_ such that all rows with the same (primary) key will appear in the same shard. If two tables are consistently sharded, then they can be joined shard-by-shard. If corresponding shards are stored on the same hosts (or racks), then joins can be performed locally on those hosts without copying the rows of the tables over the network; this can produce significant speedups. Pig does not currently provide application-controlled sharding and the associated shard placement and computation placement. The performance of joins therefore suffers in many scenarios; rows are passed over the network multiple times when performing a join. If Pig (and Hadoop) could provide the ability for the application to shard tables consistently, according to an application-controlled policy, joins could be completely local operations and could in many cases perform much better.
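The shard-by-shard join described in PIG-241 can be sketched in a few lines. This is a hypothetical toy model (the `shard` and `local_join` helpers are invented for illustration, not Pig APIs): when both tables are partitioned by the same hash of the join key, each shard pair can be joined independently with no cross-shard row movement.

```python
# Toy sketch of consistent sharding (PIG-241): partition both tables by
# hash(key) % n_shards, then join corresponding shard pairs purely locally.

def shard(table, key, n_shards):
    """Partition a list of dict rows into n_shards buckets by the join key."""
    shards = [[] for _ in range(n_shards)]
    for row in table:
        shards[hash(row[key]) % n_shards].append(row)
    return shards

def local_join(left_shard, right_shard, key):
    """Hash join within a single shard pair - no network shuffle needed."""
    index = {}
    for r in right_shard:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left_shard for r in index.get(l[key], [])]

orders = [{"cust": c, "amt": c * 10} for c in range(20)]
custs = [{"cust": c, "name": "c%d" % c} for c in range(20)]

n = 4
joined = []
for ls, rs in zip(shard(orders, "cust", n), shard(custs, "cust", n)):
    joined.extend(local_join(ls, rs, "cust"))   # shard-local work only

assert len(joined) == 20   # every order matched its customer, no reshuffle
```

Pig's merge join, as the resolution notes, instead exploits sort order and does not require both inputs to be partitioned identically.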
[jira] Resolved: (PIG-247) Accept globbing when ExecType.LOCAL
[ https://issues.apache.org/jira/browse/PIG-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-247. Resolution: Won't Fix In Pig 0.6 Pig's local mode has been replaced with Hadoop's LocalJobRunner. Accept globbing when ExecType.LOCAL --- Key: PIG-247 URL: https://issues.apache.org/jira/browse/PIG-247 Project: Pig Issue Type: Improvement Components: impl Reporter: Iván de Prado Priority: Minor Globs are supported when ExecType is MAPREDUCE (Hadoop), but not when ExecType is LOCAL. That is inconsistent.
[jira] Resolved: (PIG-281) Support # for comment besides --
[ https://issues.apache.org/jira/browse/PIG-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-281. Resolution: Won't Fix # is the map dereference operator in Pig Latin and thus cannot be the comment operator too. Support # for comment besides -- Key: PIG-281 URL: https://issues.apache.org/jira/browse/PIG-281 Project: Pig Issue Type: Improvement Reporter: Amir Youssefi Priority: Trivial
[jira] Resolved: (PIG-265) Make all functions in pig case insesitive.
[ https://issues.apache.org/jira/browse/PIG-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-265. Resolution: Won't Fix Since we map directly from UDF name to java (package and) class, this would be difficult. Make all functions in pig case insesitive. -- Key: PIG-265 URL: https://issues.apache.org/jira/browse/PIG-265 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich I should be able to say COUNT, Count, or count in my script.
[jira] Resolved: (PIG-371) Show line number in grunt
[ https://issues.apache.org/jira/browse/PIG-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-371. Resolution: Won't Fix Show line number in grunt - Key: PIG-371 URL: https://issues.apache.org/jira/browse/PIG-371 Project: Pig Issue Type: Improvement Reporter: Amir Youssefi Priority: Trivial Now that PIG-270 is in, it will be nice to have the line number in the grunt prompt. Something like this: 10 grunt grunt (10) grunt:10 10: grunt etc.
[jira] Resolved: (PIG-417) Local Mode is broken
[ https://issues.apache.org/jira/browse/PIG-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-417. Resolution: Won't Fix Local mode has been replaced by Hadoop's LocalJobRunner in 0.6. Local Mode is broken Key: PIG-417 URL: https://issues.apache.org/jira/browse/PIG-417 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Shravan Matthur Narayanamurthy Priority: Minor When we use pig in local mode and also have some config files that point to cluster (in the form of hadoop-site.xml) in the classpath, the local mode errs out saying it can't find the input file. This is because, when the local execution engine is being created, a new Configuration object is being created which takes properties from hadoop-site.xml while initializing. Because of this from then on it tries to connect to the settings in the hadoop-site.xml and fails to find the local files. However, as we are in local mode we want this new Configuration to contain only properties from our pigContext. Currently, the configuration object doesn't support such a thing. We would actually want to initialize the Configuration with properties in hadoop-default.xml
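The behavior PIG-417 asks for can be modeled as a configuration-precedence rule. This is a toy Python sketch, not Hadoop's actual Configuration API: in local mode, the job configuration should be built from defaults plus the Pig context only, ignoring any cluster-pointing hadoop-site.xml found on the classpath; only cluster mode should honor the site overrides.

```python
# Toy model of the config precedence PIG-417 wants (dicts stand in for
# hadoop-default.xml, hadoop-site.xml, and the PigContext properties).

hadoop_default = {"fs.default.name": "file:///"}
hadoop_site = {"fs.default.name": "hdfs://cluster-nn:8020"}  # cluster config
pig_context = {"exectype": "local"}

def make_conf(local_mode):
    conf = dict(hadoop_default)
    if not local_mode:
        conf.update(hadoop_site)   # only cluster mode applies site overrides
    conf.update(pig_context)
    return conf

# Local mode keeps the local filesystem even with a site file on the classpath.
assert make_conf(local_mode=True)["fs.default.name"] == "file:///"
# Cluster mode still picks up the site file's namenode.
assert make_conf(local_mode=False)["fs.default.name"] == "hdfs://cluster-nn:8020"
```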
[jira] Resolved: (PIG-384) regression: execution plan does not show up in the job's output
[ https://issues.apache.org/jira/browse/PIG-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-384. Resolution: Not A Problem This is by design. The execution plan can be shown by adding -v to the pig command line. regression: execution plan does not show up in the job's output --- Key: PIG-384 URL: https://issues.apache.org/jira/browse/PIG-384 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Olga Natkovich Priority: Minor The code in trunk shows the execution plan as part of the job's output. This is missing from the types branch.
[jira] Resolved: (PIG-386) Pig does not do type checking on a per statement basis
[ https://issues.apache.org/jira/browse/PIG-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-386. Resolution: Won't Fix Pig does not do type checking on a per statement basis -- Key: PIG-386 URL: https://issues.apache.org/jira/browse/PIG-386 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Shravan Matthur Narayanamurthy Priority: Minor Currently, though Pig has a type checker, it is not called with every query registration. Instead, the system waits till there is a dump or store. I think it's not in line with the philosophy of catching errors early. Instead of the typechecking happening in the execute method, it should happen in registerQuery and the execute method should expect a type checked plan.
[jira] Resolved: (PIG-458) Type branch integration with hadoop 18
[ https://issues.apache.org/jira/browse/PIG-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-458. Resolution: Fixed Done a long time ago. Type branch integration with hadoop 18 -- Key: PIG-458 URL: https://issues.apache.org/jira/browse/PIG-458 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Assignee: Olga Natkovich Attachments: hadoop18.jar, PIG-458.patch, un18.patch
[jira] Resolved: (PIG-478) allowing custom partitioner between map and reduce
[ https://issues.apache.org/jira/browse/PIG-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-478. Resolution: Duplicate allowing custom partitioner between map and reduce --- Key: PIG-478 URL: https://issues.apache.org/jira/browse/PIG-478 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich The hope is for more even distribution. Don't have a specific use case here. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-491) evaluate function argument expressions before the arguments are constructed as bags of tuples (a la SQL)
[ https://issues.apache.org/jira/browse/PIG-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-491. Resolution: Won't Fix We're not going to change Pig Latin semantics at such a basic level at this point. evaluate function argument expressions before the arguments are constructed as bags of tuples (a la SQL) Key: PIG-491 URL: https://issues.apache.org/jira/browse/PIG-491 Project: Pig Issue Type: New Feature Environment: pig interpreter Reporter: Mike Potts The final section of: http://wiki.apache.org/pig/PigTypesFunctionalSpec proposes this exact feature. The crucial excerpt is: The proposed solution is to change the semantics of pig, so that expression evaluation on function arguments is done before the arguments are constructed as bags of tuples, rather than afterwards. This means that the semantics would change so that SUM(salary * bonus_multiplier) means that for each tuple in grouped, the fields grouped.employee:salary and grouped.employee:bonus_multiplier will be multiplied and the result formed into tuples that are placed in a bag to be passed to the function SUM(). This would make my pig scripts significantly shorter and easier to understand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
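To make the proposed semantic change concrete, here is a minimal sketch (in Java, with illustrative names; this is not Pig code) of evaluating the argument expression per tuple first and only then aggregating, which is what SUM(salary * bonus_multiplier) would mean under the proposal:

```java
import java.util.List;

// Illustration of PIG-491's proposed semantics: the expression
// salary * bonus_multiplier is evaluated for each tuple in the group
// first, and the results form the bag that is passed to SUM().
// All names here are illustrative, not Pig APIs.
public class ExpressionFirst {
    record Employee(double salary, double bonusMultiplier) {}

    // Proposed: evaluate the argument expression per tuple, then aggregate.
    static double sumOfProducts(List<Employee> grouped) {
        return grouped.stream()
                .mapToDouble(e -> e.salary() * e.bonusMultiplier()) // per-tuple evaluation
                .sum();                                             // SUM over the resulting bag
    }

    public static void main(String[] args) {
        List<Employee> grouped = List.of(
                new Employee(100.0, 1.5),
                new Employee(200.0, 2.0));
        System.out.println(sumOfProducts(grouped)); // 100*1.5 + 200*2.0 = 550.0
    }
}
```

Under the pre-existing semantics, a script would instead have to build the products in a separate FOREACH before grouping, which is why the reporter says the change would make scripts shorter.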
[jira] Resolved: (PIG-492) There should be a way for Loader to refer to the output of determineSchema() in the backend
[ https://issues.apache.org/jira/browse/PIG-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-492. Resolution: Fixed PIG-1085 provides this functionality. There should be a way for Loader to refer to the output of determineSchema() in the backend --- Key: PIG-492 URL: https://issues.apache.org/jira/browse/PIG-492 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Currently LoadFunc.determineSchema() is only called from LOLoad() at parse time in the front end. If loader.getNext() needs to know what the output of determineSchema() was, there is no way to get to it in the backend - there should be some way to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: reading/writing HBase in Pig
Hi Mike, It would be great to have a StoreFunc for HBase! There is a rewrite underway for the Load/Store stuff that will make that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966 . You may want to consider writing it for the load-store redesign branch. This is what's probably going to be in 0.7. The first step would be to open a jira and look at the existing StoreFunc implementations. -D On Thu, Jan 14, 2010 at 9:59 PM, Michael Dalton mwdal...@gmail.com wrote: Hi all, I was looking at the current Pig code in SVN, and it seems like HBase is supported for loading, but not for storing. If this is the case, I'd like to add support for writing to HBase to Pig. Is there anyone else working on this, and if not is this something that you'd like contributed? Based on a cursory evaluation of the StoreFunc interface, it looks like the APIs there are pretty file-centric and may need to be modified to accommodate HBase's table-based design. For example, you aren't going to be serializing your output to an OutputStream object in all likelihood. I haven't contributed to Pig before, and I wanted to see if this is something that would be beneficial to the rest of the Pig community, and if so what next steps I should take (like starting a JIRA) to get the ball rolling. Thanks. Best regards, Mike
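For context, here is a rough sketch of the file-centric store shape Mike describes. The SimpleStoreFunc interface below is a simplified stand-in for illustration, not the real org.apache.pig.StoreFunc API; the point is that binding output to an OutputStream fits files but is awkward for a table store like HBase, where rows go to a table rather than a byte stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified mock of a stream-bound store function. This mirrors the
// file-centric design discussed in the thread: the storer is handed an
// OutputStream and serializes records into it.
public class FileCentricStore {
    interface SimpleStoreFunc {
        void bindTo(OutputStream os) throws IOException; // stream-centric binding
        void putNext(String tuple) throws IOException;   // write one record
        void finish() throws IOException;                // flush/close
    }

    // Toy implementation writing newline-delimited records.
    static class LineStore implements SimpleStoreFunc {
        private OutputStream os;
        public void bindTo(OutputStream os) { this.os = os; }
        public void putNext(String tuple) throws IOException {
            os.write((tuple + "\n").getBytes());
        }
        public void finish() throws IOException { os.flush(); }
    }

    static String demo() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        LineStore store = new LineStore();
        store.bindTo(buf);
        store.putNext("a,1");
        store.putNext("b,2");
        store.finish();
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(demo());
    }
}
```

An HBase storer has no natural OutputStream to bind to, which is why the load-store redesign (PIG-966) that decouples storage from streams makes the HBase case easier.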
[jira] Created: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
POCast throws exception for certain sequences of LOAD, FILTER, FOREACH - Key: PIG-1191 URL: https://issues.apache.org/jira/browse/PIG-1191 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur Priority: Blocker When using a custom load/store function, one that returns complex data (map of maps, list of maps), for certain sequences of LOAD, FILTER, and FOREACH, the Pig script throws an exception of the form: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to actual-type at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639) ... Looking through the code of POCast, apparently the operator was unable to find the right load function for doing the conversion and consequently bailed out with an exception, failing the entire Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
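A toy model of the failure mode described above (this is not Pig's actual POCast code; all names are illustrative): casting a bytearray to a concrete type needs a conversion routine supplied by the originating LoadFunc, and when the plan fails to attach one, the cast can only bail out with an ERROR 1075-style exception:

```java
import java.nio.charset.StandardCharsets;

// Sketch of why a bytearray-to-string cast depends on the loader: the
// bytes have no intrinsic type, so the cast delegates to a converter
// that only the original LoadFunc can provide. If the plan never
// attached one, the only option is to fail.
public class CastSketch {
    interface ByteConverter { String bytesToCharArray(byte[] b); }

    static String castToString(byte[] raw, ByteConverter loadFuncCaster) {
        if (loadFuncCaster == null) {
            // Mirrors the ERROR 1075 situation from the bug report.
            throw new RuntimeException(
                "ERROR 1075: Received a bytearray from the UDF. "
              + "Cannot determine how to convert the bytearray to string.");
        }
        return loadFuncCaster.bytesToCharArray(raw);
    }

    public static void main(String[] args) {
        ByteConverter utf8 = b -> new String(b, StandardCharsets.UTF_8);
        System.out.println(castToString("query".getBytes(), utf8)); // succeeds: converter attached
        // castToString("query".getBytes(), null) would throw the ERROR 1075-style exception
    }
}
```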
[jira] Updated: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
[ https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1191: Status: Patch Available (was: Open) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH - Key: PIG-1191 URL: https://issues.apache.org/jira/browse/PIG-1191 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur Priority: Blocker Attachments: PIG-1191-1.patch When using a custom load/store function, one that returns complex data (map of maps, list of maps), for certain sequences of LOAD, FILTER, and FOREACH, the Pig script throws an exception of the form: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to actual-type at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639) ... Looking through the code of POCast, apparently the operator was unable to find the right load function for doing the conversion and consequently bailed out with an exception, failing the entire Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
[ https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1191: Attachment: PIG-1191-1.patch Hi Ankur, can you check if this patch works? POCast throws exception for certain sequences of LOAD, FILTER, FOREACH - Key: PIG-1191 URL: https://issues.apache.org/jira/browse/PIG-1191 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Ankur Priority: Blocker Attachments: PIG-1191-1.patch When using a custom load/store function, one that returns complex data (map of maps, list of maps), for certain sequences of LOAD, FILTER, and FOREACH, the Pig script throws an exception of the form: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to actual-type at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639) ... Looking through the code of POCast, apparently the operator was unable to find the right load function for doing the conversion and consequently bailed out with an exception, failing the entire Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FOREACH
[ https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800609#action_12800609 ] Ankur commented on PIG-1191: Listed below are the identified cases.

CASE 1: LOAD - FILTER - FOREACH - LIMIT - STORE
===============================================
SCRIPT
---
sds = LOAD '/my/data/location' USING my.org.MyMapLoader() AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries = FILTER sds BY mapFields#'page_params'#'query' is NOT NULL;
queries_rand = FOREACH queries GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS query_string;
queries_limit = LIMIT queries_rand 100;
STORE queries_limit INTO 'out';
RESULT
---
FAILS in the reduce stage with the following exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)

CASE 2: LOAD - FOREACH - FILTER - LIMIT - STORE
===============================================
Note that the FILTER and FOREACH order is reversed.
SCRIPT
---
sds = LOAD '/my/data/location' USING my.org.MyMapLoader() AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries_rand = FOREACH sds GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS query_string;
queries = FILTER queries_rand BY query_string IS NOT null;
queries_limit = LIMIT queries 100;
STORE queries_limit INTO 'out';
RESULT
---
SUCCESS - results are correctly stored. So if a projection is done before FILTER, the POCast operator receives the LoadFunc and everything is cool.

CASE 3: LOAD - FOREACH - FOREACH - FILTER - LIMIT - STORE
=========================================================
SCRIPT
---
sds = LOAD '/my/data/location' USING my.org.MyMapLoader() AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params GENERATE (CHARARRAY) (params#'query') AS query_string;
queries_filtered = FILTER queries BY query_string IS NOT null;
queries_limit = LIMIT queries_filtered 100;
STORE queries_limit INTO 'out';
RESULT
---
FAILS in the map stage. Looks like the 2nd FOREACH did not get the LoadFunc and bailed out with the following stack trace:
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at

CASE 4: LOAD - FOREACH - FOREACH - LIMIT - STORE
================================================
SCRIPT
---
sds = LOAD '/my/data/location' USING my.org.MyMapLoader() AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params