[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits
[ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1060:
Status: Open (was: Patch Available)

MultiQuery optimization throws error for multi-level splits
Key: PIG-1060
URL: https://issues.apache.org/jira/browse/PIG-1060
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur
Assignee: Richard Ding
Attachments: PIG-1060.patch

Consider the following scenario:
1. Multi-level splits in the map plan.
2. Each split branch further progressing across a local-global rearrange.
3. Output of each of these finally merged via a UNION.

MultiQuery optimizer throws the following error in such a case:
ERROR 2146: Internal Error. Inconsistency in key index found during optimization.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1060) MultiQuery optimization throws error for multi-level splits
[ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1060:
Status: Patch Available (was: Open)

MultiQuery optimization throws error for multi-level splits
Key: PIG-1060
URL: https://issues.apache.org/jira/browse/PIG-1060
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur
Assignee: Richard Ding
Attachments: PIG-1060.patch
[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's
[ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774054#action_12774054 ]

Pradeep Kamath commented on PIG-1065:

This is an instance of the problem of representing an unknown schema with a null schema. If the schema of a relational operator is null, pig assumes the fields are of type bytearray, which is incorrect. An unknown schema really means we don't know the types of the fields. In the above case, once pig determines that the two schemas have different sizes, it sets the schema of LOUnion to null (to represent unknown schema). Hence the order by expects the fields coming out of the union to be byte arrays, but in reality the first field (which is the sort key above) is a chararray. This results in a runtime exception.

I propose that when either of the two inputs to a union has a schema, we should error out if the two are incompatible and not continue. If the two inputs don't have a schema then we can proceed with null schema - thoughts?

In-determinate behaviour of Union when there are 2 non-matching schema's
Key: PIG-1065
URL: https://issues.apache.org/jira/browse/PIG-1065
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0

I have a script which first does a union of these schemas and then does an ORDER BY of this result.

{code}
f1 = LOAD '1.txt' as (key:chararray, v:chararray);
f2 = LOAD '2.txt' as (key:chararray);
u0 = UNION f1, f2;
describe u0;
dump u0;
u1 = ORDER u0 BY $0;
dump u1;
{code}

When I run in Map Reduce mode I get the following result:

$ java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
Schema for u0 unknown.
(1,2)
(2,3)
(1)
(2)
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
at org.apache.pig.PigServer.openIterator(PigServer.java:475)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:397)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)

When I run the same script in local mode I get a different result, as we know that local mode does not use any Hadoop classes.

$ java -cp pig.jar org.apache.pig.Main -x local broken.pig
Schema for u0 unknown
(1,2)
(1)
(2,3)
(2)
(1,2)
(1)
(2,3)
(2)

Here are some questions:
1) Why do we allow union if the schemas do not match?
2) Should we not print an error message/warning so that the user knows that this is not allowed, or that he can get unexpected results?

Viraj
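The rule proposed in the comment above can be sketched as a small decision function. This is only an illustration in plain Java, not Pig's LOUnion code: `UnionSchemaCheck`, `mergeForUnion`, and the use of a list of type names as a schema are hypothetical, `null` stands for an unknown schema, and "compatible" is simplified to "same number of fields with identical types". The comment does not say what should happen when only one input has a schema; the sketch assumes the result is then unknown.

```java
import java.util.List;

final class UnionSchemaCheck {
    /**
     * Returns the schema of the union, or null when it is unknown.
     * Throws if both inputs declare schemas that do not match.
     */
    static List<String> mergeForUnion(List<String> left, List<String> right) {
        if (left == null || right == null) {
            // Assumption: with at least one unknown input the result stays unknown.
            return null;
        }
        if (left.size() != right.size()) {
            throw new IllegalArgumentException(
                "Cannot union schemas of different sizes: "
                    + left.size() + " vs " + right.size());
        }
        for (int i = 0; i < left.size(); i++) {
            if (!left.get(i).equals(right.get(i))) {
                throw new IllegalArgumentException(
                    "Field " + i + " has incompatible types: "
                        + left.get(i) + " vs " + right.get(i));
            }
        }
        return left;
    }
}
```

With the schemas from the script in the description, (chararray, chararray) and (chararray), this sketch would reject the union up front instead of letting the ORDER BY fail at run time.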
[jira] Commented: (PIG-1071) Support comma separated file/directory names in load statements
[ https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774061#action_12774061 ]

Hadoop QA commented on PIG-1071:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424056/PIG-1071.patch
against trunk revision 832804.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/140/console

This message is automatically generated.

Support comma separated file/directory names in load statements
Key: PIG-1071
URL: https://issues.apache.org/jira/browse/PIG-1071
Project: Pig
Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: PIG-1071.patch

Currently Pig Latin supports the following LOAD syntax:

{code}
LOAD 'data' [USING loader function] [AS schema];
{code}

where data is the name of the file or directory, including files specified with Hadoop-supported globbing syntax. This name is passed to the loader function.

This feature is to support loaders that can load multiple files from different directories, and it allows users to pass in the file names in a comma separated string. For example, these will be valid load statements:

{code}
LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
{code}

and

{code}
LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
{code}

This comma separated string is passed to the loader.
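Because the whole comma separated string reaches the loader, a loader that wants the individual locations has to split it without breaking glob groups such as {a,c}. A minimal sketch of one way to do that follows. `LoadPathSplitter` and `split` are hypothetical names, not part of PIG-1071.patch or of Pig's API, and the sketch assumes balanced braces and no escaped commas.

```java
import java.util.ArrayList;
import java.util.List;

final class LoadPathSplitter {
    /** Splits on commas that are not inside a {...} glob group. */
    static List<String> split(String locations) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // nesting level of '{' ... '}'
        for (char c : locations.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                // Top-level comma: the current location is complete.
                paths.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        paths.add(current.toString());
        return paths;
    }
}
```

For the second load statement above, `split` yields the two locations `/usr/pig/test1/{a,c}` and `/usr/pig/test2/b`, leaving the glob group for the file system layer to expand.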
[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's
[ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774065#action_12774065 ]

Thejas M Nair commented on PIG-1065:

Can this be allowed (in case of incompatible schemas as in description)?
u0 = UNION f1, f2 as (key:chararray, v:chararray);

In-determinate behaviour of Union when there are 2 non-matching schema's
Key: PIG-1065
URL: https://issues.apache.org/jira/browse/PIG-1065
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
[jira] Updated: (PIG-997) [zebra] Sorted Table Support by Zebra
[ https://issues.apache.org/jira/browse/PIG-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-997:
Resolution: Fixed
Status: Resolved (was: Patch Available)

All the nightly tests now pass. Patch checked in.

[zebra] Sorted Table Support by Zebra
Key: PIG-997
URL: https://issues.apache.org/jira/browse/PIG-997
Project: Pig
Issue Type: New Feature
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.6.0
Attachments: SortedTable.patch, SortedTable.patch, SortedTable.patch, SortedTable.patch

This new feature is for Zebra to support sorted data in storage. As a storage library, Zebra will not sort the data by itself, but it will support creation and use of sorted data either through PIG or through map/reduce tasks that use Zebra as storage format. The sorted table keeps the data in a totally sorted manner across all TFiles created by potentially all mappers or reducers.

For sorted data creation through PIG's STORE operator, if the input data is sorted through ORDER BY, the new Zebra table will be marked as sorted on the sorted columns.

For sorted data creation through Map/Reduce tasks, three new static methods of the BasicTableOutput class will be provided to help the user achieve the goal. setSortInfo allows the user to specify the sorted columns of the input tuple to be stored; getSortKeyGenerator and getSortKey help the user generate a key acceptable to Zebra as a sort key, based upon the schema, the sorted columns and the input tuple.

For sorted data read through PIG's LOAD operator, pass the string sorted as an extra argument to the TableLoader constructor to ask for a sorted table to be loaded.

For sorted data read through Map/Reduce tasks, a new static method of the TableInputFormat class, requireSortedTable, can be called to ask for a sorted table to be read. Additionally, an overloaded version of the new method can be called to ask for a sorted table on specified sort columns and comparator.

For this release, sorted tables only support sorting in ascending order, not in descending order. In addition, the sort keys must be of simple types, not complex types such as RECORD, COLLECTION and MAP. Multiple-key sorting is supported, but the ordering of the multiple sort keys is significant, with the first sort column being the primary sort key, the second being the secondary sort key, etc.

In this release, the sort keys are stored along with the sort columns from which the keys were originally created, resulting in some data storage redundancy.
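Why the order of the sort columns matters can be illustrated with a plain Java comparator. This is a sketch only and does not use Zebra's API: `MultiKeySort`, `byColumns` and `sorted` are hypothetical names, rows are modelled as String arrays, and the ordering is ascending only, as in this release.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MultiKeySort {
    /** Builds an ascending comparator; the first column index is the primary key. */
    static Comparator<String[]> byColumns(int... sortColumns) {
        Comparator<String[]> cmp = (a, b) -> 0; // no columns: all rows compare equal
        for (int col : sortColumns) {
            // Each later column only breaks ties left by the earlier ones.
            cmp = cmp.thenComparing(row -> row[col]);
        }
        return cmp;
    }

    /** Returns a sorted copy of the rows, leaving the input list untouched. */
    static List<String[]> sorted(List<String[]> rows, int... sortColumns) {
        String[][] copy = rows.toArray(new String[0][]);
        Arrays.sort(copy, byColumns(sortColumns));
        return Arrays.asList(copy);
    }
}
```

Sorting the rows (b,1), (a,2), (a,1) on columns 0 then 1 gives (a,1), (a,2), (b,1), while sorting on columns 1 then 0 gives (a,1), (b,1), (a,2).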
[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's
[ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774086#action_12774086 ]

Pradeep Kamath commented on PIG-1065:

Currently the pig parser does not allow specifying a schema for the union. If we do want to allow it, a few details arise:

1) From what I know, pig currently allows specifying row schemas only in load statements. Is this restriction by design? ForEach only allows giving different schemas at the individual field level, and even there the type has to match, if I recollect right.

2) If the input schemas are unequal in size, should the specified union schema size strictly be MAX of the input schema sizes, with nulls being projected for missing columns in the input with the smaller schema? So a specified schema of size MAX(size of input schemas) will not be allowed?

3) Can the specified union schema have different (castable) types than the result of merging the two input schemas - for example, if after merging the two input schemas the first input has int and the specified schema has long? What about demotions, like the merged schema being long and the specified one being int - would those be disallowed? I suppose we just allow whatever is allowed in casts.

In-determinate behaviour of Union when there are 2 non-matching schema's
Key: PIG-1065
URL: https://issues.apache.org/jira/browse/PIG-1065
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0
[jira] Resolved: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-958.
Resolution: Fixed
Fix Version/s: 0.6.0
Hadoop Flags: [Reviewed]

Patch committed, thanks for the contribution Ankur!

Splitting output data on key field
Key: PIG-958
URL: https://issues.apache.org/jira/browse/PIG-958
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Ankur
Fix For: 0.6.0
Attachments: 958.v3.patch, 958.v4.patch

Pig users often face the need to split the output records into a bunch of files and directories depending on the type of record. Pig's SPLIT operator is useful when record types are few and known in advance. In cases where type is not directly known but is derived dynamically from values of a key field in the output tuple, a custom store function is a better solution.
[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements
[ https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1071:
Attachment: PIG-1071.patch

Added two more test cases.

Support comma separated file/directory names in load statements
Key: PIG-1071
URL: https://issues.apache.org/jira/browse/PIG-1071
Project: Pig
Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: PIG-1071.patch, PIG-1071.patch
[jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits
[ https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774115#action_12774115 ]

Hadoop QA commented on PIG-1060:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424143/PIG-1060.patch
against trunk revision 833126.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 319 release audit warnings (more than the trunk's current 318 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/41/console

This message is automatically generated.

MultiQuery optimization throws error for multi-level splits
Key: PIG-1060
URL: https://issues.apache.org/jira/browse/PIG-1060
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur
Assignee: Richard Ding
Attachments: PIG-1060.patch
How to clone a logical plan ?
Hi,

For our cost based optimizer, for a given query plan we need to generate alternative query plans and evaluate them based on their estimated cost. As a result of that, I want to clone a logical plan. I thought LogicalPlanCloner is meant for that, but it doesn't seem to work. I added this simple test case in TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I am doing something wrong?

Thanks,
Ashutosh
[jira] Created: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
Key: PIG-1072
URL: https://issues.apache.org/jira/browse/PIG-1072
Project: Pig
Issue Type: Sub-task
Reporter: Pradeep Kamath
[jira] Commented: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
[ https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774135#action_12774135 ]

Pradeep Kamath commented on PIG-1072:

The requirement that, to be reversible, the load and store func must be the same class seems restrictive. While we're reworking the APIs, should we add a call like:

{code}
Class getReversibleLoader()
{code}

to the StoreFunc interface? Then the store function can return itself if it is also the load function, it can return null if it has no reversible loader, or it can return another class (like in the Zebra case).

This will affect the Multi Query optimization and the streaming optimization, which are currently built around ReversibleLoadStoreFunc.

ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
Key: PIG-1072
URL: https://issues.apache.org/jira/browse/PIG-1072
Project: Pig
Issue Type: Sub-task
Reporter: Pradeep Kamath
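The three return values described in the comment can be sketched as follows. Only the method name getReversibleLoader() comes from the comment. The interfaces `SketchLoadFunc` and `SketchStoreFunc` and all implementing classes are hypothetical stand-ins rather than Pig's real LoadFunc and StoreFunc, and the generic return type is an assumption added for type safety (the comment uses a raw `Class`).

```java
// Hypothetical stand-ins for Pig's load/store interfaces; only the method
// name getReversibleLoader() is taken from the proposal.
interface SketchLoadFunc { }

interface SketchStoreFunc {
    /** Class able to read back what this store function writes, or null if none. */
    Class<? extends SketchLoadFunc> getReversibleLoader();
}

// Case 1: one class both loads and stores, so it returns itself.
final class SelfReversibleStorage implements SketchLoadFunc, SketchStoreFunc {
    public Class<? extends SketchLoadFunc> getReversibleLoader() {
        return SelfReversibleStorage.class;
    }
}

// Case 2: a write-only store function with no matching loader.
final class WriteOnlyStorer implements SketchStoreFunc {
    public Class<? extends SketchLoadFunc> getReversibleLoader() {
        return null;
    }
}

// Case 3: separate loader and storer classes (the Zebra-like case).
final class SeparateLoader implements SketchLoadFunc { }

final class SeparateStorer implements SketchStoreFunc {
    public Class<? extends SketchLoadFunc> getReversibleLoader() {
        return SeparateLoader.class;
    }
}
```

An optimizer that today checks for ReversibleLoadStoreFunc could, under this sketch, ask any store function for its loader class and skip the optimization when the answer is null.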
RE: How to clone a logical plan ?
You have hit a bug. I think LOJoin has to be added to LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com]
Sent: Thursday, November 05, 2009 3:28 PM
To: pig-dev@hadoop.apache.org
Subject: How to clone a logical plan ?

Hi,

For our cost based optimizer, for a given query plan we need to generate alternative query plans and evaluate them based on their estimated cost. As a result of that, I want to clone a logical plan. I thought LogicalPlanCloner is meant for that, but it doesn't seem to work. I added this simple test case in TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I am doing something wrong?

Thanks,
Ashutosh
[jira] Created: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin
LogicalPlanCloner can't clone plan containing LOJoin
Key: PIG-1073
URL: https://issues.apache.org/jira/browse/PIG-1073
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan

Add the following test case to TestLogicalPlanBuilder.java:

public void testLogicalPlanCloner() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)
RE: How to clone a logical plan ?
If my memory serves me correctly, the logical plan cloning was implemented (by me) for cloning inner plans for foreach. As such, the top level plan cloning was never tested and some items are marked as TODO (see visit methods for LOLoad, LOStore and LOStream). If you want to use it as you mention in your test cases, then you need to add code for cloning the LOLoad, LOStore, LOStream and LOJoin operators.

Santhosh

-----Original Message-----
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
Sent: Thursday, November 05, 2009 4:04 PM
To: pig-dev@hadoop.apache.org
Subject: RE: How to clone a logical plan ?

You have hit a bug. I think LOJoin has to be added to LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com]
Sent: Thursday, November 05, 2009 3:28 PM
To: pig-dev@hadoop.apache.org
Subject: How to clone a logical plan ?

Hi,

For our cost based optimizer, for a given query plan we need to generate alternative query plans and evaluate them based on their estimated cost. As a result of that, I want to clone a logical plan. I thought LogicalPlanCloner is meant for that, but it doesn't seem to work. I added this simple test case in TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I am doing something wrong?

Thanks,
Ashutosh
[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin
[ https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774147#action_12774147 ]

Santhosh Srinivasan commented on PIG-1073:
------------------------------------------
If my memory serves me correctly, the logical plan cloning was implemented (by me) for cloning inner plans for foreach. As such, the top level plan cloning was never tested and some items are marked as TODO (see visit methods for LOLoad, LOStore and LOStream). If you want to use it as you mention in your test cases, then you need to add code for cloning the LOLoad, LOStore, LOStream and LOJoin operators.

LogicalPlanCloner can't clone plan containing LOJoin
----------------------------------------------------
Key: PIG-1073
URL: https://issues.apache.org/jira/browse/PIG-1073
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan

Add the following test case to TestLogicalPlanBuilder.java

    public void testLogicalPlanCloner() throws CloneNotSupportedException {
        LogicalPlan lp = buildPlan("C = join ( load 'A') by $0, (load 'B') by $0;");
        LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
        cloner.getClonedPlan();
    }

and this fails with the following stacktrace:

java.lang.NullPointerException
    at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
    at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
    at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin
[ https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774146#action_12774146 ]

Ashutosh Chauhan commented on PIG-1073:
---------------------------------------
It seems that the fix is to override the visit method in LogicalPlanCloneHelper.java

    @Override
    protected void visit(LOJoin loJoin) throws VisitorException {
        ..
    }

LogicalPlanCloner can't clone plan containing LOJoin
----------------------------------------------------
Key: PIG-1073
URL: https://issues.apache.org/jira/browse/PIG-1073
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
[jira] Updated: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1026:
--------------------------
Status: Patch Available (was: Open)

[zebra] map split returns null
------------------------------
Key: PIG-1026
URL: https://issues.apache.org/jira/browse/PIG-1026
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
Fix For: 0.6.0
Attachments: PIG_1026.patch

Here is the test scenario:

    final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
    //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
    final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";

projection:

    String projection2 = new String("m1#{b}, m2#{x|z}");

The user got a null pointer exception on reading m1#{b}.

Yan, please refer to the test class: TestNonDefaultWholeMapSplit.java
[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's
[ https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774153#action_12774153 ]

Santhosh Srinivasan commented on PIG-1065:
------------------------------------------
Answer to Question 1: Pig 1.0 had that syntax and it was retained for backward compatibility. Paolo suggested that for uniformity, the 'AS' clause for the load statements should be extended to all relational operators. Gradually, the column aliasing in the foreach should be removed from the documentation and eventually removed from the language.

In-determinate behaviour of Union when there are 2 non-matching schema's
-------------------------------------------------------------------------
Key: PIG-1065
URL: https://issues.apache.org/jira/browse/PIG-1065
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.6.0

I have a script which first does a union of these schemas and then does an ORDER BY of this result.

{code}
f1 = LOAD '1.txt' as (key:chararray, v:chararray);
f2 = LOAD '2.txt' as (key:chararray);
u0 = UNION f1, f2;
describe u0;
dump u0;
u1 = ORDER u0 BY $0;
dump u1;
{code}

When I run in Map Reduce mode I get the following result:

$java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig

Schema for u0 unknown.
(1,2)
(2,3)
(1)
(2)
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias u1
    at org.apache.pig.PigServer.openIterator(PigServer.java:475)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)

When I run the same script in local mode I get a different result, as we know that local mode does not use any Hadoop classes.

$java -cp pig.jar org.apache.pig.Main -x local broken.pig

Schema for u0 unknown
(1,2)
(1)
(2,3)
(2)
(1,2)
(1)
(2,3)
(2)

Here are some questions:
1) Why do we allow union if the schemas do not match?
2) Should we not print an error message/warning so that the user knows that this is not allowed or that he can get unexpected results?

Viraj
[jira] Updated: (PIG-1001) Generate more meaningful error message when one input file does not exist
[ https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1001:
----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Patch committed.

Generate more meaningful error message when one input file does not exist
-------------------------------------------------------------------------
Key: PIG-1001
URL: https://issues.apache.org/jira/browse/PIG-1001
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-1001-1.patch, PIG-1001-2.patch

In the following query, if 1.txt does not exist,

    a = load '1.txt';
    b = group a by $0;
    c = group b all;
    dump c;

Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421 does not exist.", which names a temporary file rather than the missing input. Pig should instead report "Input file 1.txt not exist" in place of such confusing messages.
Re: How to clone a logical plan ?
Thanks, Santhosh, for the quick response and explanation. Saved a few hours of debugging :)

Ashutosh
[jira] Created: (PIG-1074) Zebra store function should allow '::' in column names in output schema
Zebra store function should allow '::' in column names in output schema
-----------------------------------------------------------------------
Key: PIG-1074
URL: https://issues.apache.org/jira/browse/PIG-1074
Project: Pig
Issue Type: Bug
Reporter: Pradeep Kamath

The following script fails:

{noformat}
a = load '/zebra/singlefile/studenttab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, gpa);
b = load '/zebra/singlefile/votertab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, registration, contributions);
c = filter a by age 20;
d = filter b by age 20;
store c into '/user/pig/out//ZebraMultiQuery_30.out.1' using org.apache.hadoop.zebra.pig.TableStorer('');
store d into '/user/pig/out//ZebraMultiQuery_30.out.2' using org.apache.hadoop.zebra.pig.TableStorer('');
e = cogroup c by name, d by name;
f = foreach e generate flatten(c), flatten(d);
store f into '/user/pig//ZebraMultiQuery_30.out.3' using org.apache.hadoop.zebra.pig.TableStorer('');
{noformat}

Here the schema of f has names like c::name, and it looks like the zebra storefunc does not allow '::' in column names.

The stack trace is:

ERROR 2997: Unable to recreate exception from backend error: java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered : : at line 1, column 3.
[jira] Updated: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin
[ https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1073:
----------------------------------
Attachment: pig-1073.patch

Draft patch with testcase.

LogicalPlanCloner can't clone plan containing LOJoin
----------------------------------------------------
Key: PIG-1073
URL: https://issues.apache.org/jira/browse/PIG-1073
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
Attachments: pig-1073.patch
[jira] Assigned: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin
[ https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reassigned PIG-1073:
-------------------------------------
Assignee: Ashutosh Chauhan

LogicalPlanCloner can't clone plan containing LOJoin
----------------------------------------------------
Key: PIG-1073
URL: https://issues.apache.org/jira/browse/PIG-1073
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: pig-1073.patch
RequiredFields contents
Hi all,

I am looking at the RequiredFields class and it has this explanation of what getFields() returns:

    /**
     * List of fields required from the input. This includes fields that are
     * transformed, and thus are no longer the same fields. Using the example 'B
     * = foreach A generate $0, $2, $3, udf($1)' would produce the list (0, 0),
     * (0, 2), (0, 3), (0, 1). Note that the order is not guaranteed.
     */

The second element of the pair is self-explanatory -- but what is the first element in the pair?

Thanks,
-Dmitriy
[jira] Updated: (PIG-1071) Support comma separated file/directory names in load statements
[ https://issues.apache.org/jira/browse/PIG-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1071:
--------------------------------
Resolution: Fixed
Fix Version/s: 0.6.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

+1, patch committed. Thanks, Richard!

Support comma separated file/directory names in load statements
---------------------------------------------------------------
Key: PIG-1071
URL: https://issues.apache.org/jira/browse/PIG-1071
Project: Pig
Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.6.0
Attachments: PIG-1071.patch, PIG-1071.patch

Currently Pig Latin supports the following LOAD syntax:

{code}
LOAD 'data' [USING loader function] [AS schema];
{code}

where data is the name of the file or directory, including files specified with Hadoop-supported globbing syntax. This name is passed to the loader function.

This feature is to support loaders that can load multiple files from different directories, and it allows users to pass in the file names as a comma separated string.

For example, these will be valid load statements:

{code}
LOAD '/usr/pig/test1/a,/usr/pig/test2/b' USING someloader();
{code}

and

{code}
LOAD '/usr/pig/test1/{a,c},/usr/pig/test2/b' USING someloader();
{code}

This comma separated string is passed to the loader.
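The issue above only says that the comma separated string is handed to the loader; how a loader takes it apart is not specified. As a minimal sketch, assuming a loader wants one entry per path while leaving commas inside a {a,c} glob alone, the splitting could look like the following. The class LoadPathSplitter and its split method are hypothetical names for illustration and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: splits a comma separated location
// string into individual paths, ignoring commas nested inside {...} globs.
final class LoadPathSplitter {
    static List<String> split(String location) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0; // > 0 while we are inside a {...} glob group
        for (char ch : location.toCharArray()) {
            if (ch == '{') {
                braceDepth++;
            } else if (ch == '}' && braceDepth > 0) {
                braceDepth--;
            }
            if (ch == ',' && braceDepth == 0) {
                // A top-level comma ends the current path.
                paths.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        paths.add(current.toString());
        return paths;
    }

    public static void main(String[] args) {
        // The two load locations used as examples in the issue description.
        System.out.println(split("/usr/pig/test1/a,/usr/pig/test2/b"));
        System.out.println(split("/usr/pig/test1/{a,c},/usr/pig/test2/b"));
    }
}
```

For the second location this yields two entries, /usr/pig/test1/{a,c} and /usr/pig/test2/b, so the glob group is still intact when each entry is later expanded.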
[jira] Commented: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774214#action_12774214 ]

Hadoop QA commented on PIG-1026:
--------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423738/PIG_1026.patch
against trunk revision 833266.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 10 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/141/console

This message is automatically generated.

[zebra] map split returns null
------------------------------
Key: PIG-1026
URL: https://issues.apache.org/jira/browse/PIG-1026
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
Fix For: 0.6.0
Attachments: PIG_1026.patch
RE: RequiredFields contents
Hi,

I am also curious. I studied this and I guess that it could be the input index. For example, in "foreach A generate", A's index is 0 among the inputs of the foreach operator. Let me know if I am wrong.

Thanks,
Richard
Re: How to clone a logical plan ?
Richard,

The Load/Store redesign proposal has an interface that defines how stats get represented; a loader that implements ResourceLoader will pass statistics up into Pig, which will then take care of doing whatever it needs to do with them. The specifics of how the stats get loaded in by the loader are up to the implementation of the loader -- they can be read in from a metadata service, sampled on the fly, stored in a metadata file, etc. For simplicity, we are working with serialized JSON representations of ResourceStatistics right now.

-Dmitriy

2009/11/6 RichardGUO Fei gladiato...@hotmail.com:
> Hi Dmitriy,
>
> Thanks for sharing. I look forward to seeing your work. I implemented a storage and want to connect Pig to my storage. In order to let the optimizer fully benefit from the histogram and the side-information of my storage, I am thinking of implementing a cost-based optimizer.
>
> How do you plan to pass in the statistics? So let's say that your input file is a plain-text log file, do you require the users to gather the statistics themselves? Or do you plan to limit this to only certain types of storage?
>
> Thanks,
> Richard
>
>> Date: Thu, 5 Nov 2009 22:54:47 -0500
>> Subject: Re: How to clone a logical plan ?
>> From: dvrya...@gmail.com
>> To: pig-dev@hadoop.apache.org
>>
>> At a high level, we are implementing the framework for propagating statistics between Pig operators, and using said statistics to make moderately intelligent decisions about Join types that should be used (unless they are specified by the user).
>>
>> We do this in a fairly brute-force manner, by generating all alternative plans (that part is not working so hot right now, see subject) and costing them, choosing the global minimum (there is some pruning happening, but not as much as something like System R). As far as relation order inside a given Join, we set that deterministically after choosing the join, as Pig has specific preferences for where the largest relation should go for a given join type.
>>
>> Once we have join type selection working, other optimizations can be added -- the tricky part is making sure the costing functions can't produce drastically wrong results.
>>
>> All the work is happening at the logical layer, between the rule-based optimizer and LogToPhysTranslator.
>>
>> -D
>>
>> 2009/11/5 RichardGUO Fei gladiato...@hotmail.com:
>>> Hi,
>>>
>>> I am also doing a cost-based optimizer. So I am interested in knowing some of the specs that you are after.
>>>
>>> Thanks,
>>> Richard
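The brute-force approach described in this thread (generate every alternative plan, cost each one, keep the global minimum) can be sketched in a few lines. This is only an illustration under stated assumptions: a "plan" is reduced to a list with one join type label per join, the labels and the numbers in toyCost are invented for the example, and none of the names below (BruteForcePlanChooser, alternatives, cheapest, toyCost) come from Pig or from the optimizer being discussed. No pruning is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch, not Pig code: exhaustive enumeration of join type
// assignments followed by a linear scan for the cheapest one.
final class BruteForcePlanChooser {
    // Builds the cross product: one join type chosen for each of `joins` joins.
    static List<List<String>> alternatives(int joins, List<String> joinTypes) {
        List<List<String>> plans = new ArrayList<>();
        plans.add(new ArrayList<>());
        for (int i = 0; i < joins; i++) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> partial : plans) {
                for (String type : joinTypes) {
                    List<String> next = new ArrayList<>(partial);
                    next.add(type);
                    extended.add(next);
                }
            }
            plans = extended;
        }
        return plans;
    }

    // Returns the plan with the lowest estimated cost (first one wins on ties),
    // or null when the list of plans is empty.
    static <P> P cheapest(List<P> plans, ToDoubleFunction<P> cost) {
        P best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (P plan : plans) {
            double estimate = cost.applyAsDouble(plan);
            if (estimate < bestCost) {
                best = plan;
                bestCost = estimate;
            }
        }
        return best;
    }

    // Made-up per-join costs, standing in for a statistics-driven estimate.
    static double toyCost(List<String> plan) {
        double total = 0;
        for (String type : plan) {
            switch (type) {
                case "replicated": total += 3; break;
                case "merge": total += 5; break;
                default: total += 10; break;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("hash", "replicated", "merge");
        List<List<String>> plans = alternatives(2, types);
        System.out.println(plans.size() + " candidate plans");
        System.out.println("cheapest: " + cheapest(plans, BruteForcePlanChooser::toyCost));
    }
}
```

The number of candidates grows as (join types) to the power of (joins), which is why the thread mentions pruning; the sketch also makes it visible that the chosen plan is only as good as the cost function, the concern raised about costing functions producing drastically wrong results.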
RE: RequiredFields contents
The first element in the pair is the input number. It is 0 for most operators. For multi-input operators like join and cogroup, it will range from 0 to (n - 1), where n is the number of inputs.

Santhosh
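To make the (input number, column) pairs concrete, here is a small sketch that builds them for a single-input and a two-input operator. It does not use Pig's RequiredFields class; RequiredFieldsSketch and pairs are hypothetical names, and the two-input case simply assumes, for illustration, an operator that reads column 0 from each of its inputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Pig's RequiredFields: renders the
// (input number, column) pairs for an operator's required columns.
final class RequiredFieldsSketch {
    // columnsPerInput[i] lists the column positions read from input i.
    static List<String> pairs(int[][] columnsPerInput) {
        List<String> result = new ArrayList<>();
        for (int input = 0; input < columnsPerInput.length; input++) {
            for (int column : columnsPerInput[input]) {
                result.add("(" + input + ", " + column + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // B = foreach A generate $0, $2, $3, udf($1): a single input, number 0.
        System.out.println(pairs(new int[][] {{0, 2, 3, 1}}));
        // Assumed two-input operator reading column 0 of each input:
        // the input number now ranges over 0 and 1.
        System.out.println(pairs(new int[][] {{0}, {0}}));
    }
}
```

The first call reproduces the list from the javadoc, (0, 0), (0, 2), (0, 3), (0, 1); in the second, the first element distinguishes the two inputs even though the column position is the same.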
[jira] Created: (PIG-1075) Error in Cogroup when key fields types don't match
Error in Cogroup when key fields types don't match
--------------------------------------------------
Key: PIG-1075
URL: https://issues.apache.org/jira/browse/PIG-1075
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur

When cogrouping 2 relations on multiple key fields, Pig throws an error if the corresponding types don't match. Consider the following script:

    A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int);
    B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int);
    C = CoGROUP A BY (a,b,c), B BY (a,b,c);
    D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B);
    describe D;
    dump D;

The complete stack trace of the error thrown is:

Pig Stack Trace
---------------
ERROR 1051: Cannot cast to Unknown

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to describe schema for alias D
    at org.apache.pig.PigServer.dumpSchema(PigServer.java:436)
    at org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
    at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
    at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
    at org.apache.pig.PigServer.compileLp(PigServer.java:821)
    at org.apache.pig.PigServer.dumpSchema(PigServer.java:428)
    ... 6 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1060: Cannot resolve COGroup output schema
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
    ... 11 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1051: Cannot cast to Unknown
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451)
    ... 16 more

The error message does not help the user identify the issue clearly, especially if the Pig script is large and complex.
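As a rough idea of what a more helpful message could carry, the sketch below compares the declared key types of two cogroup inputs position by position and names the first mismatch. It is not how Pig's TypeCheckingVisitor works and proposes no actual patch; CogroupKeyCheck, describeMismatch and the message wording are all hypothetical, and types are represented as plain strings for simplicity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Pig code: reports which cogroup key position
// has differing types in the two inputs.
final class CogroupKeyCheck {
    // Returns an empty Optional when the key types line up, otherwise a
    // message naming the first mismatching key position and both types.
    static Optional<String> describeMismatch(String aliasA, List<String> typesA,
                                             String aliasB, List<String> typesB) {
        if (typesA.size() != typesB.size()) {
            return Optional.of("COGROUP key arity differs: " + aliasA + " has "
                    + typesA.size() + " key fields but " + aliasB + " has " + typesB.size());
        }
        for (int i = 0; i < typesA.size(); i++) {
            if (!typesA.get(i).equals(typesB.get(i))) {
                return Optional.of("COGROUP key field " + i + " has type " + typesA.get(i)
                        + " in " + aliasA + " but " + typesB.get(i) + " in " + aliasB);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Key types taken from the script in the issue: (a, b, c) in A and B.
        Optional<String> problem = describeMismatch(
                "A", Arrays.asList("chararray", "int", "int"),
                "B", Arrays.asList("chararray", "chararray", "int"));
        System.out.println(problem.orElse("key types match"));
    }
}
```

For the script in the issue this points at key position 1 (field b), int in A versus chararray in B, which is the kind of detail missing from "Cannot cast to Unknown".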
[jira] Commented: (PIG-1075) Error in Cogroup when key fields types don't match
[ https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12774222#action_12774222 ]

Ankur commented on PIG-1075:
----------------------------
Pig should throw an error message that better identifies the cause of the problem.

Error in Cogroup when key fields types don't match
--------------------------------------------------
Key: PIG-1075
URL: https://issues.apache.org/jira/browse/PIG-1075
Project: Pig
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur