[jira] Updated: (PIG-1001) Generate more meaningful error message when one input file does not exist
[ https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1001: Status: Patch Available (was: Open) Generate more meaningful error message when one input file does not exist - Key: PIG-1001 URL: https://issues.apache.org/jira/browse/PIG-1001 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1001-1.patch In the following query, if 2.txt does not exist: a = load '1.txt'; b = order a by $0; c = load '2.txt'; d = order c by $0; e = join b by $0, d by $0; dump e; Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421 does not exist." Pig should instead report the error message "Input file 2.txt not exist" rather than this confusing message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
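One way to read the request in PIG-1001 is: check every load path up front and name the missing file, rather than letting a later stage fail on an intermediate temporary path. The following is a minimal Python sketch of that idea only, not Pig's implementation and not the attached patch; `check_inputs`, `validate_or_raise` and the message wording are hypothetical names chosen for illustration.

```python
import os
import tempfile

def check_inputs(paths):
    """Return the input paths that do not exist, in the order given."""
    return [p for p in paths if not os.path.exists(p)]

def validate_or_raise(paths):
    """Fail early with a message that names the user's own input files."""
    missing = check_inputs(paths)
    if missing:
        raise FileNotFoundError("Input file(s) do not exist: " + ", ".join(missing))

# Mirror the query in the issue: 1.txt is present, 2.txt is not.
with tempfile.TemporaryDirectory() as d:
    one = os.path.join(d, "1.txt")
    two = os.path.join(d, "2.txt")
    open(one, "w").close()
    missing = check_inputs([one, two])
```

Here `missing` holds only the path of 2.txt, which is the name a user would expect to see in the error.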
[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764997#action_12764997 ] Hadoop QA commented on PIG-1020: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421951/PIG-1020-1.patch against trunk revision 824446. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/console This message is automatically generated. Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1001) Generate more meaningful error message when one input file does not exist
[ https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765016#action_12765016 ] Hadoop QA commented on PIG-1001: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421956/PIG-1001-1.patch against trunk revision 824446. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/console This message is automatically generated. 
Generate more meaningful error message when one input file does not exist - Key: PIG-1001 URL: https://issues.apache.org/jira/browse/PIG-1001 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1001-1.patch In the following query, if 2.txt does not exist: a = load '1.txt'; b = order a by $0; c = load '2.txt'; d = order c by $0; e = join b by $0, d by $0; dump e; Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421 does not exist." Pig should instead report the error message "Input file 2.txt not exist" rather than this confusing message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1019) FINDBUGS: add exclude file
[ https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765116#action_12765116 ] Olga Natkovich commented on PIG-1019: - The -1 on tests is OK since this is not a code-related patch. The -1 on release audit is also OK; it is due to the exclude file not having a header. Can one of the committers review the patch, please? FINDBUGS: add exclude file -- Key: PIG-1019 URL: https://issues.apache.org/jira/browse/PIG-1019 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Attachments: PIG-1019.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1016: Status: Patch Available (was: Open) Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-16) setting parallel from grunt via set command
[ https://issues.apache.org/jira/browse/PIG-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-16. --- Resolution: Fixed setting parallel from grunt via set command --- Key: PIG-16 URL: https://issues.apache.org/jira/browse/PIG-16 Project: Pig Issue Type: Improvement Components: grunt Reporter: Olga Natkovich Priority: Minor I'd like to propose a different model which uses the grunt set option and/or a command line option which sets reduce parallelism to be true and automatic. set reduce_parallelism TRUE set reduce_parallelism FALSE [Default - BTW, why is this the default?] This way I won't have to update my script every single time I try playing with -Dhod=-m N; parallelism for reduce statements will default, appropriately, to 2*(N-1). Alternatively, could I just specify PARALLEL with no value or PARALLEL DEFAULT? And any time I needed to force reduce to be a single job, I could write PARALLEL 1. Basically, this whole thing tripped me up for a long time and I just haven't understood if there is a really good reason to not make parallelism. I guess it might be if you have aggregation functions that do not parallelize. If this is the case, then it seems to me that this should be detectable automagically based on whether the function is a vanilla EvalFunction or if it is an AlgebraicFunction. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
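The rule proposed in PIG-16 can be written down as a small function: an explicit PARALLEL value wins, otherwise the automatic setting yields 2*(N-1) for N nodes. This is a Python sketch of the proposal as worded in the issue, not of what Pig actually implemented. The function name, the fallback to a single reducer when the setting is off, and the clamp to at least 1 are all assumptions made for illustration.

```python
def reduce_parallelism(nodes, explicit=None, auto=False):
    """Pick a reducer count under the proposal in PIG-16 (illustrative only).

    nodes:    N from -Dhod=-m N
    explicit: value of a PARALLEL clause, if the script gives one
    auto:     the proposed reduce_parallelism TRUE/FALSE setting
    """
    if explicit is not None:
        # PARALLEL 1 (or any explicit value) always takes precedence.
        return explicit
    if auto:
        # Proposed default from the issue: 2*(N-1).
        # Clamping to 1 for tiny clusters is an assumption, not from the issue.
        return max(1, 2 * (nodes - 1))
    # Assumed behaviour when the setting is FALSE: a single reducer.
    return 1
```

For example, with N = 5 and the setting on, the sketch returns 8; adding PARALLEL 1 to a statement forces 1 regardless of the setting.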
[jira] Commented: (PIG-1019) FINDBUGS: add exclude file
[ https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765131#action_12765131 ] Alan Gates commented on PIG-1019: - +1 FINDBUGS: add exclude file -- Key: PIG-1019 URL: https://issues.apache.org/jira/browse/PIG-1019 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Attachments: PIG-1019.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
[ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765136#action_12765136 ] Alan Gates commented on PIG-1014: - I think I agree with Santhosh here. While it may be unfortunate that our syntax makes it difficult to match the rather strange semantics of COUNT(x) vs COUNT(*) in SQL, I'm not sure trying to make a distinction between COUNT(A) and COUNT(A.$0) is the right solution. This will not be obvious at all to users. If anything, the right way to do this would be COUNT(A.*), but I'm not sure even about that. Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records Key: PIG-1014 URL: https://issues.apache.org/jira/browse/PIG-1014 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1019) FINDBUGS: add exclude file
[ https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1019: Resolution: Fixed Status: Resolved (was: Patch Available) FINDBUGS: add exclude file -- Key: PIG-1019 URL: https://issues.apache.org/jira/browse/PIG-1019 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Attachments: PIG-1019.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-976: - Status: Open (was: Patch Available) Multi-query optimization throws ClassCastException -- Key: PIG-976 URL: https://issues.apache.org/jira/browse/PIG-976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Ankur Assignee: Richard Ding Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. Here is the script that fails with multi-query on. data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); A = GROUP data ALL; B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2; C = FOREACH B GENERATE (sum1/sum2) AS rate; STORE C INTO 'result1'; D = GROUP data BY a; E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c); STORE E into 'result2'; Here is the exception from the logs java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-976: - Status: Patch Available (was: Open) Multi-query optimization throws ClassCastException -- Key: PIG-976 URL: https://issues.apache.org/jira/browse/PIG-976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Ankur Assignee: Richard Ding Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. Here is the script that fails with multi-query on. data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); A = GROUP data ALL; B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2; C = FOREACH B GENERATE (sum1/sum2) AS rate; STORE C INTO 'result1'; D = GROUP data BY a; E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c); STORE E into 'result2'; Here is the exception from the logs java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
LocalRearrange out of bounds exception - tips for debugging?
We ran into what looks like an edge-case bug in Pig, which causes it to throw an IndexOutOfBoundsException (stack trace below). The script just joins two relations; it looks like our data was generated incorrectly, and the join is empty, which may be what's causing the failure. It also appears to happen only when at least one of the inputs is on the large side (at least a few hundred megs). Any ideas on what could be happening and how to zoom in on the underlying cause? We are running off unmodified trunk. Script: register datagen.jar; E = load 'Employee' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,name,cc,dc); D = load 'Department' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (dept_id,dept_nm); P = load 'Project' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,emp_id,role); R1 = JOIN E by dc, D by dept_id; R2 = JOIN R1 by E::id, P by emp_id; store R2 into 'TestCase2Output'; R2 join fails with the stack trace below. It also fails if we pre-calculate R1, store it, and load it directly (so, load R1, load P, join R1 by $0, P by emp_id). We've verified that the records in R1 and R2 have the expected fields, etc. 
Stack Trace: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170)
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765172#action_12765172 ] Dmitriy V. Ryaboy commented on PIG-966: --- Alan, thanks for the explanation on the kinds of pushdowns you are envisioning. This makes sense, although I have a feeling that if we get this complex with pushdowns, it may be more appropriate to start thinking of interfaces that expose different access paths, rather than pushdownable operations. Starting to think perhaps you are right in wanting to make this a single interface instead of multiple ones like I suggested. A couple more thoughts on the LoadPushdown interface. getFeatures() should probably return a Set, not a List, as duplicates don't really make sense and we want fast contains() calls on the returned object. The new idea is just a small tweak on your design that aims to avoid the OperatorPlan issue. Maintain a Set of LogicalOperator classes (as in, LOProject.class) to indicate acceptable operators, and provide a pushOperator(LogicalOperator op) method, which can be called multiple times. If the order of operators matters, it should be up to whoever is calling this method to do so in the right order. This does force LoadFunc implementations to understand Pig operator classes, and in the case of Filter it does have to deal with an inner LogicalPlan, but I think those classes are mostly ok. If someone is advanced enough to want to implement pushdowns, they can handle those interfaces. There is the danger of the interfaces changing, of course, but, well, that consideration hasn't stopped Hadoop... and we are setting a precedent by breaking the LoadFunc interface right now anyway :-). Too simple? 
Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
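The shape of the pushOperator idea in the comment above can be sketched in a few lines. This is a Python stand-in for what would be a Java interface in Pig; every class and method name here (`Operator`, `Project`, `Filter`, `Limit`, `ExampleLoader`, `get_features`, `push_operator`) is hypothetical and only mirrors the comment's wording, not any actual Pig API.

```python
class Operator:
    """Stand-in for a Pig logical operator (hypothetical)."""

class Project(Operator):
    def __init__(self, columns):
        self.columns = columns

class Filter(Operator):
    def __init__(self, predicate):
        self.predicate = predicate

class Limit(Operator):
    def __init__(self, n):
        self.n = n

class ExampleLoader:
    # A set rather than a list: duplicates are meaningless and
    # membership tests are fast, as the comment argues.
    accepted = frozenset({Project, Filter})

    def __init__(self):
        self.pushed = []

    def get_features(self):
        return self.accepted

    def push_operator(self, op):
        """Take op if its class is supported; report whether it was taken."""
        if type(op) not in self.accepted:
            return False
        # Call order is the caller's responsibility, so just append.
        self.pushed.append(op)
        return True

loader = ExampleLoader()
ops = (Project([0, 2]), Filter(lambda row: row[0] is not None), Limit(10))
results = [loader.push_operator(op) for op in ops]
```

In this sketch the projection and the filter are accepted and the limit is declined, so the caller would keep the limit in its own plan.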
[jira] Commented: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765183#action_12765183 ] Hadoop QA commented on PIG-1016: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421949/PIG-1016.patch against trunk revision 824446. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/console This message is automatically generated. Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all of the documentation states that the value of a map can be of any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading in maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. 
[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
[ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765192#action_12765192 ] Pradeep Kamath commented on PIG-1014: - The issue I see is with the implementation of COUNT today. It looks at only the first field of each tuple in the bag and counts only non-null values towards the result. This can lead to mysterious results. Consider a relation (A) with two fields with the following contents:
{noformat}
1 2
3 4
null 6
7 null
null null
{noformat}
If we have the following snippet:
{code}
B = group A all;
C = foreach B generate COUNT(A);
{code}
The answer is 3, which was arrived at only by considering record 1, record 2 and record 4, since the other records have null in the first position. Ironically, though record 4 has null in the second position, that does not prevent it from being counted. So the result being based on the null-ness of just the first field seems somewhat arbitrary. My concern is that most users would not know that the result was arrived at *after* dropping records which had null in the first field even though they did not specify COUNT(A.$0). Status Quo means we equate COUNT(A) to COUNT(A.$0), which is also not apparent to users. Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records Key: PIG-1014 URL: https://issues.apache.org/jira/browse/PIG-1014 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
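The arithmetic in the comment above can be checked with a small model. This Python sketch reproduces only the counting rule as the comment describes it (count tuples whose first field is non-null) next to a count of all tuples; it is not Pig's COUNT or COUNT_STAR code, and the function names are made up for illustration.

```python
# The five records of relation A from the comment, with None standing in for null.
rows = [(1, 2), (3, 4), (None, 6), (7, None), (None, None)]

def count_first_field(bag):
    """COUNT as described in the comment: only a non-null first field counts."""
    return sum(1 for t in bag if t[0] is not None)

def count_star(bag):
    """Count every record regardless of nulls."""
    return len(bag)

counted = [i + 1 for i, t in enumerate(rows) if t[0] is not None]
```

With this data `count_first_field(rows)` is 3 and `count_star(rows)` is 5, and `counted` lists records 1, 2 and 4, matching the records named in the comment.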
[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
[ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765194#action_12765194 ] Santhosh Srinivasan commented on PIG-1014: -- Essentially, Pradeep is pointing out an issue in the implementation of COUNT. If that is the case, then either COUNT has to be fixed or the semantics of COUNT have to be documented to explain the current implementation. I would vote for fixing COUNT to have the correct semantics. Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records Key: PIG-1014 URL: https://issues.apache.org/jira/browse/PIG-1014 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: LocalRearrange out of bounds exception - tips for debugging?
Have you checked that each record in your input data has at least the number of fields you specify? Have you checked that the field separator in your data matches the default for PigPerformanceLoader (^A I think)? Alan. On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote: We ran into what looks like an edge-case bug in Pig, which causes it to throw an IndexOutOfBoundsException (stack trace below). The script just joins two relations; it looks like our data was generated incorrectly, and the join is empty, which may be what's causing the failure. It also appears to happen only when at least one of the inputs is on the large side (at least a few hundred megs). Any ideas on what could be happening and how to zoom in on the underlying cause? We are running off unmodified trunk. Script: register datagen.jar; E = load 'Employee' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,name,cc,dc); D = load 'Department' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (dept_id,dept_nm); P = load 'Project' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,emp_id,role); R1 = JOIN E by dc, D by dept_id; R2 = JOIN R1 by E::id, P by emp_id; store R2 into 'TestCase2Output'; R2 join fails with the stack trace below. It also fails if we pre-calculate R1, store it, and load it directly (so, load R1, load P, join R1 by $0, P by emp_id). We've verified that the records in R1 and R2 have the expected fields, etc. 
Stack Trace: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170)
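Alan's two checks (enough fields per record, and the right separator) can be run directly against the raw input files outside Pig. The following is a minimal Python sketch under stated assumptions: the separator is taken to be ^A (`\x01`), which the reply itself only guesses at, and the helper names `field_count_histogram` and `short_records` as well as the sample data are hypothetical.

```python
from collections import Counter

def field_count_histogram(lines, sep="\x01"):
    """Map number-of-fields to number-of-records; a wrong separator shows up as a spike at 1."""
    return Counter(len(line.rstrip("\n").split(sep)) for line in lines)

def short_records(lines, expected, sep="\x01"):
    """Yield (line_number, field_count) for records with fewer fields than expected."""
    for lineno, line in enumerate(lines, start=1):
        n = len(line.rstrip("\n").split(sep))
        if n < expected:
            yield lineno, n

# Made-up sample: one good record, one tab-separated record, one truncated record.
sample = [
    "1\x01alice\x01cc1\x01d1\n",
    "2\tbob\tcc2\td2\n",
    "3\x01carol\n",
]
bad = list(short_records(sample, expected=4))
hist = field_count_histogram(sample)
```

In practice `lines` would be an open file handle on one of the inputs. A record that splits into a single field would be at least consistent with the "Index: 1, Size: 1" in the trace, though that is only a guess at the cause.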
[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765216#action_12765216 ] Hadoop QA commented on PIG-976: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422000/PIG-976.patch against trunk revision 824838. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 3 new Findbugs warnings. -1 release audit. The applied patch generated 295 release audit warnings (more than the trunk's current 292 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/console This message is automatically generated. Multi-query optimization throws ClassCastException -- Key: PIG-976 URL: https://issues.apache.org/jira/browse/PIG-976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Ankur Assignee: Richard Ding Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. 
Here is the script that fails with multi-query on. data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); A = GROUP data ALL; B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2; C = FOREACH B GENERATE (sum1/sum2) AS rate; STORE C INTO 'result1'; D = GROUP data BY a; E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c); STORE E into 'result2'; Here is the exception from the logs java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228) at
[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1020: Status: Open (was: Patch Available) Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1020: Attachment: PIG-1020-2.patch Change the test target to depend on jar-withouthadoop rather than jar Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on PIG-1020 started by Daniel Dai. Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1020: Status: Patch Available (was: In Progress) Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765275#action_12765275 ]

Daniel Dai commented on PIG-921:

The result should be ((1,a),(1,b)), ((2,aa),(2,bb)). Map-reduce mode produces the wrong result.

Strange use case for Join which produces different results in local and map reduce mode

Key: PIG-921
URL: https://issues.apache.org/jira/browse/PIG-921
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
Attachments: A.txt, B.txt, joinusecase.pig

I have a script of the following form, which loads from 2 files, A.txt and B.txt:
{code}
A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
C = JOIN A by a.a1, B by b.b1;
DESCRIBE C;
DUMP C;
{code}
A.txt contains the following lines:
{code}
(1,a)
(2,aa)
{code}
B.txt contains the following lines:
{code}
(1,b)
(2,bb)
{code}
Running the above script in local and map reduce mode on Hadoop 18 and Hadoop 20 produces the following:

Hadoop 18
=
(1,1)
(2,2)
=
Hadoop 20
=
(1,1)
(2,2)
=
Local Mode: Pig with Hadoop 18 jar release
=
2009-08-13 17:15:13,473 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
09/08/13 17:15:13 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias C
09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
Details at logfile: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
=
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
... 9 more
=
Local Mode: Pig with Hadoop 20 jar release
=
((1,a),(1,b))
((2,aa),(2,bb)
=

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
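As a rough illustration of the result the PIG-921 comment above says the join should produce, here is a minimal Python sketch. It is not Pig code and not the patch: the lists `A` and `B` are hypothetical stand-ins for A.txt and B.txt, and `join_on_nested_field` is an invented helper that joins on the first element of each record's tuple while keeping the whole tuple in the output, which is the behaviour the map-reduce runs failed to show.

```python
# Hypothetical stand-ins for A.txt and B.txt: every record has a single
# field, and that field is itself a tuple (a1, a2) or (b1, b2).
A = [((1, "a"),), ((2, "aa"),)]
B = [((1, "b"),), ((2, "bb"),)]


def join_on_nested_field(left, right):
    """Inner join on the first element of each record's tuple field.

    The join key is only one field of the tuple, but the output keeps
    the complete tuple from both sides.
    """
    # Index the right-hand tuples by their join key.
    by_key = {}
    for (b,) in right:
        by_key.setdefault(b[0], []).append(b)
    # Pair every left tuple with the right tuples that share its key.
    return [(a, b) for (a,) in left for b in by_key.get(a[0], [])]


C = join_on_nested_field(A, B)
for row in C:
    print(row)
```

The map-reduce output in the report, (1,1) and (2,2), would correspond to emitting only the two join keys instead of the full tuples.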
[jira] Assigned: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-921: -- Assignee: Daniel Dai Strange use case for Join which produces different results in local and map reduce mode --- Key: PIG-921 URL: https://issues.apache.org/jira/browse/PIG-921 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Environment: Hadoop 18 and Hadoop 20 Reporter: Viraj Bhat Assignee: Daniel Dai Attachments: A.txt, B.txt, joinusecase.pig -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1020: Status: Open (was: Patch Available) Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-921: --- Attachment: PIG-921-1.patch The problem is in POLocalRearrange: we skip the entire tuple in the value if we use one field of the tuple as the join key. Strange use case for Join which produces different results in local and map reduce mode --- Key: PIG-921 URL: https://issues.apache.org/jira/browse/PIG-921 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Environment: Hadoop 18 and Hadoop 20 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.6.0 Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1020: Status: Patch Available (was: Open) Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-921: --- Fix Version/s: 0.6.0 Affects Version/s: (was: 0.3.0) 0.4.0 Status: Patch Available (was: Open) Strange use case for Join which produces different results in local and map reduce mode --- Key: PIG-921 URL: https://issues.apache.org/jira/browse/PIG-921 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Environment: Hadoop 18 and Hadoop 20 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.6.0 Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hc busy updated PIG-1016: - Attachment: (was: PIG-1016.patch) Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all the documentation states that the value of a map can be any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hc busy updated PIG-1016: - Status: Open (was: Patch Available) Didn't pass a few other affected unit tests. Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all the documentation states that the value of a map can be any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hc busy updated PIG-1016: - Status: Patch Available (was: Open) Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all the documentation states that the value of a map can be any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hc busy updated PIG-1016: - Attachment: PIG-1016.patch Sorry, first-time contributor. This submission includes the fix and also fixes several unit tests that failed. Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all the documentation states that the value of a map can be any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1016) Reading in map data seems broken
[ https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765302#action_12765302 ] Dmitriy V. Ryaboy commented on PIG-1016: No worries, we are used to Jira sending us a never-ending stream of updates :-). Looks good to me (assuming this passes Hudson). Reading in map data seems broken Key: PIG-1016 URL: https://issues.apache.org/jira/browse/PIG-1016 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.4.0 Reporter: hc busy Attachments: PIG-1016.patch Hi, I'm trying to load a map that has a tuple for a value. The read fails in 0.4.0 because of a misconfiguration in the parser, whereas almost all the documentation states that the value of a map can be any type. I've attached a patch that allows us to read in complex objects as values, as documented. I've done simple verification of loading maps with tuple/map values and writing them back out using LOAD and STORE. All seems to work fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765311#action_12765311 ] Hadoop QA commented on PIG-1020: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422019/PIG-1020-2.patch against trunk revision 824838. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/console This message is automatically generated. Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-968) findContainingJar fails when there's a + in the path
[ https://issues.apache.org/jira/browse/PIG-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-968. Resolution: Fixed Patch checked in. Thanks Todd. findContainingJar fails when there's a + in the path Key: PIG-968 URL: https://issues.apache.org/jira/browse/PIG-968 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0, 0.5.0 Reporter: Todd Lipcon Attachments: pig-968.txt This is the same bug as in MAPREDUCE-714. Please see discussion there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan
[ https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765334#action_12765334 ]

Alan Gates commented on PIG-858:

I'm reviewing this patch.

Order By followed by replicated join fails while compiling MR-plan from physical plan

Key: PIG-858
URL: https://issues.apache.org/jira/browse/PIG-858
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.6.0
Attachments: pig-858.patch

Consider the query:
{code}
A = load 'a';
B = order A by $0;
C = join A by $0, B by $0;
explain C;
{code}
works. But if replicated join is used instead
{code}
A = load 'a';
B = order A by $0;
C = join A by $0, B by $0 using replicated;
explain C;
{code}
this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error compiling operator POFRJoin
relevant stacktrace:
{code}
Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
at org.apache.pig.PigServer.explain(PigServer.java:574)
... 8 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
... 16 more
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
[ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765357#action_12765357 ]

Santhosh Srinivasan commented on PIG-1014:
--
After a discussion with Pradeep who also graciously ran SQL queries to verify semantics, we have the following proposal:

The semantics of COUNT could be defined as:

1. COUNT( A ) is equivalent to COUNT( A.* ) and the result of COUNT( A ) will count null tuples in the relation
2. COUNT( A.$0) will not count null tuples in the relation
3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation containing tuples with two columns and will exhibit the behavior of statement 1

OR

3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation containing tuples with two columns and will exhibit the behavior of statement 2

Point 3 needs more discussion.

Comments/thoughts/suggestions/anything else welcome.

Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
Key: PIG-1014
URL: https://issues.apache.org/jira/browse/PIG-1014
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
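Statements 1 and 2 of the proposal above can be modelled roughly as follows. This is only a sketch in Python, not Pig's implementation: the helper names `count_star` and `count_column` are hypothetical, a bag is represented as a list, and Python's `None` stands in for a Pig null. Point 3 is left out because the comment says it still needs discussion.

```python
def count_star(bag):
    """Model of COUNT(A) / COUNT(A.*): every record counts, null tuples included."""
    return len(bag)


def count_column(bag, index):
    """Model of COUNT(A.$index): records whose projected field is null are skipped."""
    return sum(
        1
        for record in bag
        if record is not None and record[index] is not None
    )


# Hypothetical relation with a null field in each column and one null tuple.
A = [(1, "x"), (None, "y"), None, (4, None)]

print(count_star(A))       # counts all four records
print(count_column(A, 0))  # counts only records with a non-null $0
```

Under this model the two counts differ exactly when the relation contains null tuples or null values in the projected column, which is the distinction the proposal draws between COUNT( A ) and COUNT( A.$0).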
[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765360#action_12765360 ]

Pradeep Kamath commented on PIG-976:

+1 changes look good - please address the findbugs and release audit warnings if appropriate.

Multi-query optimization throws ClassCastException
--
Key: PIG-976
URL: https://issues.apache.org/jira/browse/PIG-976
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch

Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. Here is the script that fails with multi-query on.

data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double);
A = GROUP data ALL;
B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
C = FOREACH B GENERATE (sum1/sum2) AS rate;
STORE C INTO 'result1';
D = GROUP data BY a;
E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
STORE E into 'result2';

Here is the exception from the logs

java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries
[ https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765366#action_12765366 ] Hadoop QA commented on PIG-1020: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422019/PIG-1020-2.patch against trunk revision 824838. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/console This message is automatically generated. Include an ant target to build pig.jar without hadoop libraries --- Key: PIG-1020 URL: https://issues.apache.org/jira/browse/PIG-1020 Project: Pig Issue Type: New Feature Components: build Affects Versions: 0.4.0 Reporter: Daniel Dai Assignee: Daniel Dai Priority: Minor Fix For: 0.6.0 Attachments: PIG-1020-1.patch, PIG-1020-2.patch Provide an ant target to build pig.jar without all hadoop related libraries. User will provide external hadoop jars in classpath before invoking pig. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1017) Converts strings to text in Pig
[ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765380#action_12765380 ]

Sriranjan Manjunath commented on PIG-1017:
--
Pigmix results before and after converting strings to text:

||Pigmix query||Trunk||Modified code||
|L1| 3:2|2:24|
|L2| 2:6|1:23|
|L3| 3:36|3:49|
|L4| 1:42|1:49|
|L5| 1:49|1:49|
|L6| 1:47|3:3|
|L7| 1:44|1:49|
|L8| 1:19|1:18|
|L9| 4:6|5:35|
|L10| 8:52|7:56|
|L11| 2:26|1:34|
|L12| 1:57|1:54|

Converts strings to text in Pig
---
Key: PIG-1017
URL: https://issues.apache.org/jira/browse/PIG-1017
Project: Pig
Issue Type: Improvement
Reporter: Sriranjan Manjunath

Strings in Java are UTF-16 and take at least 2 bytes per character. Text (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show significant reductions in memory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
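The size argument in the issue description can be illustrated with a small sketch. It is written in Python rather than Java and compares only the encoded payload; it says nothing about the object overhead of java.lang.String or org.apache.hadoop.io.Text. The helper name `encoded_sizes` and the sample strings are assumptions made for illustration.

```python
def encoded_sizes(text):
    """Return (UTF-16 byte count, UTF-8 byte count) for the given string.

    "utf-16-be" is used so that no byte-order mark is added to the count.
    """
    utf16_bytes = len(text.encode("utf-16-be"))
    utf8_bytes = len(text.encode("utf-8"))
    return utf16_bytes, utf8_bytes


# ASCII-only data: UTF-8 needs half the bytes of UTF-16.
print(encoded_sizes("pigmix"))

# CJK data: UTF-8 needs three bytes per character here, more than UTF-16.
print(encoded_sizes("日本"))
```

The saving therefore depends on the data: mostly-ASCII strings shrink, while text dominated by characters outside the Latin range can grow when stored as UTF-8.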
[jira] Commented: (PIG-1017) Converts strings to text in Pig
[ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765381#action_12765381 ]

Sriranjan Manjunath commented on PIG-1017:
--
Something fishy is going on. I ran L6 a couple more times with the modified code and it completed in 1:8.

Converts strings to text in Pig
---
Key: PIG-1017
URL: https://issues.apache.org/jira/browse/PIG-1017
Project: Pig
Issue Type: Improvement
Reporter: Sriranjan Manjunath

Strings in Java are UTF-16 and take at least 2 bytes per character. Text (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show significant reductions in memory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765385#action_12765385 ] Hadoop QA commented on PIG-921: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422030/PIG-921-1.patch against trunk revision 824980. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/console This message is automatically generated. 
Strange use case for Join which produces different results in local and map reduce mode
---
Key: PIG-921
URL: https://issues.apache.org/jira/browse/PIG-921
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch

I have a script of the following form, which loads from two files, A.txt and B.txt:
{code}
A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
C = JOIN A by a.a1, B by b.b1;
DESCRIBE C;
DUMP C;
{code}
A.txt contains the following lines:
{code}
(1,a)
(2,aa)
{code}
B.txt contains the following lines:
{code}
(1,b)
(2,bb)
{code}
Running the above script in local and map reduce mode on Hadoop 18 and Hadoop 20 produces the following:

Hadoop 18
=
(1,1)
(2,2)
=
Hadoop 20
=
(1,1)
(2,2)
=
Local Mode: Pig with Hadoop 18 jar release
=
2009-08-13 17:15:13,473 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
09/08/13 17:15:13 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias C
09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
Details at logfile: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
=
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
    ... 9 more
[jira] Created: (PIG-1021) Cast of nested types does work as expected
Cast of nested types does work as expected
--
Key: PIG-1021
URL: https://issues.apache.org/jira/browse/PIG-1021
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
Fix For: 0.6.0

The following script does not work as expected:

1.txt:
(0.2,0.3)

a = load '1.txt';
b = foreach a generate (tuple(int, int))$0;
describe b;
b: {(int,int)}
dump b;
((0.2,0.3))

The expected result is ((0, 0))

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
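The expected behavior in the report above, where the cast is applied to each field of the tuple, can be sketched as follows. This is a Python model rather than Pig code; the helper name `cast_tuple_to_int` is hypothetical, and it assumes the cast truncates toward zero, which is consistent with the ((0, 0)) result the report expects.

```python
def cast_tuple_to_int(fields):
    """Cast every field of a tuple to int, truncating toward zero.

    A None field (standing in for a Pig null) is passed through unchanged.
    """
    return tuple(None if value is None else int(value) for value in fields)


# The single record from 1.txt in the report.
print(cast_tuple_to_int((0.2, 0.3)))
```

The reported output ((0.2,0.3)) corresponds to the declared schema changing to (int,int) while the field values themselves are left uncast.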
[jira] Resolved: (PIG-1021) Cast of nested types does work as expected
[ https://issues.apache.org/jira/browse/PIG-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1021.
-
Resolution: Duplicate

It is a duplicate of PIG-613

Cast of nested types does work as expected
--
Key: PIG-1021
URL: https://issues.apache.org/jira/browse/PIG-1021
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
Fix For: 0.6.0

The following script does not work as expected:

1.txt:
(0.2,0.3)

a = load '1.txt';
b = foreach a generate (tuple(int, int))$0;
describe b;
b: {(int,int)}
dump b;
((0.2,0.3))

The expected result is ((0, 0))

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.