[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2013-10-18 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799469#comment-13799469
 ] 

Koji Noguchi commented on PIG-3020:
---

FYI, I'm trying to revert the change from this jira at PIG-3492.

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Jonathan Coveney
 Fix For: 0.11, 0.12.0

 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch


 The following script validates OK with Pig 0.9 but fails with the error below 
 in 0.11 (and I suspect 0.10):
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
 uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
 IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
 group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
 group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
 src_id;
 store per_user_set_sizes into  'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2013-01-14 Thread Eli Finkelshteyn (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553541#comment-13553541
 ] 

Eli Finkelshteyn commented on PIG-3020:
---

Is there any workaround for this in 0.10 aside from substituting cogroup for 
join?
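
For reference, the cogroup substitution asked about above might look roughly like the sketch below for the script in this issue. It has not been tested against 0.10, and it is only an approximation of the JOIN: it emits a single src_id column (three fields instead of four). The aliases cg and cg_left and the output column layout are illustrative choices, not something taken from the issue or its patches.

```pig
-- Sketch only: emulate
--   JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id
-- with COGROUP. The aliases cg and cg_left are hypothetical.
cg = COGROUP bq_counts BY src_id, both_counts BY src_id;

-- LEFT OUTER semantics: keep only keys that occur in bq_counts.
cg_left = FILTER cg BY NOT IsEmpty(bq_counts);

-- Take the key from the group and flatten each size bag;
-- an empty right-hand bag is turned into a single null.
per_user_set_sizes = FOREACH cg_left GENERATE
    group AS src_id,
    FLATTEN(bq_counts.size_bq) AS size_bq,
    FLATTEN((IsEmpty(both_counts) ? null : both_counts.size_both)) AS size_both;
```

Because the key is taken once from group rather than from both inputs, the result no longer carries two src_id fields, which is the pair the error message reports as sharing a uid. Whether this actually sidesteps the validation failure on a given release is an assumption that would need checking.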

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Jonathan Coveney
 Fix For: 0.11, 0.12

 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534401#comment-13534401
 ] 

Julien Le Dem commented on PIG-3020:


looks good to me
+1

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch




[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-17 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534454#comment-13534454
 ] 

Jonathan Coveney commented on PIG-3020:
---

This is in trunk. Not sure if it meets the criteria to be in pig-11?

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch




[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-17 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534495#comment-13534495
 ] 

Dmitriy V. Ryaboy commented on PIG-3020:


Existing scripts that work on Pig 0.9 don't work on 0.11 without this, so I 
think it needs to be in 0.11 (to prevent breaking changes).

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch




[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-17 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534537#comment-13534537
 ] 

Jonathan Coveney commented on PIG-3020:
---

I am inclined to agree. Will commit to 0.11

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch




[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-12 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255
 ] 

Julien Le Dem commented on PIG-3020:


[~dvryaboy] I just noticed it was logging a warning with a NullPointerException 
when running tests from eclipse. I just fixed the log line to something 
clearer. It is not related but I feel it is small enough to be done here.
[~jcoveney] I also added a unit test with a pig script that was failing before 
and works now to validate my change.

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
 Attachments: PIG-3020.patch




[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-07 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526864#comment-13526864
 ] 

Jonathan Coveney commented on PIG-3020:
---

This looks good to me, though I wonder if there is anyone who knows this code 
better than I do who can take a look.

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
 Attachments: PIG-3020.patch


 The following vali=dates OK with pig 0.9 and fails with the following error 
 in 0.11 (and I suspect 0.10)
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{}, uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
     group.uid AS src_id,
     group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
     group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
     group.uid AS src_id,
     group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
     group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
 store per_user_set_sizes into 'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}
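
 To make the ERROR 2270 message easier to read: the schema it prints lists 
 each field as alias#uid:type, and src_id carries uid 417 on both sides of the 
 join. The following is only a rough Java sketch of what a duplicate-uid check 
 over such a schema might look like. The names DuplicateUidCheck, Field and 
 findDuplicateUids are hypothetical and are not taken from Pig's 
 SchemaResetter; only the aliases and uids come from the error message.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Pig's SchemaResetter code.
final class DuplicateUidCheck {

    // A schema field reduced to the two parts shown in the error text: alias#uid.
    static final class Field {
        final String alias;
        final long uid;

        Field(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    // Returns every uid that appears on more than one field of the schema.
    static Set<Long> findDuplicateUids(List<Field> schema) {
        Set<Long> seen = new LinkedHashSet<>();
        Set<Long> duplicates = new LinkedHashSet<>();
        for (Field f : schema) {
            // Set.add returns false when the uid was already recorded.
            if (!seen.add(f.uid)) {
                duplicates.add(f.uid);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Aliases and uids copied from the ERROR 2270 message.
        List<Field> joined = new ArrayList<>();
        joined.add(new Field("bq_counts::src_id", 417L));
        joined.add(new Field("bq_counts::size_bq", 468L));
        joined.add(new Field("both_counts::src_id", 417L));
        joined.add(new Field("both_counts::size_both", 480L));

        // Prints: duplicate uids: [417]
        System.out.println("duplicate uids: " + findDuplicateUids(joined));
    }
}
```

 In this sketch the two src_id fields collide because both trace back to the 
 same uid, which matches the situation in the report where both join inputs 
 are derived from the same LOAD statement.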



[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-07 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526925#comment-13526925
 ] 

Dmitriy V. Ryaboy commented on PIG-3020:


Are the manifest changes related?

