[jira] [Reopened] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-17 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reopened PIG-5453:
---

While tracking multiple jiras, I missed that this patch had not been put through 
the full unit/e2e tests (hence the earlier syntax error).

After fixing that simple syntax error, I saw a couple of regression test 
failures.  I'm reverting the patch while I debug and come up with a new patch.

So sorry.

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the 
> surrounding fields shift incorrectly.
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct; it was fixed as part of PIG-5201, PIG-5452.
> Y has a4(Y) shifted to the left incorrectly.
> Should have been (Y,a,b,,Y).
> Z has dropped a4(Z) and overwritten it with the content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Daniel Dai (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847097#comment-17847097
 ] 

Daniel Dai commented on PIG-5453:
-

+1



[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847095#comment-17847095
 ] 

Koji Noguchi commented on PIG-5453:
---

Sorry, the original patch had an extra comma that caused a compile error in 
TestFlatten.java.
Uploaded pig-5453-v02.patch.  To fix the broken trunk, I pushed the change.



[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5453:
--
Attachment: pig-5453-v02.patch



[jira] [Resolved] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5453.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Daniel!
Committed to trunk.



[jira] [Resolved] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5452.
---
Fix Version/s: 0.19.0
   Resolution: Fixed

Thanks for the review Daniel! 
Committed to trunk.

> Null handling of FLATTEN with user defined schema (as clause)
> -
>
> Key: PIG-5452
> URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces the wrong number of nulls, and the output is shifted incorrectly.
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> The difference is that, in the latter, a2 in "FLATTEN(a2)" only has the schema 
> tuple() with empty inner fields, while the user-defined schema "as 
> (A1:chararray, A2:chararray)" supplies the field list.
>  
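The alignment rule the fix enforces can be sketched in Python (a hypothetical
illustration of the expected semantics; the helper name and shapes are made up
and this is not Pig's actual implementation): a null tuple flattened under a
user-defined schema of N fields must expand to N nulls so trailing columns stay
aligned.

```python
def flatten_nullable_tuple(value, schema_width):
    """Flatten a tuple-valued field into exactly schema_width columns.

    A null tuple expands to schema_width nulls so that columns after
    FLATTEN(a2) do not shift -- the behavior PIG-5452 expects.
    (Hypothetical sketch, not Pig's implementation.)
    """
    if value is None:
        return [None] * schema_width
    return list(value)

# With 'as (A1:chararray, A2:chararray)' the width is 2, so a null
# tuple yields two nulls and a3 keeps its own column:
row = ("a", None, "a")
out = (row[0], *flatten_nullable_tuple(row[1], 2), row[2])
# out == ('a', None, None, 'a'), which Pig prints as (a,,,a)
```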





[jira] [Resolved] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5450.
---
Fix Version/s: 0.19.0
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> --
>
> Key: PIG-5450
> URL: https://issues.apache.org/jira/browse/PIG-5450
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
>  @117: areturn
> Reason:
> Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
>  {noformat}





[jira] [Resolved] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5446.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}





[jira] [Resolved] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5448.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Resolved] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5447.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"





[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5447:
--
Fix Version/s: 0.19.0



[jira] [Resolved] (PIG-5439) Support Spark 3 and drop SparkShim

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5439.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review Rohini!
Committed to trunk.

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark running on Spark 3.
> The initial version only runs up to Spark 3.2.4, not on 3.3 or 3.4, due to a 
> log4j mismatch.
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.





[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5438:
--
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Thanks for the review Rohini! 
Committed to trunk.

> Update SparkCounter.Accumulator to AccumulatorV2
> 
>
> Key: PIG-5438
> URL: https://issues.apache.org/jira/browse/PIG-5438
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5438-v01.patch
>
>
> Original Accumulator is deprecated in Spark2 and gone in Spark3.  
> AccumulatorV2 is usable on both Spark2 and Spark3. 





[jira] [Resolved] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5416.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with the same errors.
> Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Commented] (PIG-5439) Support Spark 3 and drop SparkShim

2024-05-07 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844427#comment-17844427
 ] 

Rohini Palaniswamy commented on PIG-5439:
-

+1



[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-18 Thread Daniel Dai (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838865#comment-17838865
 ] 

Daniel Dai commented on PIG-5453:
-

+1



[jira] [Created] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-04-18 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5454:
-

 Summary: Make ParallelGC the default Garbage Collection
 Key: PIG-5454
 URL: https://issues.apache.org/jira/browse/PIG-5454
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Koji Noguchi


From JDK 9 onward, G1GC became the default GC.
I've seen our users hit OOM after migrating to a recent JDK, with the issue 
going away after reverting back to ParallelGC.

Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.





[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-18 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5453:
--
Attachment: pig-5453-v01.patch

Uploading a patch that uses a new field introduced as part of PIG-5201, 
PIG-5452.  If the number of fields is less than expected, it now fills the 
rest with nulls.  If the number of fields is more than expected, it now fills 
only up to the expected number of fields.  (pig-5453-v01.patch)
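The pad/truncate rule described above can be sketched as follows (a
hypothetical Python illustration of the fix's semantics; the function name is
made up and this is not the actual pig-5453-v01.patch code):

```python
def fit_to_schema(fields, expected):
    """Coerce a flattened tuple to exactly `expected` columns.

    Fewer fields than expected: pad the remainder with nulls.
    More fields than expected: keep only the first `expected`.
    Either way, the columns after FLATTEN never shift.
    (Hypothetical sketch of the pig-5453 behavior.)
    """
    out = list(fields[:expected])
    out.extend([None] * (expected - len(out)))
    return out

# Y's tuple (a,b) against the 3-field schema pads:  ['a', 'b', None]
# Z's tuple (a,b,c,d,e,f) truncates:                ['a', 'b', 'c']
```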



[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837863#comment-17837863
 ] 

Rohini Palaniswamy commented on PIG-5450:
-

+1



[jira] [Commented] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837862#comment-17837862
 ] 

Rohini Palaniswamy commented on PIG-5449:
-

+1

> TestEmptyInputDir failing on pig-on-spark3
> --
>
> Key: PIG-5449
> URL: https://issues.apache.org/jira/browse/PIG-5449
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5449-v01.patch
>
>
> TestEmptyInputDir failing on pig-on-spark3 with 
> {noformat:title=TestEmptyInputDir.testMergeJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
> {noformat}
> {noformat:title=TestEmptyInputDir.testGroupByFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
> {noformat}
> {noformat:title=TestEmptyInputDir.testFRJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267)
>  {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837861#comment-17837861
 ] 

Rohini Palaniswamy commented on PIG-5448:
-

+1

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837860#comment-17837860
 ] 

Rohini Palaniswamy commented on PIG-5438:
-

+1

> Update SparkCounter.Accumulator to AccumulatorV2
> 
>
> Key: PIG-5438
> URL: https://issues.apache.org/jira/browse/PIG-5438
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5438-v01.patch
>
>
> The original Accumulator is deprecated in Spark 2 and removed in Spark 3.  
> AccumulatorV2 is usable on both Spark 2 and Spark 3. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-15 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5453:
-

 Summary: FLATTEN shifting fields incorrectly
 Key: PIG-5453
 URL: https://issues.apache.org/jira/browse/PIG-5453
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Follow-up from PIG-5201 and PIG-5452.

When the flattened tuple has fewer or more fields than specified, the entire 
set of fields shifts incorrectly. 

Input
{noformat}
A       (a,b,c)
B       (a,b,c)
C       (a,b,c)
Y       (a,b)
Z       (a,b,c,d,e,f)
E{noformat}
Script
{code:java}
A = load 'input.txt' as (a1:chararray, a2:tuple());
B = FOREACH A GENERATE a1, FLATTEN(a2) as 
(b1:chararray,b2:chararray,b3:chararray), a1 as a4;
dump B; {code}
Incorrect results
{noformat}
(A,a,b,c,A)
(B,a,b,c,B)
(C,a,b,c,C)
(Y,a,b,Y,)
(Z,a,b,c,d)
(EE){noformat}

E is correct; it was fixed as part of PIG-5201 and PIG-5452.
Y has incorrectly shifted a4(Y) to the left.
It should have been (Y,a,b,,Y).
Z has dropped a4(Z) and overwrote it with the content of FLATTEN(a2).
It should have been (Z,a,b,c,Z).
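The expected pad/truncate semantics described above can be sketched in Python. This is a minimal illustration of the *intended* behavior, using a hypothetical `flatten_to_schema` helper; it is not Pig's actual implementation:

```python
def flatten_to_schema(t, width):
    # Pad or truncate a flattened tuple to the declared schema width
    # (hypothetical sketch): a short tuple is null-padded and a long tuple
    # is truncated, so trailing fields such as a4 never shift.
    if t is None:
        return (None,) * width
    padded = list(t[:width])
    padded += [None] * (width - len(padded))
    return tuple(padded)

# Rows from the example: (a1, a2) with FLATTEN(a2) declared as 3 fields.
for a1, a2 in [("Y", ("a", "b")), ("Z", ("a", "b", "c", "d", "e", "f"))]:
    print((a1,) + flatten_to_schema(a2, 3) + (a1,))
    # → ('Y', 'a', 'b', None, 'Y') then ('Z', 'a', 'b', 'c', 'Z')
```

With this rule, the Y row keeps a4 in the last position and the Z row drops the extra flattened fields instead of overwriting a4.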



 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-15 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
--
Description: 
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.
{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a 
schema of tuple() with empty inner fields, while the field list comes from the 
user-defined schema "as (A1:chararray, A2:chararray)". 

 

  was:
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces right number of nulls.


{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces wrong number of null and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of 
tuple() with empty inner fields.

 


> Null handling of FLATTEN with user defined schema (as clause)
> -
>
> Key: PIG-5452
>     URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces the wrong number of nulls, and the output is shifted incorrectly. 
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a 
> schema of tuple() with empty inner fields, while the field list comes from 
> the user-defined schema "as (A1:chararray, A2:chararray)". 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
--
Attachment: pig-5452-v01.patch

Instead of relying on the inner-field schema, the patch uses the output schema, 
which combines the schema of the data with the user-defined schema.
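The combination can be sketched as follows. This is an illustrative sketch only (not Pig's actual LogicalSchema API): the user-defined "as" schema fixes the field count, and data-schema details fill in wherever the data actually has inner fields:

```python
def output_schema(data_fields, user_fields):
    # Hypothetical merge (not Pig's implementation): the user-defined
    # schema decides the width; data-schema fields are used as a fallback
    # for any position the user left unspecified.
    merged = []
    for i, user_f in enumerate(user_fields):
        data_f = data_fields[i] if i < len(data_fields) else None
        merged.append(user_f if user_f else data_f)
    return merged

# a2 declared as tuple() with no inner fields, but flattened
# "as (A1:chararray, A2:chararray)": the output still has two fields.
print(output_schema([], ["A1:chararray", "A2:chararray"]))
# → ['A1:chararray', 'A2:chararray']
```

Relying on this merged output schema, rather than the empty inner-field schema of the data, gives the null-expansion code the correct field count.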

> Null handling of FLATTEN with user defined schema (as clause)
> -
>
> Key: PIG-5452
> URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces the wrong number of nulls, and the output is shifted incorrectly. 
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a 
> schema of tuple() with empty inner fields.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-12 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5452:
-

 Summary: Null handling of FLATTEN with user defined schema (as 
clause)
 Key: PIG-5452
 URL: https://issues.apache.org/jira/browse/PIG-5452
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.


{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter case, a2 in "FLATTEN(a2)" only has a 
schema of tuple() with empty inner fields.
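The correct behavior for the null case can be sketched in Python. This is a hypothetical illustration of the expected semantics (the `flatten_null` helper is not Pig's API): a null tuple flattened with a user-defined schema should expand to one null per declared field:

```python
def flatten_null(user_schema):
    # Hypothetical helper: FLATTEN of a null tuple emits one null per
    # field of the user-defined "as" schema, so surrounding fields
    # (a1, a3) stay in place.
    return (None,) * len(user_schema)

user_schema = ("A1:chararray", "A2:chararray")
# C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1, A2), a3;  with a2 == null
for a1 in ("a", "b", "c", "d", "f"):
    print((a1,) + flatten_null(user_schema) + (a1,))
    # first row → ('a', None, None, 'a'), i.e. (a,,,a)
```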

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-04-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5416:
-

Assignee: Koji Noguchi

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with the same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832323#comment-17832323
 ] 

Koji Noguchi commented on PIG-5451:
---

This was caused by a conflict in orc.version:

./build/ivy/lib/Pig/orc-core-1.5.6.jar
./lib/h3/orc-core-1.5.6.jar

and

spark/jars/orc-core-1.6.14.jar

> Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
> -
>
> Key: PIG-5451
> URL: https://issues.apache.org/jira/browse/PIG-5451
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
>
> Test failing with
> "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
> cannot access its superclass org.threeten.extra.chrono.AbstractDate"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5451:
-

 Summary: Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
 Key: PIG-5451
 URL: https://issues.apache.org/jira/browse/PIG-5451
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Test failing with
"java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
cannot access its superclass org.threeten.extra.chrono.AbstractDate"






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832320#comment-17832320
 ] 

Koji Noguchi commented on PIG-5451:
---

Full stack trace.
{noformat}
2024-03-29 10:57:31,787 [dag-scheduler-event-loop] INFO 
org.apache.spark.scheduler.DAGScheduler - ResultStage 3 (runJob at 
SparkHadoopWriter.scala:83) failed in 36.126 s due to Job aborted due to stage 
failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 
in stage 3.0 (TID 8) (gsrd479n10.red.ygrid.yahoo.com executor 4): 
java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot 
access its superclass org.threeten.extra.chrono.AbstractDate
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at 
org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:235)
at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:88)
at java.time.chrono.AbstractChronology.resolveYMD(AbstractChronology.java:563)
at java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
at 
org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:452)
at 
org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:88)
at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
at java.time.format.Parsed.resolveFields(Parsed.java:257)
at java.time.format.Parsed.resolve(Parsed.java:244)
at 
java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
at 
java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777)
at org.apache.orc.impl.DateUtils.<clinit>(DateUtils.java:74)
at 
org.apache.orc.impl.ColumnStatisticsImpl$TimestampStatisticsImpl.<init>(ColumnStatisticsImpl.java:1683)
at 
org.apache.orc.impl.ColumnStatisticsImpl.deserialize(ColumnStatisticsImpl.java:2131)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:522)
at 
org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:1045)
at 
org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1117)
at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1137)
at 
org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1187)
at 
org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1222)
at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:254)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:67)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:83)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:337)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader.<init>(OrcNewInputFormat.java:72)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.createRecordReader(OrcNewInputFormat.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:255)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:126)
at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkPigRecordReader.<init>(SparkPigRecordReader.java:44)
at 
org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.createRecordReader(PigInputFormatSpark.java:131)
at 
org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:71)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:215)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:213)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337

[jira] [Updated] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5450:
--
Attachment: pig-5450-v01.patch

It turns out the weird error was coming from conflicting jars: 
{{./build/ivy/lib/Pig/hive-storage-api-2.7.0.jar}}
and
{{spark/spark/jars/hive-storage-api-2.7.2.jar}}

Uploading a patch updating the hive-storage-api version.

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> --
>
> Key: PIG-5450
> URL: https://issues.apache.org/jira/browse/PIG-5450
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
>  @117: areturn
> Reason:
> Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832318#comment-17832318
 ] 

Koji Noguchi commented on PIG-5450:
---

Weird full trace.
{noformat}
024-03-27 10:50:40,088 [task-result-getter-0] WARN 
org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) 
(gsrd238n05.red.ygrid.yahoo.com executor 1): org.apache.spark.SparkException: 
Task failed while writing rows
at 
org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163)
at 
org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
 @117: areturn
Reason:
Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
stack[0]) is not assignable to 
'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
Current Frame:
bci: @117
flags: { }
locals: { 'org/apache/orc/TypeDescription', 
'org/apache/orc/TypeDescription$RowBatchVersion', integer }
stack: { 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' }
Bytecode:
0x000: b200 022a b600 03b6 0004 2eaa  0181
0x010:  0001  0013  0059  0059
0x020:  0059  0059  0059  0062
0x030:  006b  006b  0074  0074
0x040:  007d  00ad  00ad  00ad
0x050:  00ad  00b6  00f7  0138
0x060:  0155 bb00 0559 1cb7 0006 b0bb 0007
0x070: 591c b700 08b0 bb00 0959 1cb7 000a b0bb
0x080: 000b 591c b700 0cb0 2ab6 000d 3e2a b600
0x090: 0e36 042b b200 0fa5 0009 1d10 12a4 000f
0x0a0: bb00 1159 1c1d 1504 b700 12b0 bb00 1359
0x0b0: 1c1d 1504 b700 14b0 bb00 1559 1cb7 0016
0x0c0: b02a b600 174e 2db9 0018 0100 bd00 193a
0x0d0: 0403 3605 1505 1904 bea2 001e 1904 1505
0x0e0: 2d15 05b9 001a 0200 c000 102b 1cb8 001b
0x0f0: 5384 0501 a7ff e0bb 001c 591c 1904 b700
0x100: 1db0 2ab6 0017 4e2d b900 1801 00bd 0019
0x110: 3a04 0336 0515 0519 04be a200 1e19 0415
0x120: 052d 1505 b900 1a02 00c0 0010 2b1c b800
0x130: 1b53 8405 01a7 ffe0 bb00 1e59 1c19 04b7
0x140: 001f b02a b600 174e bb00 2059 1c2d 03b9
0x150: 001a 0200 c000 102b 1cb8 001b b700 21b0
0x160: 2ab6 0017 4ebb 0022 591c 2d03 b900 1a02
0x170: 00c0 0010 2b1c b800 1b2d 04b9 001a 0200
0x180: c000 102b 1cb8 001b b700 23b0 bb00 2459
0x190: bb00 2559 b700 2612 27b6 0028 2ab6 0003
0x1a0: b600 29b6 002a b700 2bbf
Stackmap Table:
same_frame_extended(@100)
same_frame(@109)
same_frame(@118)
same_frame(@127)
same_frame(@136)
append_frame(@160,Integer,Integer)
same_frame(@172)
chop_frame(@184,2)
same_frame(@193)
append_frame(@212,Object[_75],Object[_76],Integer)
chop_frame(@247,1)
chop_frame(@258,2)
append_frame(@277,Object[_75],Object[_76],Integer)
chop_frame(@312,1)
chop_frame(@323,2)
same_frame(@352)
same_frame(@396)

at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:483)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:100)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:51)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:37)
at org.apache.pig.builtin.OrcStorage.putNext(OrcStorage.java:249)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:146)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at 
org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:368)
at 
org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
at 
org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:135

[jira] [Created] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5450:
-

 Summary: Pig-on-Spark3 E2E ORC test failing with 
java.lang.VerifyError: Bad return type
 Key: PIG-5450
 URL: https://issues.apache.org/jira/browse/PIG-5450
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


{noformat}
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
 @117: areturn
Reason:
Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
stack[0]) is not assignable to 
'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
 {noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5410) Support Python 3 for streaming_python

2024-03-29 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5410:
--
Attachment: pig-5410-v02.patch

Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
It seems the patch was missing a '+'. Uploading a new patch with the '+' restored.
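For reference, the failure mode is easy to reproduce: inside parentheses a newline does not join two expressions, so dropping a '+' between the payload and the closing constant is a SyntaxError. The snippets below are illustrative reconstructions, not the real controller template:

```python
# Generated code with the '+' present parses fine; without it, the two
# adjacent expressions ("payload WRAPPED_MAP_END") are invalid syntax.
good = "out = ('WRAPPED_MAP_START'\n       + payload\n       + WRAPPED_MAP_END)"
bad  = "out = ('WRAPPED_MAP_START'\n       + payload\n       WRAPPED_MAP_END)"

compile(good, "<controller>", "exec")  # only parses; names resolve at runtime
try:
    compile(bad, "<controller>", "exec")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```

This matches the reported error, where the caret points at the line containing `WRAPPED_MAP_END)`.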



> Support Python 3 for streaming_python
> -
>
> Key: PIG-5410
> URL: https://issues.apache.org/jira/browse/PIG-5410
> Project: Pig
>  Issue Type: New Feature
>Reporter: Rohini Palaniswamy
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5410.patch, pig-5410-v02.patch
>
>
> Python 3 is incompatible with Python 2. We need to make it work with both. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (PIG-5410) Support Python 3 for streaming_python

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832317#comment-17832317
 ] 

Koji Noguchi edited comment on PIG-5410 at 3/29/24 9:10 PM:


Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
It seems the patch was missing a '+'. Uploading a new patch.


was (Author: knoguchi):
Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
it seems like the patch was missing a '+'.   Uploading a new patch with '+'.  



> Support Python 3 for streaming_python
> -
>
> Key: PIG-5410
> URL: https://issues.apache.org/jira/browse/PIG-5410
> Project: Pig
>  Issue Type: New Feature
>Reporter: Rohini Palaniswamy
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5410.patch, pig-5410-v02.patch
>
>
> Python 3 is incompatible with Python 2. We need to make it work with both. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-03-22 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5449:
--
Attachment: pig-5449-v01.patch

Before (on Spark 2), this used to work by checking for the empty list returned 
by getjobIDs:
https://github.com/apache/pig/blob/branch-0.17/src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java#L210-L219

But with Spark 3, this returns an actual job id with no metrics stored behind it.

Instead of adding another logic for spark3, I think we can treat metrics 
retrieval as optional like we do in mapreduce & tez.Attaching a patch. 
(pig-5449-v01.patch)
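A minimal sketch of the "treat metrics as optional" idea. All names here are illustrative, not the actual SparkJobStats API: instead of throwing when Spark 3 hands back a job id with no task metrics behind it, warn and continue, mirroring the mapreduce/tez behavior described above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class OptionalMetricsSketch {
    /** Returns aggregated metrics, or an empty map when none are available. */
    public static Map<String, Long> collectStats(List<Map<String, Long>> taskMetrics) {
        if (taskMetrics == null || taskMetrics.isEmpty()) {
            // The failing path threw a RuntimeException here; logging a
            // warning and returning empty stats is the "optional" behavior.
            System.err.println("WARN: no task metrics available; skipping stats");
            return Collections.emptyMap();
        }
        // With metrics present, behave as before (here: just take the first).
        return taskMetrics.get(0);
    }

    public static void main(String[] args) {
        System.out.println(collectStats(null).isEmpty());
    }
}
```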

> TestEmptyInputDir failing on pig-on-spark3
> --
>
> Key: PIG-5449
> URL: https://issues.apache.org/jira/browse/PIG-5449
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5449-v01.patch
>
>
> TestEmptyInputDir failing on pig-on-spark3 with 
> {noformat:title=TestEmptyInputDir.testMergeJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
> {noformat}
> {noformat:title=TestEmptyInputDir.testGroupByFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
> {noformat}
> {noformat:title=TestEmptyInputDir.testFRJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267)
>  {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-03-22 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5449:
-

 Summary: TestEmptyInputDir failing on pig-on-spark3
 Key: PIG-5449
 URL: https://issues.apache.org/jira/browse/PIG-5449
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


TestEmptyInputDir failing on pig-on-spark3 with 
{noformat:title=TestEmptyInputDir.testMergeJoinFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
{noformat}
{noformat:title=TestEmptyInputDir.testGroupByFailure}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
{noformat}
{noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
{noformat}
{noformat:title=TestEmptyInputDir.testFRJoinFailure}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
{noformat}
{noformat:title=TestEmptyInputDir.testBloomJoinFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) 
{noformat}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-03-19 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5448:
--
Attachment: pig-5448-v01.patch

{quote}No task metrics available for jobId 0
{quote}
This is actually failing because Pig succeeds without running anything. Looking 
further, I found that Spark is filtering out all input splits and reporting a 
successful empty job result with no metrics.

Setting a flag so that Spark does not ignore a PigSplit that looks empty but 
still has (non-hdfs) inputs. (pig-5448-v01.patch)
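A hedged sketch of the flag idea; the class and field names are hypothetical, not the actual PigSplit. A split backed by a non-file source (e.g. HBase) reports a nominal non-zero length so an empty-split filter keyed on length does not silently drop it.

```java
public class NonEmptySplitSketch {
    private final long fileLength;         // 0 for non-file-backed inputs
    private final boolean hasNonFileInput; // the "do not ignore me" flag

    public NonEmptySplitSketch(long fileLength, boolean hasNonFileInput) {
        this.fileLength = fileLength;
        this.hasNonFileInput = hasNonFileInput;
    }

    /** Length as seen by the scheduler's empty-split filtering. */
    public long getLength() {
        if (fileLength == 0 && hasNonFileInput) {
            return 1L; // nominal non-zero length keeps the split alive
        }
        return fileLength;
    }
}
```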

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim

2024-03-19 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5439:
--
Attachment: pig-5439-v02.patch

Adding missing spark-scala.version. (pig-5439-v02.patch)

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark to run on spark3. 
> Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. 
> This is due to log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-03-19 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5448:
-

 Summary: All TestHBaseStorage tests failing on pig-on-spark3
 Key: PIG-5448
 URL: https://issues.apache.org/jira/browse/PIG-5448
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
failing with 
{noformat}
org.apache.pig.PigException: ERROR 1002: Unable to store alias b
at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
at org.apache.pig.PigServer.store(PigServer.java:1086)
at 
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the 
rdds of this spark operator:
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
at 
org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
at 
org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
at 
org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826791#comment-17826791
 ] 

Rohini Palaniswamy commented on PIG-5446:
-

+1

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826790#comment-17826790
 ] 

Rohini Palaniswamy commented on PIG-5416:
-

+1

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826789#comment-17826789
 ] 

Rohini Palaniswamy commented on PIG-5447:
-

+1

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-06 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5447:
--
Attachment: pig-5447-v01.patch

There is no simple way to implement hasNext() for this implementation. I think 
an iterator is not the right fit here, but I prefer not to touch the logic. 
Instead, writing a hacked iterator that calls next() within hasNext() and 
caches the result. (pig-5447-v01.patch)
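A simplified, generic sketch of that workaround (the real patch lives inside SkewedJoinConverter; this standalone class is illustrative). hasNext() eagerly pulls an element via the delegate's next() and caches it, so a delegate whose next() consumes several underlying elements can no longer make hasNext() report true when nothing is left to return.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CachingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private T cached;
    private boolean hasCached;

    public CachingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        while (!hasCached && delegate.hasNext()) {
            try {
                cached = delegate.next(); // may consume several underlying elements
                hasCached = true;
            } catch (NoSuchElementException e) {
                // delegate.hasNext() lied; fall through and re-check
            }
        }
        return hasCached;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasCached = false;
        T result = cached;
        cached = null;
        return result;
    }
}
```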

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-06 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824136#comment-17824136
 ] 

Koji Noguchi commented on PIG-5447:
---

> However, inside {{next()}}, it sometimes recursively traverses the delegated 
> iterator by calling {{next()}} inside.

This only happens when a key is oversampled, as described in PIG-4377. Maybe 
that's why we were not seeing the failure elsewhere.

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-06 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824097#comment-17824097
 ] 

Koji Noguchi commented on PIG-5447:
---

I don't see how this ever worked.

The iterator under {{SkewedJoinConverter.ToValueFunction.Tuple2TransformIterable}} 
is NOT following the API requirement. 
{{hasNext()}} simply returns the result of checking the delegated iterator:
{quote}{{delegate.hasNext();}}
{quote}
However, inside {{next()}}, it sometimes recursively traverses the delegated 
iterator by calling {{next()}}. So even when {{hasNext()}} returns true, there 
are times when {{next()}} has no element to return, ending up with a 
{{NoSuchElementException}}.

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-06 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824093#comment-17824093
 ] 

Koji Noguchi commented on PIG-5447:
---

Full stack trace.

{noformat}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias C. Backend error : Job aborted.
at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
at 
org.apache.pig.test.TestSkewedJoin.testSkewedJoinOuter(TestSkewedJoin.java:386)
Caused by: org.apache.spark.SparkException: Job aborted.
at 
org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at 
org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1000)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at 
org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:991)
at 
org.apache.pig.backend.hadoop.executionengine.spark.converter.StoreConverter.convert(StoreConverter.java:104)
at 
org.apache.pig.backend.hadoop.executionengine.spark.converter.StoreConverter.convert(StoreConverter.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:292)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:182)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
at org.apache.pig.PigServer.store(PigServer.java:1086)
at org.apache.pig.PigServer.openIterator(PigServer.java:999)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 94.0 failed 4 times, most recent failure: Lost task 1.3 in 
stage 94.0 (TID 436, gsrd238n19.red.ygrid.yahoo.com, executor 2): 
org.apache.spark.SparkException: Task failed while writing rows
at 
org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:157)
at 
org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
at 
org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.convert.Wrappers$IteratorWrapper.next

[jira] [Created] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-06 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5447:
-

 Summary: Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing 
with NoSuchElementException
 Key: PIG-5447
 URL: https://issues.apache.org/jira/browse/PIG-5447
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer and 
full-outer joins.

"Caused by: java.util.NoSuchElementException: next on empty iterator"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-02-08 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815770#comment-17815770
 ] 

Koji Noguchi commented on PIG-5416:
---

The issue seems to be on the Spark side.
For now, added a silly polling loop after "waitForJobToEnd" to double-check 
that the job is finished.
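An illustrative sketch of that polling workaround; the names are hypothetical, not the actual SparkStatsUtil code. After the wait call returns, re-check the job status a bounded number of times, since Spark can briefly still report RUNNING for a job that is about to finish.

```java
public class JobStatusPollSketch {
    /** Stand-in for whatever wraps Spark's job status tracker. */
    public interface StatusProbe {
        String status();
    }

    public static String waitForTerminalStatus(StatusProbe probe, int maxTries,
                                               long sleepMs) {
        String status = probe.status();
        for (int i = 0; i < maxTries && "RUNNING".equals(status); i++) {
            try {
                Thread.sleep(sleepMs); // give Spark a moment to publish the final state
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop polling if interrupted
            }
            status = probe.status();
        }
        return status; // caller still has to handle a stuck RUNNING status
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Fake probe: reports RUNNING twice, then SUCCEEDED.
        StatusProbe probe = () -> calls[0]++ < 2 ? "RUNNING" : "SUCCEEDED";
        System.out.println(waitForTerminalStatus(probe, 5, 1L));
    }
}
```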

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
>     URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-02-08 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5416:
--
Attachment: pig-5416-v01.patch

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Updated] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-02-06 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5446:
--
Attachment: pig-5446-v01.patch

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}





[jira] [Updated] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-02-06 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5446:
--

It seems "reporter.progress()" is a no-op in Pig on Tez. 
The test started failing after upgrading the dependent Tez version.

PIG-4700 enabled progress reporting only for Tez 0.8.5 and later.

Attaching a patch that simply calls "context.notifyProgress" for every 
"reporter.progress()" call. 
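
A minimal sketch of the idea. These are stand-in classes, not Pig's actual ones: ProcessorContext here mimics org.apache.tez.runtime.api.ProcessorContext (whose notifyProgress() is available from Tez 0.8.5 on), and Reporter mimics the role of PigStatusReporter after the patch.
{code:java}
public class NotifyProgressSketch {
    // Stand-in for org.apache.tez.runtime.api.ProcessorContext.
    interface ProcessorContext {
        void notifyProgress();
    }

    // Hypothetical reporter: before the patch, progress() was effectively a
    // no-op on Tez, so long-running UDFs were killed for "making no progress";
    // after the patch, each call is forwarded to the framework.
    static class Reporter {
        private final ProcessorContext context;
        Reporter(ProcessorContext context) { this.context = context; }
        void progress() {
            if (context != null) {
                context.notifyProgress();
            }
        }
    }

    public static void main(String[] args) {
        int[] notified = {0};
        Reporter reporter = new Reporter(() -> notified[0]++);
        reporter.progress(); // what ReportingUDF calls mid-exec()
        reporter.progress();
        System.out.println(notified[0]); // each progress() now reaches Tez
    }
}
{code}
With this wiring, the sleep-then-progress pattern in ReportingUDF keeps the task attempt alive instead of timing out.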

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
>     URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}





[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-02-06 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814970#comment-17814970
 ] 

Koji Noguchi commented on PIG-5446:
---

From the test:
{code:title=testProgressReportingWithStatusMessage}
cluster.setProperty(MRConfiguration.TASK_TIMEOUT, "1");
...
pig.registerQuery("A = load 'a.txt' as (f1:chararray);");
pig.registerQuery("B = foreach A generate org.apache.pig.test.utils.ReportingUDF();");
{code}
{code:title=ReportingUDF()}
public class ReportingUDF extends EvalFunc<Integer> {

    @Override
    public Integer exec(Tuple input) throws IOException {
        try {
            Thread.sleep(7500);
            PigStatusReporter reporter = PigStatusReporter.getInstance();
            reporter.progress();
            Thread.sleep(7500);
        } catch (InterruptedException e) {
        }
        return 100;
    }
}
{code}

So basically, even though Pig calls "reporter.progress()" after 7.5 
seconds, the Tez task fails with "Attempt failed because it appears to make no 
progress for 1ms". 



> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}





[jira] [Created] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-02-06 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5446:
-

 Summary: Tez 
TestPigProgressReporting.testProgressReportingWithStatusMessage failing
 Key: PIG-5446
 URL: https://issues.apache.org/jira/browse/PIG-5446
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Koji Noguchi
Assignee: Koji Noguchi


{noformat}
Unable to open iterator for alias B. Backend error : Vertex failed, 
vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task 
failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
failed, info=[Attempt failed because it appears to make no progress for 
1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make 
no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, 
failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] 
killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:0

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Attempt failed because it appears to make no progress for 1ms], 
TaskAttempt 1 failed, info=[Attempt failed because it appears to make no 
progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, 
failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] 
killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
at 
org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task 
failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
failed, info=[Attempt failed because it appears to make no progress for 
1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make 
no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, 
failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] 
killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
at 
org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
45.647 {noformat}





[jira] [Updated] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated

2024-02-05 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5445:
--
Attachment: pig-5445-v01.patch

I don't fully understand how cogroup is implemented, 
but checking MergeJoinIndexer.java:
{code:java}
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
        String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException {

    loader = 
...
    precedingPhyPlan = (PhysicalPlan) ObjectSerializer.deserialize(serializedPhyPlan);
    if (precedingPhyPlan != null) {
        if (precedingPhyPlan.getLeaves().size() != 1 || precedingPhyPlan.getRoots().size() != 1) {
            int errCode = 2168;
            String errMsg = "Expected physical plan with exactly one root and one leaf.";
            throw new ExecException(errMsg, errCode, PigException.BUG);
        }
        this.rightPipelineLeaf = precedingPhyPlan.getLeaves().get(0);
        this.rightPipelineRoot = precedingPhyPlan.getRoots().get(0);
        this.rightPipelineRoot.setInputs(null);  // <-- always overwrites "inputs" with null
    }
} {code}
MergeJoinIndexer always overwrites "inputs" with null, which means 
"inputs" can be skipped at serialization time. Attaching a patch 
(pig-5445-v01.patch) which does exactly that. The size of TEZC-MergeCogroup-1.gld 
was reduced by 5 with this patch since it no longer serializes PigContext and 
POLoad for MergeJoinIndexer.
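
The effect can be sketched with stand-in classes (not Pig's actual POForEach/POLoad/PigContext): a field that is always overwritten with null at deserialization time can be dropped before serializing, which also drops everything reachable through it.
{code:java}
import java.io.*;

public class SkipInputsDemo {
    // Stand-in for PigContext: carries a config payload.
    static class Context implements Serializable {
        byte[] config = new byte[1024];
    }
    // Stand-in for POLoad: holds a Context, so serializing it drags the config along.
    static class Load implements Serializable {
        Context pc = new Context();
    }
    // Stand-in for the POForEach inside the serialized plan.
    static class Foreach implements Serializable {
        Load inputs; // MergeJoinIndexer nulls this field anyway
        Foreach(Load in) { inputs = in; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int withInputs = serialize(new Foreach(new Load())).length;
        int withoutInputs = serialize(new Foreach(null)).length; // what the patch effectively does
        // Dropping "inputs" removes the Load and Context payload from the stream.
        System.out.println(withInputs > withoutInputs);
    }
}
{code}
This is also why the goldenfile output stops changing with the config: the config payload is no longer part of the serialized plan at all.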

> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -
>
> Key: PIG-5445
>     URL: https://issues.apache.org/jira/browse/PIG-5445
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.19.0
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5445-v01.patch
>
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and 
> config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: 
> Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
>   
> pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
>  
> but was:
> <|---a: 
> Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
>   
> pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at 
> org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) 
> {noformat}
> (edited the diff above a bit to make it easier to identify where the 
> difference was)
> Basically 3rd argument to MergeJoinIndexer differed. 





[jira] [Commented] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated

2024-02-05 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814456#comment-17814456
 ] 

Koji Noguchi commented on PIG-5445:
---

{quote}Basically 3rd argument to MergeJoinIndexer differed.
{quote}
This is the serializedPhyPlan passed to the MergeJoinIndexer constructor.
{code:java}
/** @param funcSpec : Loader specification.
 *  @param innerPlan : This is serialized version of LR plan. We
 *  want to keep only keys in our index file and not the whole tuple. So, 
we need LR and thus its plan
 *  to get keys out of the sampled tuple.
 * @param serializedPhyPlan Serialized physical plan on right side.
 * @throws ExecException
 */
@SuppressWarnings("unchecked")
public MergeJoinIndexer(String funcSpec, String innerPlan, String 
serializedPhyPlan,
String udfCntxtSignature, String scope, String ignoreNulls) throws 
ExecException{
{code}
When I deserialized both strings and printed out the physical plans, they 
showed the exact same physical plan:
{noformat}
#---
# Physical Plan:
#---
a: New For Each(false,false)[bag] - scope-30
|   |
|   Cast[int] - scope-27
|   |
|   |---Project[bytearray][0] - scope-26
|   |
|   Cast[int] - scope-29
|   |
|   |---Project[bytearray][1] - scope-28
{noformat}
Comparing the serialized strings and checking a memory dump, it turns out the 
difference came from the POForeach for "a: New For Each", which contains an 
"inputs" param pointing to a POLoad that holds a "PigContext pc". That POLoad 
and PigContext were serialized as part of the MergeJoinIndexer, which changed 
the goldenfile outputs whenever anything in the config (stored in the 
PigContext) changed.

> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -
>
> Key: PIG-5445
> URL: https://issues.apache.org/jira/browse/PIG-5445
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.19.0
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and 
> config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: 
> Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
>   
> pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
>  
> but was:
> <|---a: 
> Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
>   
> pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at 
> org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) 
> {noformat}
> (edited the diff above a bit to make it easier to identify where the 
> difference was)
> Basically 3rd argument to MergeJoinIndexer differed. 





[jira] [Created] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated

2024-02-05 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5445:
-

 Summary: TestTezCompiler.testMergeCogroup fails whenever config is 
updated
 Key: PIG-5445
 URL: https://issues.apache.org/jira/browse/PIG-5445
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.19.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi


TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and 
config that comes with it).
{noformat}
testMergeCogroupFailure
expected:
<|---a: 
Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
  
pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
 
but was:
<|---a: 
Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a
  
pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
at 
org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) 
{noformat}
(edited the diff above a bit to make it easier to identify where the difference 
was)

Basically 3rd argument to MergeJoinIndexer differed. 





[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-18 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: pig-5444-v03.patch

pig-5444-v02.patch fixed the failing tests, but it would not pick up the 
Split in the right order if there were additional splitees. Attaching 
pig-5444-v03.patch, which traverses any parent Splits. 

> TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already 
> defined error on Tez
> ---
>
> Key: PIG-5444
> URL: https://issues.apache.org/jira/browse/PIG-5444
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5444-v02.patch, pig-5444-v03.patch
>
>
> With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and 
> testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin 
> is run, these two tests on Tez are failing with
> {noformat}
> Unable to open iterator for alias E
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias E
> at org.apache.pig.PigServer.openIterator(PigServer.java:1024)
> at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409)
> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at org.apache.pig.PigServer.openIterator(PigServer.java:999)
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 
> 2017: Internal error creating job configuration.
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
> [scope-628 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
> BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> 
> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> 
> NullEdgeManager }) already defined!
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
> {noformat}





[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: pig-5444-v02.patch

> TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already 
> defined error on Tez
> ---
>
> Key: PIG-5444
> URL: https://issues.apache.org/jira/browse/PIG-5444
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5444-v02.patch
>
>
> With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and 
> testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin 
> is run, these two tests on Tez are failing with
> {noformat}
> Unable to open iterator for alias E
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias E
> at org.apache.pig.PigServer.openIterator(PigServer.java:1024)
> at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409)
> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at org.apache.pig.PigServer.openIterator(PigServer.java:999)
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 
> 2017: Internal error creating job configuration.
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
> [scope-628 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
> BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> 
> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> 
> NullEdgeManager }) already defined!
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
> {noformat}





[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: (was: pig-5444-v02.patch)

> TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already 
> defined error on Tez
> ---
>
> Key: PIG-5444
> URL: https://issues.apache.org/jira/browse/PIG-5444
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> With Tez, when TestFRJoin.testFRJoinOut7 and testFRJoinOut8 are run 
> individually, they pass. But when the entire TestFRJoin suite is run, these 
> two tests fail on Tez with
> {noformat}
> Unable to open iterator for alias E
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias E
> at org.apache.pig.PigServer.openIterator(PigServer.java:1024)
> at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409)
> Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at org.apache.pig.PigServer.openIterator(PigServer.java:999)
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 
> 2017: Internal error creating job configuration.
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
> [scope-628 : 
> org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
> BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> 
> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> 
> NullEdgeManager }) already defined!
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
> {noformat}





[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: (was: pig-5444-v01.patch)



[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: pig-5444-v02.patch



[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--
Attachment: pig-5444-v01.patch



[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2024-01-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5444:
--

Attaching a patch that does two things.

> [~rohini] , should I add a predecessor check to prevent this merge ?
> 
(1) Added a check to ensure there is no overlap in predecessors among the 
"tentativeSuccessors". 

(2) Introduced a new walker that considers the dependency order of nodes' 
successors. 
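A rough standalone sketch of what check (1) amounts to (the class and method 
names here are illustrative, not the actual MultiQueryOptimizerTez code):

{code:java}
import java.util.*;

// Simplified stand-in for check (1): the merge must be vetoed when two
// tentative successors share a predecessor, because merging them would
// collapse two distinct edges into a single duplicate edge.
public class OverlapCheck {
    static boolean predecessorsOverlap(Map<String, List<String>> predecessors,
                                       List<String> tentativeSuccessors) {
        Set<String> seen = new HashSet<>();
        for (String succ : tentativeSuccessors) {
            for (String pred : predecessors.getOrDefault(succ, Collections.emptyList())) {
                if (!seen.add(pred)) {
                    return true;   // same predecessor feeds two successors
                }
            }
        }
        return false;
    }
}
{code}

In the plan from this jira, scope-134 and scope-140 both have scope-132 and 
scope-136 as predecessors, so such a check would veto the merge.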

For the example from this jira, 
{noformat}
B5(input) --> 6 --> 2
B5(input) --> 7 --> 3
A1(input) --> 2
A1(input) --> 3
2 --> 4(shuffle->out)
3 --> 4(shuffle->out)
{noformat}
2 and 3 are joins. The previous ReverseDependencyOrderWalker did not consider 
the dependency order of nodes' successors, so the visit order of A1 and B5 was 
nondeterministic. With the new ReverseSuccessorsDependencyOrderWalker, A1 is 
always visited before B5. 
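A toy model of the ordering difference (an illustrative simplification, not 
the actual walker implementation): visit a node only once all of its 
successors are visited, and break ties by the earliest visit position among a 
node's successors, which makes the walk deterministic.

{code:java}
import java.util.*;

// Toy reverse-dependency walk over an acyclic graph: a node becomes
// eligible only after all of its successors have been visited, and
// among eligible nodes the one whose successors were visited earliest
// wins, so the resulting order is deterministic.
public class ReverseWalk {
    static List<String> walk(Map<String, List<String>> successors, List<String> nodes) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        while (order.size() < nodes.size()) {
            String best = null;
            int bestKey = Integer.MAX_VALUE;
            for (String n : nodes) {
                if (visited.contains(n)) continue;
                List<String> succs = successors.getOrDefault(n, Collections.emptyList());
                if (!visited.containsAll(succs)) continue;   // not ready yet
                int key = succs.isEmpty() ? -1
                        : succs.stream().mapToInt(order::indexOf).min().getAsInt();
                if (key < bestKey) { bestKey = key; best = n; }
            }
            visited.add(best);
            order.add(best);
        }
        return order;
    }
}
{code}

On the graph from this jira (B5 feeding joins 2 and 3 through 6 and 7, A1 
feeding 2 and 3 directly, both joins feeding 4), this walk always visits A1 
before B5.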



[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2023-11-14 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786019#comment-17786019
 ] 

Koji Noguchi commented on PIG-5444:
---

This is how the plan looks when testFRJoinOut8 is run by itself and 
MultiQueryOptimizerTez happens to work on A before B. Note that the 
POValueOutputTez on B is there to prevent the overlapping edges.
{code:java}
Tez vertex scope-48
# Plan on vertex
B: Split - scope-61
|   |
|   Local Rearrange[tuple]{int}(false) - scope-27       ->       scope-44
|   |   |
|   |   Project[int][0] - scope-23
|   |
|   POValueOutputTez - scope-49 ->       [scope-54]
|
|---B: New For Each(false,false)[bag] - scope-18
    |   |
    |   Cast[int] - scope-13
    |   |
    |   |---Project[bytearray][0] - scope-12
    |   |
    |   Cast[int] - scope-16
    |   |
    |   |---Project[bytearray][1] - scope-15
    |
    |---B: 
Load(hdfs://localhost:38814/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage)
 - scope-11
Tez vertex scope-54
# Plan on vertex
Local Rearrange[tuple]{int}(false) - scope-39   ->       scope-44
|   |
|   Project[int][1] - scope-35
|
|---POValueInputTez - scope-55  <-       scope-48
Tez vertex scope-44
# Plan on vertex
A: Split - scope-60
|   |
|   E: 
Store(hdfs://localhost:38814/tmp/temp-1966813510/tmp-652837441:org.apache.pig.impl.io.InterStorage)
 - scope-62   ->       scope-43
|   |
|   |---D: FRJoin[tuple] - scope-36     <-       scope-54
|       |   |
|       |   Project[int][1] - scope-34
|       |   |
|       |   Project[int][1] - scope-35
|   |
|   E: 
Store(hdfs://localhost:38814/tmp/temp-1966813510/tmp-652837441:org.apache.pig.impl.io.InterStorage)
 - scope-63   ->       scope-43
|   |
|   |---C: FRJoin[tuple] - scope-24     <-       scope-48
|       |   |
|       |   Project[int][0] - scope-22
|       |   |
|       |   Project[int][0] - scope-23
|
|---A: New For Each(false,false)[bag] - scope-7
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |
    |---A: 
Load(hdfs://localhost:38814/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage)
 - scope-0
 {code}


[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2023-11-13 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785674#comment-17785674
 ] 

Koji Noguchi commented on PIG-5444:
---

[~rohini] , should I add a predecessor check to prevent this merge ?



[jira] [Comment Edited] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2023-11-13 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785673#comment-17785673
 ] 

Koji Noguchi edited comment on PIG-5444 at 11/13/23 9:44 PM:
-

The issue seems to originate inside MultiQueryOptimizerTez.java.

Before the change
{noformat}
Tez vertex scope-132
# Plan on vertex
POValueOutputTez - scope-133->   [scope-134, scope-140]
|
|---A: New For Each(false,false)[bag] - scope-95
|   |
|   Cast[int] - scope-90
|   |
|   |---Project[bytearray][0] - scope-89
|   |
|   Cast[int] - scope-93
|   |
|   |---Project[bytearray][1] - scope-92
|
|---A: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage)
 - scope-88
Tez vertex scope-136
# Plan on vertex
B: Split - scope-148
|   |
|   Local Rearrange[tuple]{int}(false) - scope-115  ->   scope-134
|   |   |
|   |   Project[int][0] - scope-111
|   |
|   Local Rearrange[tuple]{int}(false) - scope-127  ->   scope-140
|   |   |
|   |   Project[int][1] - scope-123
|
|---B: New For Each(false,false)[bag] - scope-106
|   |
|   Cast[int] - scope-101
|   |
|   |---Project[bytearray][0] - scope-100
|   |
|   Cast[int] - scope-104
|   |
|   |---Project[bytearray][1] - scope-103
|
|---B: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage)
 - scope-99
Tez vertex scope-134
# Plan on vertex
POValueOutputTez - scope-146->   [scope-144]
|
|---C: FRJoin[tuple] - scope-112<-   scope-136
|   |
|   Project[int][0] - scope-110
|   |
|   Project[int][0] - scope-111
|
|---POValueInputTez - scope-135 <-   scope-132
Tez vertex scope-140
# Plan on vertex
POValueOutputTez - scope-147->   [scope-144]
|
|---D: FRJoin[tuple] - scope-124<-   scope-136
|   |
|   Project[int][1] - scope-122
|   |
|   Project[int][1] - scope-123
|
|---POValueInputTez - scope-141 <-   scope-132
Tez vertex scope-144
# Plan on vertex
E: 
Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage)
 - scope-131
|
|---POShuffledValueInputTez - scope-145 <-   [scope-134, scope-140]

{noformat}
 

After MultiQueryOptimizerTez::visitTezOp reaches
{code}
// If all other conditions were satisfied, but it had a successor union
// with unsupported storefunc keep it in the tentative list.
{code}
and decides to merge scope-134 and scope-140, the plan becomes:

{noformat}
Tez vertex scope-136
# Plan on vertex
B: Split - scope-148
|   |
|   Local Rearrange[tuple]{int}(false) - scope-115  ->   scope-132
|   |   |
|   |   Project[int][0] - scope-111
|   |
|   Local Rearrange[tuple]{int}(false) - scope-127  ->   scope-132
|   |   |
|   |   Project[int][1] - scope-123
|
|---B: New For Each(false,false)[bag] - scope-106
|   |
|   Cast[int] - scope-101
|   |
|   |---Project[bytearray][0] - scope-100
|   |
|   Cast[int] - scope-104
|   |
|   |---Project[bytearray][1] - scope-103
|
|---B: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage)
 - scope-99
Tez vertex scope-132
# Plan on vertex
POValueOutputTez - scope-133->   []
|
|---A: New For Each(false,false)[bag] - scope-95
|   |
|   Cast[int] - scope-90
|   |
|   |---Project[bytearray][0] - scope-89
|   |
|   Cast[int] - scope-93
|   |
|   |---Project[bytearray][1] - scope-92
|
|---A: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage)
 - scope-88
Tez vertex scope-144
# Plan on vertex
E: 
Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage)
 - scope-131
|
|---POShuffledValueInputTez - scope-145 <-   [scope-132]

{noformat}

This later fails with 
{panel}
Caused by: java.lang.IllegalArgumentException: Edge [scope-136 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
[scope-132 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] (\{ 
BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED 
>> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager 
}) already defined!
{panel}
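The resulting failure is Tez enforcing a basic DAG invariant: once both Local 
Rearranges point at scope-132, DAG construction tries to add the 
scope-136 -> scope-132 edge twice. A toy illustration of that invariant (not 
the Tez API itself, just the same rejection behavior as DAG.addEdge):

{code:java}
import java.util.*;

// Toy DAG that, like org.apache.tez.dag.api.DAG.addEdge, refuses to
// register the same (source, sink) pair twice.
public class ToyDag {
    private final Set<String> edges = new HashSet<>();

    public void addEdge(String from, String to) {
        if (!edges.add(from + " -> " + to)) {
            throw new IllegalArgumentException(
                    "Edge [" + from + "] -> [" + to + "] already defined!");
        }
    }
}
{code}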



[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2023-11-13 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785673#comment-17785673
 ] 

Koji Noguchi commented on PIG-5444:
---

Issue seems to be happening inside MultiQueryOptimizerTez.java.

Before the change
{noformat}
Tez vertex scope-132
# Plan on vertex
POValueOutputTez - scope-133->   [scope-134, scope-140]
|
|---A: New For Each(false,false)[bag] - scope-95
|   |
|   Cast[int] - scope-90
|   |
|   |---Project[bytearray][0] - scope-89
|   |
|   Cast[int] - scope-93
|   |
|   |---Project[bytearray][1] - scope-92
|
|---A: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage)
 - scope-88
Tez vertex scope-136
# Plan on vertex
B: Split - scope-148
|   |
|   Local Rearrange[tuple]{int}(false) - scope-115  ->   scope-134
|   |   |
|   |   Project[int][0] - scope-111
|   |
|   Local Rearrange[tuple]{int}(false) - scope-127  ->   scope-140
|   |   |
|   |   Project[int][1] - scope-123
|
|---B: New For Each(false,false)[bag] - scope-106
|   |
|   Cast[int] - scope-101
|   |
|   |---Project[bytearray][0] - scope-100
|   |
|   Cast[int] - scope-104
|   |
|   |---Project[bytearray][1] - scope-103
|
|---B: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage)
 - scope-99
Tez vertex scope-134
# Plan on vertex
POValueOutputTez - scope-146->   [scope-144]
|
|---C: FRJoin[tuple] - scope-112<-   scope-136
|   |
|   Project[int][0] - scope-110
|   |
|   Project[int][0] - scope-111
|
|---POValueInputTez - scope-135 <-   scope-132
Tez vertex scope-140
# Plan on vertex
POValueOutputTez - scope-147->   [scope-144]
|
|---D: FRJoin[tuple] - scope-124<-   scope-136
|   |
|   Project[int][1] - scope-122
|   |
|   Project[int][1] - scope-123
|
|---POValueInputTez - scope-141 <-   scope-132
Tez vertex scope-144
# Plan on vertex
E: 
Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage)
 - scope-131
|
|---POShuffledValueInputTez - scope-145 <-   [scope-134, scope-140]

{noformat}
 

After MultiQueryOptimizerTez::visitTezOp reaches the code path below
{code}
 // If all other conditions were satisfied, but it had a successor union
// with unsupported storefunc keep it in the tentative list. 
{code}
and decides to merge scope-134 and scope-140, the plan becomes:

{noformat}
Tez vertex scope-136
# Plan on vertex
B: Split - scope-148
|   |
|   Local Rearrange[tuple]{int}(false) - scope-115  ->   scope-132
|   |   |
|   |   Project[int][0] - scope-111
|   |
|   Local Rearrange[tuple]{int}(false) - scope-127  ->   scope-132
|   |   |
|   |   Project[int][1] - scope-123
|
|---B: New For Each(false,false)[bag] - scope-106
|   |
|   Cast[int] - scope-101
|   |
|   |---Project[bytearray][0] - scope-100
|   |
|   Cast[int] - scope-104
|   |
|   |---Project[bytearray][1] - scope-103
|
|---B: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage)
 - scope-99
Tez vertex scope-132
# Plan on vertex
POValueOutputTez - scope-133->   []
|
|---A: New For Each(false,false)[bag] - scope-95
|   |
|   Cast[int] - scope-90
|   |
|   |---Project[bytearray][0] - scope-89
|   |
|   Cast[int] - scope-93
|   |
|   |---Project[bytearray][1] - scope-92
|
|---A: 
Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage)
 - scope-88
Tez vertex scope-144
# Plan on vertex
E: 
Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage)
 - scope-131
|
|---POShuffledValueInputTez - scope-145 <-   [scope-132]

{noformat}

This later fails with:
{noformat}
Caused by: java.lang.IllegalArgumentException: Edge [scope-136 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
[scope-132 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED 
>> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager 
}) already defined!
{noformat}
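The failure mode above can be sketched with plain collections: once the optimizer merges scope-134 and scope-140 into one vertex, both broadcast edges from scope-136 target the same vertex, and registering them naively trips DAG.addEdge's duplicate check. A hypothetical fix, illustrated below with `java.util` types only (none of these names are Pig's actual code), is to deduplicate edges by (source, target) pair before adding them:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: after the merge, two requested edges collapse onto the
// same (source, target) pair. Tez rejects the second DAG.addEdge call, so a
// builder must register each endpoint pair at most once.
public class EdgeDedup {
    public static List<String> dedupEdges(List<String[]> requestedEdges) {
        Set<String> seen = new HashSet<>();
        List<String> accepted = new ArrayList<>();
        for (String[] e : requestedEdges) {
            String key = e[0] + "->" + e[1];
            if (seen.add(key)) {       // first time this pair is seen
                accepted.add(key);     // register the edge once
            }                          // duplicates are silently skipped
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[]{"scope-136", "scope-132"}); // from merged scope-134
        edges.add(new String[]{"scope-136", "scope-132"}); // from merged scope-140
        // Only a single scope-136->scope-132 edge survives deduplication.
        System.out.println(dedupEdges(edges));
    }
}
```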

> TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already 
> defined error on Tez
> ---
>
>     Key: PIG-5444
> URL: https://issues.apache.org/jira/browse/PIG-5444
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> With T

[jira] [Created] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez

2023-11-11 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5444:
-

 Summary: TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with 
Edge already defined error on Tez
 Key: PIG-5444
 URL: https://issues.apache.org/jira/browse/PIG-5444
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Koji Noguchi
Assignee: Koji Noguchi


With Tez, when the individual tests (TestFRJoin.testFRJoinOut7 and 
testFRJoinOut8) are run separately, they pass. But when the entire TestFRJoin 
suite is run, these two tests fail on Tez with:
{noformat}
Unable to open iterator for alias E
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias E
at org.apache.pig.PigServer.openIterator(PigServer.java:1024)
at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E
at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
at org.apache.pig.PigServer.store(PigServer.java:1086)
at org.apache.pig.PigServer.openIterator(PigServer.java:999)
Caused by: org.apache.pig.backend.hadoop.executionengine.JobCreationException: 
ERROR 2017: Internal error creating job configuration.
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> 
[scope-628 : 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ 
BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED 
>> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager 
}) already defined!
at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69)
at 
org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5437) Add lib and idea folder to .gitignore

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5437:

Fix Version/s: 0.18.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

+1. Committed to trunk and branch-0.18. Thanks for the contribution [~maswin]

> Add lib and idea folder to .gitignore
> -
>
> Key: PIG-5437
> URL: https://issues.apache.org/jira/browse/PIG-5437
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: PIG-5437-0.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5420) Update accumulo dependency to 1.10.1

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5420:

Fix Version/s: 0.18.1

> Update accumulo dependency to 1.10.1
> 
>
> Key: PIG-5420
> URL: https://issues.apache.org/jira/browse/PIG-5420
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.18.1
>
> Attachments: pig-5420-v01.patch
>
>
> Following owasp/cve report. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5419) Upgrade Joda time version

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5419:

Fix Version/s: 0.18.1
   (was: 0.18.0)

Can you update to 2.12.5?

> Upgrade Joda time version
> -
>
> Key: PIG-5419
> URL: https://issues.apache.org/jira/browse/PIG-5419
> Project: Pig
>  Issue Type: Improvement
>Reporter: Venkatasubrahmanian Narayanan
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Minor
> Fix For: 0.18.1
>
> Attachments: PIG-5419.patch
>
>
> Pig depends on an older version of Joda time, which can result in conflicts 
> with other versions in some workflows. Upgrading it to the latest version 
> (2.10.13) will resolve Pig's side of such issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5440) Extra jars needed for hive3

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-5440.
-
Fix Version/s: 0.18.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk and branch-0.18. Thanks [~knoguchi]

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.18.0
>
> Attachments: pig-5440-v01.patch, pig-5440-v02.patch
>
>
> When testing Hive3, e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. 
> This patch updates the dependent classes. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5438:

Fix Version/s: 0.19.0

> Update SparkCounter.Accumulator to AccumulatorV2
> 
>
> Key: PIG-5438
> URL: https://issues.apache.org/jira/browse/PIG-5438
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5438-v01.patch
>
>
> Original Accumulator is deprecated in Spark2 and gone in Spark3.  
> AccumulatorV2 is usable on both Spark2 and Spark3. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5439:

Fix Version/s: 0.19.0

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch
>
>
> Support Pig-on-Spark to run on spark3. 
> The initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. 
> This is due to a log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5414) Build failure on Linux ARM64 due to old Apache Avro

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5414:

Fix Version/s: 0.18.1

> Build failure on Linux ARM64 due to old Apache Avro
> ---
>
> Key: PIG-5414
> URL: https://issues.apache.org/jira/browse/PIG-5414
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.18.0
>Reporter: Martin Tzvetanov Grigorov
>Assignee: Martin Tzvetanov Grigorov
>Priority: Major
> Fix For: 0.18.1
>
> Attachments: 35.patch, 
> TEST-org.apache.pig.builtin.TestAvroStorage.txt, 
> TEST-org.apache.pig.builtin.TestOrcStorage.txt, 
> TEST-org.apache.pig.builtin.TestOrcStoragePushdown.txt
>
>
> Trying to build Apache Pig on Ubuntu 20.04.3 ARM64 fails because of old 
> versions of the Snappy and Avro libraries:
>  
> {code:java}
> Testsuite: org.apache.pig.builtin.TestAvroStorage
> Tests run: 0, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.1 sec
> - Standard Output ---
> 2021-10-12 14:43:35,483 [main] INFO  
> org.apache.pig.impl.util.SpillableMemoryManager  - Selected heap (PS Old Gen) 
> of size 1431830528 to monitor. collectionUsageThreshold = 1064828928, 
> usageThreshold = 1064828928
> 2021-10-12 14:43:35,489 [main] INFO  org.apache.pig.ExecTypeProvider  - 
> Trying ExecType : LOCAL
> 2021-10-12 14:43:35,489 [main] INFO  org.apache.pig.ExecTypeProvider  - 
> Picked LOCAL as the ExecType
> 2021-10-12 14:43:35,515 [main] WARN  org.apache.hadoop.conf.Configuration  - 
> DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml 
> is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml 
> to override properties of core-default.xml, mapred-default.xml and 
> hdfs-default.xml respectively
> 2021-10-12 14:43:35,755 [main] INFO  
> org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker is 
> deprecated. Instead, use mapreduce.jobtracker.address
> 2021-10-12 14:43:35,899 [main] WARN  org.apache.hadoop.util.NativeCodeLoader  
> - Unable to load native-hadoop library for your platform... using 
> builtin-java classes where applicable
> 2021-10-12 14:43:35,916 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting 
> to hadoop file system at: file:///
> 2021-10-12 14:43:36,116 [main] INFO  
> org.apache.hadoop.conf.Configuration.deprecation  - io.bytes.per.checksum is 
> deprecated. Instead, use dfs.bytes-per-checksum
> 2021-10-12 14:43:36,137 [main] INFO  org.apache.pig.PigServer  - Pig Script 
> ID for the session: PIG-default-01426621-bc19-499f-981e-b13959fe0d84
> 2021-10-12 14:43:36,137 [main] WARN  org.apache.pig.PigServer  - ATS is 
> disabled since yarn.timeline-service.enabled set to false
> 2021-10-12 14:43:36,150 [main] INFO  org.apache.pig.builtin.TestAvroStorage  
> - creating 
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/arraysAsOutputByPig.avro
> 2021-10-12 14:43:36,502 [main] INFO  org.apache.pig.builtin.TestAvroStorage  
> - Could not generate avro file: 
> test/org/apache/pig/builtin/avro/data/avro/uncompressed/arraysAsOutputByPig.avro
> java.net.ConnectException: Call From martin/127.0.0.1 to localhost:40073 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.call(Client.java:1479)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ...
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5418) Utils.parseSchema(String), parseConstant(String) leak memory

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5418:

Fix Version/s: 0.18.1

> Utils.parseSchema(String), parseConstant(String) leak memory
> 
>
> Key: PIG-5418
> URL: https://issues.apache.org/jira/browse/PIG-5418
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jacob Tolar
>Assignee: Jacob Tolar
>Priority: Minor
> Fix For: 0.18.1
>
> Attachments: PIG-5418.patch
>
>
> A minor issue: Utils.parseSchema() and parseConstant() leak memory. I 
> noticed this while running a unit test for a UDF several thousand times and 
> checking the heap. 
> Links are to latest commit as of creating this ticket: 
> https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/util/Utils.java#L244-L256
> {{new PigContext()}} [creates a MapReduce 
> ExecutionEngine|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/PigContext.java#L269].
>  
> This creates a 
> [MapReduceLauncher|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRExecutionEngine.java#L34].
>  
> This registers a [Hadoop shutdown 
> hook|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L104-L105]
>  which doesn't go away until the JVM dies. See: 
> https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/util/ShutdownHookManager.html
>  . 
> I will attach a proposed patch. From my reading of the code and running 
> tests, the existing schema parse APIs do not actually use anything from this 
> dummy PigContext, and with a minor tweak it can be passed in as NULL, 
> avoiding the creation of these extra resources. 
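The leak mechanism described above can be illustrated with a self-contained sketch (hypothetical names throughout: a static list stands in for Hadoop's ShutdownHookManager, and a byte array for the PigContext each parse call creates):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the leak pattern: each parseSchema() call built a
// fresh PigContext whose execution engine registered a JVM shutdown hook.
// Shutdown hooks live until the JVM exits, so every registration (and the
// object graph it captures) is retained forever.
public class ShutdownHookLeak {
    static final List<Runnable> HOOKS = new ArrayList<>(); // never drained

    static void parseWithFreshContext() {
        byte[] heavyContext = new byte[1024];       // stands in for PigContext
        HOOKS.add(() -> heavyContext.clone());      // hook pins the context
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            parseWithFreshContext();
        }
        // 1000 hooks (and their captured contexts) are now unreclaimable
        // until JVM shutdown; reusing one context (or none) avoids this.
        System.out.println(HOOKS.size());
    }
}
```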



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5443) Add testcase for skew join for tez grace shuffle vertex manager

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5443:

Description: Need to add test case for fix in 
https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the 
existing skewed join unit or e2e test cases by increasing mappers (split size) 
or adding PARALLEL 2 for right side data. Also check if one-one edges are 
affected by this part of the code.  (was: Need to add test case for fix in 
https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the 
existing skewed join unit or e2e test cases by increasing mappers (split size) 
or adding PARALLEL 2 for right side data. )

> Add testcase for skew join for tez grace shuffle vertex manager
> ---
>
> Key: PIG-5443
> URL: https://issues.apache.org/jira/browse/PIG-5443
> Project: Pig
>  Issue Type: Task
>Reporter: Rohini Palaniswamy
>Priority: Minor
>
> Need to add test case for fix in 
> https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the 
> existing skewed join unit or e2e test cases by increasing mappers (split 
> size) or adding PARALLEL 2 for right side data. Also check if one-one edges 
> are affected by this part of the code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5443) Add testcase for skew join for tez grace shuffle vertex manager

2023-07-15 Thread Rohini Palaniswamy (Jira)
Rohini Palaniswamy created PIG-5443:
---

 Summary: Add testcase for skew join for tez grace shuffle vertex 
manager
 Key: PIG-5443
 URL: https://issues.apache.org/jira/browse/PIG-5443
 Project: Pig
  Issue Type: Task
Reporter: Rohini Palaniswamy


Need to add test case for fix in 
https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the 
existing skewed join unit or e2e test cases by increasing mappers (split size) 
or adding PARALLEL 2 for right side data. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-5442.
-
Fix Version/s: 0.18.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

+1. Committed to branch-0.18 and trunk. Thanks for the contribution [~maswin]

> Add only credentials from setStoreLocation to the Job Conf
> --
>
> Key: PIG-5442
> URL: https://issues.apache.org/jira/browse/PIG-5442
> Project: Pig
>  Issue Type: Bug
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5442-1.patch
>
>
> While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation 
> on all Stores with the same Job object - 
> [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081]
> Settings populated by one store affect the other stores. In my case, 
> "mapred.output.committer.class" is set as HiveIcebergCommitter by the 
> PigStore used by the Iceberg table, and the other stores, which insert data 
> into non-Iceberg tables, also pick up that setting and try to use 
> HiveIcebergCommitter.
>  
> On checking with [~rohini], it is called to get the credentials from all 
> stores, since the addCredentials API was added later; not all stores have 
> implemented it, and some still set configuration in the setLocation method 
> (e.g., HCatStorer). 
>  
> Fixed it by passing a separate copy of the Job object to each store's 
> setLocation method and adding only the credentials object from the call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5442:

Attachment: PIG-5442-1.patch

> Add only credentials from setStoreLocation to the Job Conf
> --
>
> Key: PIG-5442
> URL: https://issues.apache.org/jira/browse/PIG-5442
> Project: Pig
>  Issue Type: Bug
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Major
> Attachments: PIG-5442-1.patch
>
>
> While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation 
> on all Stores with the same Job object - 
> [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081]
> Settings populated by one store affect the other stores. In my case, 
> "mapred.output.committer.class" is set as HiveIcebergCommitter by the 
> PigStore used by the Iceberg table, and the other stores, which insert data 
> into non-Iceberg tables, also pick up that setting and try to use 
> HiveIcebergCommitter.
>  
> On checking with [~rohini], it is called to get the credentials from all 
> stores, since the addCredentials API was added later; not all stores have 
> implemented it, and some still set configuration in the setLocation method 
> (e.g., HCatStorer). 
>  
> Fixed it by passing a separate copy of the Job object to each store's 
> setLocation method and adding only the credentials object from the call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-07-15 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5441:

Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Patch committed to branch-0.18 and trunk. Thanks [~yigress] for the 
contribution.

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5441.patch
>
>
> A user's Pig Tez skew join hit an issue of not finding shuffle data from the 
> sampler aggregate vertex. The right side of the join has >1 reducers.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to 
> avoid spilling to disk for the sampler aggregation vertex. 
>  
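For reference, the workaround above can be applied directly from a Pig script. This is a sketch only; the size value below is illustrative and workload-dependent (it must be large enough that the sampler aggregate output fits in the event payload instead of spilling to disk):

```pig
-- Illustrative only: raise the max payload transferred via Tez events so the
-- sampler aggregate output is delivered in-band instead of spilled to disk.
set tez.runtime.transfer.data-via-events.max-size 2048;
```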



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf

2023-07-03 Thread Alagappan Maruthappan (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alagappan Maruthappan updated PIG-5442:
---
External issue URL: https://github.com/apache/pig/pull/40

> Add only credentials from setStoreLocation to the Job Conf
> --
>
> Key: PIG-5442
> URL: https://issues.apache.org/jira/browse/PIG-5442
> Project: Pig
>  Issue Type: Bug
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Major
>
> While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation 
> on all Stores with the same Job object - 
> [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081]
> Settings populated by one store affect the other stores. In my case, 
> "mapred.output.committer.class" is set as HiveIcebergCommitter by the 
> PigStore used by the Iceberg table, and the other stores, which insert data 
> into non-Iceberg tables, also pick up that setting and try to use 
> HiveIcebergCommitter.
>  
> On checking with [~rohini], it is called to get the credentials from all 
> stores, since the addCredentials API was added later; not all stores have 
> implemented it, and some still set configuration in the setLocation method 
> (e.g., HCatStorer). 
>  
> Fixed it by passing a separate copy of the Job object to each store's 
> setLocation method and adding only the credentials object from the call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf

2023-07-03 Thread Alagappan Maruthappan (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alagappan Maruthappan reassigned PIG-5442:
--

Assignee: Alagappan Maruthappan

> Add only credentials from setStoreLocation to the Job Conf
> --
>
> Key: PIG-5442
> URL: https://issues.apache.org/jira/browse/PIG-5442
> Project: Pig
>  Issue Type: Bug
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Major
>
> While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation 
> on all Stores with the same Job object - 
> [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081]
> Settings populated by one store affect the other stores. In my case, 
> "mapred.output.committer.class" is set as HiveIcebergCommitter by the 
> PigStore used by the Iceberg table, and the other stores, which insert data 
> into non-Iceberg tables, also pick up that setting and try to use 
> HiveIcebergCommitter.
>  
> On checking with [~rohini], it is called to get the credentials from all 
> stores, since the addCredentials API was added later; not all stores have 
> implemented it, and some still set configuration in the setLocation method 
> (e.g., HCatStorer). 
>  
> Fixed it by passing a separate copy of the Job object to each store's 
> setLocation method and adding only the credentials object from the call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf

2023-07-03 Thread Alagappan Maruthappan (Jira)
Alagappan Maruthappan created PIG-5442:
--

 Summary: Add only credentials from setStoreLocation to the Job Conf
 Key: PIG-5442
 URL: https://issues.apache.org/jira/browse/PIG-5442
 Project: Pig
  Issue Type: Bug
Reporter: Alagappan Maruthappan


While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation on 
all Stores with the same Job object - 
[https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081]

Settings populated by one store affect the other stores. In my case, 
"mapred.output.committer.class" is set as HiveIcebergCommitter by the PigStore 
used by the Iceberg table, and the other stores, which insert data into 
non-Iceberg tables, also pick up that setting and try to use 
HiveIcebergCommitter.
 
On checking with [~rohini], it is called to get the credentials from all 
stores, since the addCredentials API was added later; not all stores have 
implemented it, and some still set configuration in the setLocation method 
(e.g., HCatStorer). 
 
Fixed it by passing a separate copy of the Job object to each store's 
setLocation method and adding only the credentials object from the call.
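The fix described above can be sketched with plain collections (illustrative names only; maps and sets stand in for Hadoop's Configuration and Credentials objects, and none of this is Pig's actual code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the fix: each store gets its own copy of the job
// configuration, mutates that copy freely, and only the credentials it
// gathered are merged back into the shared job.
public class CredentialOnlyMerge {

    // Simulates one store's setStoreLocation call against a private conf
    // copy; returns the credentials gathered, which are all that flows back.
    static Set<String> callStore(Map<String, String> sharedConf,
                                 String committer, String credential) {
        Map<String, String> privateCopy = new HashMap<>(sharedConf);
        privateCopy.put("mapred.output.committer.class", committer); // local only
        Set<String> creds = new HashSet<>();
        creds.add(credential);
        return creds;
    }

    public static void main(String[] args) {
        Map<String, String> sharedConf = new HashMap<>();
        Set<String> sharedCreds = new HashSet<>();

        // The Iceberg-backed store sets its committer on its own copy.
        sharedCreds.addAll(callStore(sharedConf, "HiveIcebergCommitter",
                                     "iceberg-token"));

        // Shared conf is untouched by the committer setting; only the
        // credential was merged back.
        System.out.println(sharedConf.containsKey("mapred.output.committer.class")); // false
        System.out.println(sharedCreds.contains("iceberg-token"));                   // true
    }
}
```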



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-25 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated PIG-5441:
--
Attachment: PIG-5441.patch

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5441.patch
>
>
> A user's Pig Tez skew join hit an issue of not finding shuffle data from the 
> sampler aggregate vertex. The right side of the join has >1 reducers.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to 
> avoid spilling to disk for the sampler aggregation vertex. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-24 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726009#comment-17726009
 ] 

Yi Zhang commented on PIG-5441:
---

[~knoguchi] can you add the unit test as a separate jira? I am not actively 
working on Pig itself and don't have the bandwidth right now. Thank you! 

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
>
> A user's Pig Tez skew join hit an issue of not finding shuffle data from the 
> sampler aggregate vertex. The right side of the join has >1 reducers.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to 
> avoid spilling to disk for the sampler aggregation vertex. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-24 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725786#comment-17725786
 ] 

Koji Noguchi commented on PIG-5441:
---

It would be nice if you could add a unit test.  
(However, if you don't have the bandwidth, I understand.  I can try to add the 
test later as a separate jira.)

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
>
> A user's Pig Tez skew join hit an issue of not finding shuffle data from the 
> sampler aggregate vertex. The right side of the join has >1 reducers.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to 
> avoid spilling to disk for the sampler aggregation vertex. 
>  





[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-24 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725781#comment-17725781
 ] 

Rohini Palaniswamy commented on PIG-5441:
-

+1. Can you just attach the patch to the jira?

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
>
> A user's Pig-on-Tez skew join failed to find shuffle data from the 
> sampler aggregate vertex. The right side of the join has more than one reducer.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to avoid 
> spilling to disk in the sampler aggregation vertex. 
>  





[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-24 Thread Rohini Palaniswamy (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-5441:

Fix Version/s: 0.18.0
 Assignee: Yi Zhang
   Status: Patch Available  (was: Open)

> Pig skew join tez grace reducer fails to find shuffle data
> --
>
> Key: PIG-5441
> URL: https://issues.apache.org/jira/browse/PIG-5441
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Affects Versions: 0.17.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
> Fix For: 0.18.0
>
>
> A user's Pig-on-Tez skew join failed to find shuffle data from the 
> sampler aggregate vertex. The right side of the join has more than one reducer.
> As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to avoid 
> spilling to disk in the sampler aggregation vertex. 
>  





[jira] [Commented] (PIG-5440) Extra jars needed for hive3

2023-05-24 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725780#comment-17725780
 ] 

Rohini Palaniswamy commented on PIG-5440:
-

+1

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5440-v01.patch, pig-5440-v02.patch
>
>
> When testing Hive3,  e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 
> Updating dependent classes. 





[jira] [Created] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data

2023-05-19 Thread Yi Zhang (Jira)
Yi Zhang created PIG-5441:
-

 Summary: Pig skew join tez grace reducer fails to find shuffle data
 Key: PIG-5441
 URL: https://issues.apache.org/jira/browse/PIG-5441
 Project: Pig
  Issue Type: Bug
  Components: tez
Affects Versions: 0.17.0
Reporter: Yi Zhang


A user's Pig-on-Tez skew join failed to find shuffle data from the 
sampler aggregate vertex. The right side of the join has more than one reducer.

As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to avoid 
spilling to disk in the sampler aggregation vertex. 
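
The workaround above can be applied per script. A minimal Pig Latin sketch, 
assuming the default threshold is smaller than the sampler vertex's output; the 
8192-byte value is an illustrative assumption, not a value taken from this jira:

```pig
-- Workaround sketch (assumed value): raise the Tez event-transfer threshold so
-- the sampler aggregation vertex's small output is delivered via Tez events
-- instead of being spilled to disk, where the skew-join vertex failed to find it.
set tez.runtime.transfer.data-via-events.max-size '8192';
```

Any value (in bytes) comfortably larger than the sampler output should work; the 
property can alternatively be set cluster-wide in tez-site.xml.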

 





[jira] [Commented] (PIG-5440) Extra jars needed for hive3

2023-05-12 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722278#comment-17722278
 ] 

Koji Noguchi commented on PIG-5440:
---

bq. Can you add a space between "orc-shims" and "aircompressor" before commit?

Attached pig-5440-v02.patch.

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5440-v01.patch, pig-5440-v02.patch
>
>
> When testing Hive3,  e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 
> Updating dependent classes. 





[jira] [Updated] (PIG-5440) Extra jars needed for hive3

2023-05-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5440:
--
Attachment: pig-5440-v02.patch

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5440-v01.patch, pig-5440-v02.patch
>
>
> When testing Hive3,  e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 
> Updating dependent classes. 





[jira] [Commented] (PIG-5440) Extra jars needed for hive3

2023-05-12 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722276#comment-17722276
 ] 

Rohini Palaniswamy commented on PIG-5440:
-

+1. Can you add a space between "orc-shims" and "aircompressor" before commit?

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5440-v01.patch
>
>
> When testing Hive3,  e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 
> Updating dependent classes. 





[jira] [Updated] (PIG-5440) Extra jars needed for hive3

2023-05-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5440:
--
Attachment: pig-5440-v01.patch

> Extra jars needed for hive3
> ---
>
> Key: PIG-5440
> URL: https://issues.apache.org/jira/browse/PIG-5440
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5440-v01.patch
>
>
> When testing Hive3,  e2e tests were failing with 
> {{Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 
> Updating dependent classes. 





[jira] [Created] (PIG-5440) Extra jars needed for hive3

2023-05-12 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5440:
-

 Summary: Extra jars needed for hive3
 Key: PIG-5440
 URL: https://issues.apache.org/jira/browse/PIG-5440
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi
Assignee: Koji Noguchi


When testing Hive3,  e2e tests were failing with 
{{Caused by: java.lang.NoClassDefFoundError: 
org/apache/hadoop/hive/llap/security/LlapSigner$Signable}}  etc. 

Updating dependent classes. 





[jira] [Assigned] (PIG-5437) Add lib and idea folder to .gitignore

2023-05-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5437:
-

Assignee: Alagappan Maruthappan

> Add lib and idea folder to .gitignore
> -
>
> Key: PIG-5437
> URL: https://issues.apache.org/jira/browse/PIG-5437
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alagappan Maruthappan
>Assignee: Alagappan Maruthappan
>Priority: Minor
> Attachments: PIG-5437-0.patch
>
>
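
The jira summary above is the whole change; a minimal sketch of the entries such 
a patch would add (inferred from the summary alone — the actual PIG-5437-0.patch 
may differ, e.g. in whether the IDEA folder is tracked as idea/ or .idea/):

```
# Ant/Ivy-downloaded dependency jars
lib/
# IntelliJ IDEA project metadata
.idea/
```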






[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim

2023-05-11 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5439:
--
Attachment: pig-5439-v01.patch

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5439-v01.patch
>
>
> Support running Pig-on-Spark on Spark 3. 
> The initial version only runs up to Spark 3.2.4, not on 3.3 or 3.4, 
> due to a log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch, but at least 
> compilation goes through.





[jira] [Created] (PIG-5439) Support Spark 3 and drop SparkShim

2023-05-11 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5439:
-

 Summary: Support Spark 3 and drop SparkShim
 Key: PIG-5439
 URL: https://issues.apache.org/jira/browse/PIG-5439
 Project: Pig
  Issue Type: Improvement
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Support running Pig-on-Spark on Spark 3. 

The initial version only runs up to Spark 3.2.4, not on 3.3 or 3.4, 
due to a log4j mismatch. 

After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.

So far, not all unit/e2e tests pass with the proposed patch, but at least 
compilation goes through.




