[jira] [Reopened] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi reopened PIG-5453:
-------------------------------

While tracking multiple jiras, I missed that this patch was not put through full unit/e2e tests. (Thus the previous syntax error.) After fixing the simple syntax error, I saw a couple of regression test failures. At this point, I am reverting the patch while I debug and come up with a new patch. So sorry.

> FLATTEN shifting fields incorrectly
> -----------------------------------
>
>                 Key: PIG-5453
>                 URL: https://issues.apache.org/jira/browse/PIG-5453
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the entire set of fields shifts incorrectly.
> Input
> {noformat}
> A (a,b,c)
> B (a,b,c)
> C (a,b,c)
> Y (a,b)
> Z (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct. It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly. Should have been (Y,a,b,,Y).
> Z has dropped a4(Z) and overwrote the result with the content of FLATTEN(a2). Should have been (Z,a,b,c,Z).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847097#comment-17847097 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847095#comment-17847095 ]

Koji Noguchi commented on PIG-5453:
-----------------------------------

Sorry, the original patch had an extra comma causing a compile error in TestFlatten.java. Uploaded pig-5453-v02.patch. To fix the broken trunk, I pushed the change.
[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5453:
------------------------------
    Attachment: pig-5453-v02.patch
[jira] [Resolved] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5453.
-------------------------------
    Fix Version/s: 0.19.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Daniel! Committed to trunk.
[jira] [Resolved] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5452.
-------------------------------
    Fix Version/s: 0.19.0
       Resolution: Fixed

Thanks for the review Daniel! Committed to trunk.

> Null handling of FLATTEN with user defined schema (as clause)
> -------------------------------------------------------------
>
>                 Key: PIG-5452
>                 URL: https://issues.apache.org/jira/browse/PIG-5452
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201.
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces the right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
> However,
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces the wrong number of nulls, and the output is shifted incorrectly.
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> The difference is that in the latter, a2 in "FLATTEN(a2)" only has a schema of tuple() with empty inner fields, but with a user defined schema of "as (A1:chararray, A2:chararray)".
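The intended semantics above — FLATTEN of a null tuple should expand into one null per field of the user-defined schema, so that neighboring fields keep their positions — can be sketched in Python. This is an illustrative model only, not Pig's actual implementation; the helper name `flatten_null` is invented for this sketch:

```python
def flatten_null(tup, declared_fields):
    """Model of FLATTEN on a possibly-null tuple with a user-defined
    schema of `declared_fields` fields: a null tuple expands into
    `declared_fields` nulls so surrounding fields do not shift."""
    if tup is None:
        return [None] * declared_fields
    return list(tup)

# Modeling: C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1, A2), a3;
# with a row where a1='a', a2=null, a3='a'
row = ("a", None, "a")
out = [row[0]] + flatten_null(row[1], 2) + [row[2]]
print(out)  # ['a', None, None, 'a'] -- Pig would render this as (a,,,a)
```

With the buggy behavior described in the issue, the null expanded into a single field instead of two, producing (a,,a,) with a3 shifted left.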
[jira] [Resolved] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5450.
-------------------------------
    Fix Version/s: 0.19.0
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5450
>                 URL: https://issues.apache.org/jira/browse/PIG-5450
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
>   Location:
>     org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; @117: areturn
>   Reason:
>     Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
> {noformat}
[jira] [Resolved] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5446.
-------------------------------
    Fix Version/s: 0.19.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---------------------------------------------------------------------------
>
>                 Key: PIG-5446
>                 URL: https://issues.apache.org/jira/browse/PIG-5446
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
>         at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
>         at org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
>         at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
>         at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}
[jira] [Resolved] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5448.
-------------------------------
    Fix Version/s: 0.19.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> All TestHBaseStorage tests failing on pig-on-spark3
> ---------------------------------------------------
>
>                 Key: PIG-5448
>                 URL: https://issues.apache.org/jira/browse/PIG-5448
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are failing with
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
>         at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
>         at org.apache.pig.PigServer.store(PigServer.java:1086)
>         at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator:
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
>         at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
>         at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
>         at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
>         at org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
>         at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
>         at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}
[jira] [Resolved] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5447.
-------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-5447
>                 URL: https://issues.apache.org/jira/browse/PIG-5447
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"
[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5447:
------------------------------
    Fix Version/s: 0.19.0
[jira] [Resolved] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5439.
-------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> Support Spark 3 and drop SparkShim
> ----------------------------------
>
>                 Key: PIG-5439
>                 URL: https://issues.apache.org/jira/browse/PIG-5439
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark to run on Spark 3.
> The initial version only runs up to Spark 3.2.4 and not on 3.3 or 3.4. This is due to a log4j mismatch. After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch, but at least compilation goes through.
[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2
[ https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5438:
------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Thanks for the review Rohini! Committed to trunk.

> Update SparkCounter.Accumulator to AccumulatorV2
> ------------------------------------------------
>
>                 Key: PIG-5438
>                 URL: https://issues.apache.org/jira/browse/PIG-5438
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Trivial
>             Fix For: 0.19.0
>
>         Attachments: pig-5438-v01.patch
>
>
> The original Accumulator is deprecated in Spark 2 and gone in Spark 3. AccumulatorV2 is usable on both Spark 2 and Spark 3.
[jira] [Resolved] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi resolved PIG-5416.
-------------------------------
    Fix Version/s: 0.19.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Thanks for the review Rohini! Committed to trunk.

> Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-5416
>                 URL: https://issues.apache.org/jira/browse/PIG-5416
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with the same errors.
> Sample stack trace showing "Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING":
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
>         at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
>         at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
>         at org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator:
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
>         at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
>         at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
>         at org.apache.pig.PigServer.execute(PigServer.java:1453)
>         at org.apache.pig.PigServer.access$500(PigServer.java:119)
>         at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
>         at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
>         at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
>         at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
>         at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}
[jira] [Commented] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844427#comment-17844427 ]

Rohini Palaniswamy commented on PIG-5439:
-----------------------------------------

+1
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838865#comment-17838865 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1
[jira] [Created] (PIG-5454) Make ParallelGC the default Garbage Collection
Koji Noguchi created PIG-5454:
---------------------------------

             Summary: Make ParallelGC the default Garbage Collection
                 Key: PIG-5454
                 URL: https://issues.apache.org/jira/browse/PIG-5454
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Koji Noguchi


From JDK 9 and beyond, G1GC became the default GC. I've seen our users hitting OOMs after migrating to a recent JDK, with the issue going away after reverting back to ParallelGC. Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.
[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5453:
------------------------------
    Attachment: pig-5453-v01.patch

Uploading a patch (pig-5453-v01.patch) that uses a new field introduced as part of PIG-5201, PIG-5452. If the number of fields is less than the expected number of fields, it now fills the rest with nulls. If the number of fields is more, it now fills only up to the expected number of fields.
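The pad-or-truncate behavior the patch describes can be sketched in Python. This is an illustrative model under stated assumptions, not the actual Pig code; the helper name `flatten_with_schema` is invented for this sketch:

```python
def flatten_with_schema(fields, expected):
    """Model of FLATTEN(...) as (f1, ..., fN): pad with nulls when the
    tuple has fewer fields than the declared schema, truncate when it
    has more, so the fields after the FLATTEN never shift position."""
    fields = list(fields)
    if len(fields) < expected:
        # Fewer fields than declared: fill the rest with null.
        return fields + [None] * (expected - len(fields))
    # More fields than declared: keep only the expected number.
    return fields[:expected]

# Row Y: tuple (a,b) flattened against a 3-field schema
print(flatten_with_schema(("a", "b"), 3))  # ['a', 'b', None]
# Row Z: tuple (a,b,c,d,e,f) flattened against a 3-field schema
print(flatten_with_schema(("a", "b", "c", "d", "e", "f"), 3))  # ['a', 'b', 'c']
```

With this behavior, row Y produces (Y,a,b,,Y) and row Z produces (Z,a,b,c,Z), matching the expected results in the issue description.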
[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837863#comment-17837863 ]

Rohini Palaniswamy commented on PIG-5450:
-----------------------------------------

+1
[jira] [Commented] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837862#comment-17837862 ]

Rohini Palaniswamy commented on PIG-5449:
-----------------------------------------

+1

> TestEmptyInputDir failing on pig-on-spark3
> ------------------------------------------
>
>                 Key: PIG-5449
>                 URL: https://issues.apache.org/jira/browse/PIG-5449
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5449-v01.patch
>
>
> TestEmptyInputDir failing on pig-on-spark3 with
> {noformat:title=TestEmptyInputDir.testMergeJoinFailure}
> junit.framework.AssertionFailedError
>         at org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
> {noformat}
> {noformat:title=TestEmptyInputDir.testGroupByFailure}
> junit.framework.AssertionFailedError
>         at org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
> junit.framework.AssertionFailedError
>         at org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
> {noformat}
> {noformat:title=TestEmptyInputDir.testFRJoinFailure}
> junit.framework.AssertionFailedError
>         at org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinFailure}
> junit.framework.AssertionFailedError
>         at org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267)
> {noformat}
[jira] [Commented] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837861#comment-17837861 ]

Rohini Palaniswamy commented on PIG-5448:
-----------------------------------------

+1
[jira] [Commented] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2
[ https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837860#comment-17837860 ] Rohini Palaniswamy commented on PIG-5438: - +1 > Update SparkCounter.Accumulator to AccumulatorV2 > > > Key: PIG-5438 > URL: https://issues.apache.org/jira/browse/PIG-5438 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5438-v01.patch > > > Original Accumulator is deprecated in Spark2 and gone in Spark3. > AccumulatorV2 is usable on both Spark2 and Spark3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
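For context on the migration above, the AccumulatorV2 contract that replaced the deprecated Accumulator can be modeled in plain Python. This is an illustrative stand-in for org.apache.spark.util.AccumulatorV2, not Pig's actual SparkCounter code: an accumulator must implement isZero/copy/reset/add/merge/value, with add running on executors and merge on the driver.

```python
# Illustrative stand-in for the AccumulatorV2 contract
# (isZero / copy / reset / add / merge / value); not the Spark or Pig code.
class LongCounter:
    def __init__(self, value=0):
        self._value = value

    def is_zero(self):
        return self._value == 0

    def copy(self):
        return LongCounter(self._value)

    def reset(self):
        self._value = 0

    def add(self, v):
        # runs on executors, once per increment
        self._value += v

    def merge(self, other):
        # runs on the driver, combining executor-local partial counts
        self._value += other._value

    @property
    def value(self):
        return self._value

# two executor-local partials merged on the driver
a, b = LongCounter(), LongCounter()
a.add(3)
b.add(4)
a.merge(b)
```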
[jira] [Created] (PIG-5453) FLATTEN shifting fields incorrectly
Koji Noguchi created PIG-5453: - Summary: FLATTEN shifting fields incorrectly Key: PIG-5453 URL: https://issues.apache.org/jira/browse/PIG-5453 Project: Pig Issue Type: Bug Components: impl Reporter: Koji Noguchi Assignee: Koji Noguchi Follow up from PIG-5201, PIG-5452. When a flattened tuple has fewer or more fields than specified, all subsequent fields shift incorrectly. Input {noformat} A (a,b,c) B (a,b,c) C (a,b,c) Y (a,b) Z (a,b,c,d,e,f) E{noformat} Script {code:java} A = load 'input.txt' as (a1:chararray, a2:tuple()); B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4; dump B; {code} Incorrect results {noformat} (A,a,b,c,A) (B,a,b,c,B) (C,a,b,c,C) (Y,a,b,Y,) (Z,a,b,c,d) (EE){noformat} E is correct; it was fixed as part of PIG-5201, PIG-5452. Y shifted a4(Y) to the left incorrectly; it should have been (Y,a,b,,Y). Z dropped a4(Z) and overwrote the result with the content of FLATTEN(a2); it should have been (Z,a,b,c,Z). -- This message was sent by Atlassian Jira (v8.20.10#820010)
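The expected behavior described above (Y padded with a null, Z truncated, a4 staying in place) can be sketched with a small model. This is a simplified illustration of the intended semantics, not Pig's implementation:

```python
def flatten_to_arity(t, arity):
    # pad a short tuple with None, truncate a long one, so that fields
    # following the FLATTEN never shift position
    fields = list(t or ())[:arity]
    return fields + [None] * (arity - len(fields))

def generate_row(a1, a2):
    # models: B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1,b2,b3), a1 as a4;
    return tuple([a1] + flatten_to_arity(a2, 3) + [a1])
```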
[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5452: -- Description: Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces wrong number of null and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of tuple() with empty inner fields but with user defined schema of "as (A1:chararray, A2:chararray)". was: Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces wrong number of null and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of tuple() with empty inner fields. 
> Null handling of FLATTEN with user defined schema (as clause) > - > > Key: PIG-5452 > URL: https://issues.apache.org/jira/browse/PIG-5452 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5452-v01.patch > > > Follow up from PIG-5201, > {code:java} > A = load 'input' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 > as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2), a3; > dump C;{code} > This produces right number of nulls. > {code:java} > (a,,,a) > (b,,,b) > (c,,,c) > (d,,,d) > (f,,,f) {code} > > However, > {code:java} > A = load 'input.txt' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; > dump C;{code} > This produces wrong number of null and the output is shifted incorrectly. > {code:java} > (a,,a,) > (b,,b,) > (c,,c,) > (d,,d,) > (f,,f,) {code} > Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of > tuple() with empty inner fields but with user defined schema of "as > (A1:chararray, A2:chararray)". > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5452: -- Attachment: pig-5452-v01.patch Instead of relying on the inner-field schema, the patch uses the output schema, which combines the schema of the data with the user-defined schema. > Null handling of FLATTEN with user defined schema (as clause) > - > > Key: PIG-5452 > URL: https://issues.apache.org/jira/browse/PIG-5452 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5452-v01.patch > > > Follow up from PIG-5201, > {code:java} > A = load 'input' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 > as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2), a3; > dump C;{code} > This produces right number of nulls. > {code:java} > (a,,,a) > (b,,,b) > (c,,,c) > (d,,,d) > (f,,,f) {code} > > However, > {code:java} > A = load 'input.txt' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; > dump C;{code} > This produces wrong number of null and the output is shifted incorrectly. > {code:java} > (a,,a,) > (b,,b,) > (c,,c,) > (d,,d,) > (f,,f,) {code} > Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of > tuple() with empty inner fields. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
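The fix described in the comment — preferring the combined output schema over the (possibly empty) inner-field schema when counting nulls — can be sketched as follows. Helper names are hypothetical, not taken from the actual patch:

```python
def output_arity(inner_fields, user_fields):
    # the user-defined "as" schema fills in for an empty tuple() schema
    return len(user_fields or inner_fields)

def flatten_null(inner_fields, user_fields):
    # a null tuple flattens to one null per field of the combined schema
    return [None] * output_arity(inner_fields, user_fields)
```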
[jira] [Created] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
Koji Noguchi created PIG-5452: - Summary: Null handling of FLATTEN with user defined schema (as clause) Key: PIG-5452 URL: https://issues.apache.org/jira/browse/PIG-5452 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces the right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces the wrong number of nulls, and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} The difference is that, for the latter, a2 in "FLATTEN(a2)" only has a schema of tuple() with empty inner fields. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5416: - Assignee: Koji Noguchi > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832323#comment-17832323 ] Koji Noguchi commented on PIG-5451: --- This was caused by conflict of orc.version. ./build/ivy/lib/Pig/orc-core-1.5.6.jar ./lib/h3/orc-core-1.5.6.jar and spark/jars/orc-core-1.6.14.jar > Pig-on-Spark3 E2E Orc_Pushdown_5 failing > - > > Key: PIG-5451 > URL: https://issues.apache.org/jira/browse/PIG-5451 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > > Test failing with > "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate > cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
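Version conflicts like this one (orc-core 1.5.6 from Pig's ivy cache vs 1.6.14 from spark/jars) can be surfaced mechanically. A rough sketch, with the directory list left to the reader:

```python
# Sketch: group jars by artifact name across several classpath directories
# and report artifacts present in more than one version.
import os
import re
from collections import defaultdict

def find_conflicting_jars(dirs):
    versions = defaultdict(set)
    # e.g. "orc-core-1.5.6.jar" -> artifact "orc-core", version "1.5.6"
    pat = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for name in os.listdir(d):
            m = pat.match(name)
            if m:
                versions[m.group("artifact")].add(m.group("version"))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}
```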
[jira] [Created] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
Koji Noguchi created PIG-5451: - Summary: Pig-on-Spark3 E2E Orc_Pushdown_5 failing Key: PIG-5451 URL: https://issues.apache.org/jira/browse/PIG-5451 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi Test failing with "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832320#comment-17832320 ] Koji Noguchi commented on PIG-5451: --- Full stack trace. {noformat} 2024-03-29 10:57:31,787 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 3 (runJob at SparkHadoopWriter.scala:83) failed in 36.126 s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8) (gsrd479n10.red.ygrid.yahoo.com executor 4): java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot access its superclass org.threeten.extra.chrono.AbstractDate at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:235) at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:88) at java.time.chrono.AbstractChronology.resolveYMD(AbstractChronology.java:563) at java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472) at org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:452) at org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:88) at 
java.time.format.Parsed.resolveDateFields(Parsed.java:351) at java.time.format.Parsed.resolveFields(Parsed.java:257) at java.time.format.Parsed.resolve(Parsed.java:244) at java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331) at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955) at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) at org.apache.orc.impl.DateUtils._clinit_(DateUtils.java:74) at org.apache.orc.impl.ColumnStatisticsImpl$TimestampStatisticsImpl._init_(ColumnStatisticsImpl.java:1683) at org.apache.orc.impl.ColumnStatisticsImpl.deserialize(ColumnStatisticsImpl.java:2131) at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:522) at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:1045) at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1117) at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1137) at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1187) at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1222) at org.apache.orc.impl.RecordReaderImpl._init_(RecordReaderImpl.java:254) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl._init_(RecordReaderImpl.java:67) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:83) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:337) at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader._init_(OrcNewInputFormat.java:72) at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.createRecordReader(OrcNewInputFormat.java:57) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:255) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader._init_(PigRecordReader.java:126) at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkPigRecordReader._init_(SparkPigRecordReader.java:44) at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.createRecordReader(PigInputFormatSpark.java:131) at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:71) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:215) at org.apache.spark.rdd.NewHadoopRDD$$anon$1._init_(NewHadoopRDD.scala:213) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337
[jira] [Updated] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5450: -- Attachment: pig-5450-v01.patch It turns out the weird error was coming from conflicting jar. {{./build/ivy/lib/Pig/hive-storage-api-2.7.0.jar}} and {{spark/spark/jars/hive-storage-api-2.7.2.jar}} Uploading a patch updating hive-storage-api version. > Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type > -- > > Key: PIG-5450 > URL: https://issues.apache.org/jira/browse/PIG-5450 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5450-v01.patch > > > {noformat} > Caused by: java.lang.VerifyError: Bad return type > Exception Details: > Location: > org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; > @117: areturn > Reason: > Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, > stack[0]) is not assignable to > 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832318#comment-17832318 ] Koji Noguchi commented on PIG-5450: --- Weird full trace. {noformat} 024-03-27 10:50:40,088 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) (gsrd238n05.red.ygrid.yahoo.com executor 1): org.apache.spark.SparkException: Task failed while writing rows at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163) at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.VerifyError: Bad return type Exception Details: Location: org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; @117: areturn Reason: Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) Current Frame: bci: @117 flags: { } locals: { 'org/apache/orc/TypeDescription', 'org/apache/orc/TypeDescription$RowBatchVersion', integer } stack: { 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' } Bytecode: 0x000: b200 022a b600 03b6 0004 2eaa 0181 0x010: 0001 0013 0059 0059 0x020: 0059 0059 0059 0062 0x030: 006b 006b 
0074 0074 0x040: 007d 00ad 00ad 00ad 0x050: 00ad 00b6 00f7 0138 0x060: 0155 bb00 0559 1cb7 0006 b0bb 0007 0x070: 591c b700 08b0 bb00 0959 1cb7 000a b0bb 0x080: 000b 591c b700 0cb0 2ab6 000d 3e2a b600 0x090: 0e36 042b b200 0fa5 0009 1d10 12a4 000f 0x0a0: bb00 1159 1c1d 1504 b700 12b0 bb00 1359 0x0b0: 1c1d 1504 b700 14b0 bb00 1559 1cb7 0016 0x0c0: b02a b600 174e 2db9 0018 0100 bd00 193a 0x0d0: 0403 3605 1505 1904 bea2 001e 1904 1505 0x0e0: 2d15 05b9 001a 0200 c000 102b 1cb8 001b 0x0f0: 5384 0501 a7ff e0bb 001c 591c 1904 b700 0x100: 1db0 2ab6 0017 4e2d b900 1801 00bd 0019 0x110: 3a04 0336 0515 0519 04be a200 1e19 0415 0x120: 052d 1505 b900 1a02 00c0 0010 2b1c b800 0x130: 1b53 8405 01a7 ffe0 bb00 1e59 1c19 04b7 0x140: 001f b02a b600 174e bb00 2059 1c2d 03b9 0x150: 001a 0200 c000 102b 1cb8 001b b700 21b0 0x160: 2ab6 0017 4ebb 0022 591c 2d03 b900 1a02 0x170: 00c0 0010 2b1c b800 1b2d 04b9 001a 0200 0x180: c000 102b 1cb8 001b b700 23b0 bb00 2459 0x190: bb00 2559 b700 2612 27b6 0028 2ab6 0003 0x1a0: b600 29b6 002a b700 2bbf Stackmap Table: same_frame_extended(@100) same_frame(@109) same_frame(@118) same_frame(@127) same_frame(@136) append_frame(@160,Integer,Integer) same_frame(@172) chop_frame(@184,2) same_frame(@193) append_frame(@212,Object[_75],Object[_76],Integer) chop_frame(@247,1) chop_frame(@258,2) append_frame(@277,Object[_75],Object[_76],Integer) chop_frame(@312,1) chop_frame(@323,2) same_frame(@352) same_frame(@396) at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:483) at org.apache.hadoop.hive.ql.io.orc.WriterImpl._init_(WriterImpl.java:100) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334) at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:51) at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:37) at org.apache.pig.builtin.OrcStorage.putNext(OrcStorage.java:249) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:146) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:368) at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525) at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:135
[jira] [Created] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
Koji Noguchi created PIG-5450: - Summary: Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type Key: PIG-5450 URL: https://issues.apache.org/jira/browse/PIG-5450 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi {noformat} Caused by: java.lang.VerifyError: Bad return type Exception Details: Location: org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; @117: areturn Reason: Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5410) Support Python 3 for streaming_python
[ https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5410: -- Attachment: pig-5410-v02.patch Testing the patch, it failed with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} It seems the patch was missing a '+'. Uploading a new patch with the '+'. > Support Python 3 for streaming_python > - > > Key: PIG-5410 > URL: https://issues.apache.org/jira/browse/PIG-5410 > Project: Pig > Issue Type: New Feature >Reporter: Rohini Palaniswamy >Assignee: Venkatasubrahmanian Narayanan >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5410.patch, pig-5410-v02.patch > > > Python 3 is incompatible with Python 2. We need to make it work with both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
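The failure mode above is worth noting: when Python source is generated from fragments (as the streaming controller script is), losing a single line produces a SyntaxError only when the generated file runs. An illustration, not the actual StreamingUDF generator — compile() surfaces the problem early:

```python
# Illustration: dropping one emitted line from generated Python source
# leaves an unbalanced expression that only fails at compile/run time.
lines = [
    "def run():",
    "    return ('WRAPPED_MAP_END'",   # opening paren...
    "            )",                   # ...closed on the next line
]
src_ok = "\n".join(lines)
compile(src_ok, "<generated>", "exec")      # compiles cleanly

src_bad = "\n".join(lines[:1] + lines[2:])  # middle line dropped
try:
    compile(src_bad, "<generated>", "exec")
    dropped_line_detected = False
except SyntaxError:
    dropped_line_detected = True
```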
[jira] [Comment Edited] (PIG-5410) Support Python 3 for streaming_python
[ https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832317#comment-17832317 ] Koji Noguchi edited comment on PIG-5410 at 3/29/24 9:10 PM: Testing the patch, it was failing with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} it seems like the patch was missing a '+'. Uploading a new patch. was (Author: knoguchi): Testing the patch, it was failing with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} it seems like the patch was missing a '+'. Uploading a new patch with '+'. > Support Python 3 for streaming_python > - > > Key: PIG-5410 > URL: https://issues.apache.org/jira/browse/PIG-5410 > Project: Pig > Issue Type: New Feature >Reporter: Rohini Palaniswamy >Assignee: Venkatasubrahmanian Narayanan >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5410.patch, pig-5410-v02.patch > > > Python 3 is incompatible with Python 2. We need to make it work with both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5449: -- Attachment: pig-5449-v01.patch Before (in spark2 land), this used to work by checking the empty list returned by getjobIDs. https://github.com/apache/pig/blob/branch-0.17/src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java#L210-L219 But with spark3, this returns an actual jobid but with no metrics stored behind it. Instead of adding more spark3-specific logic, I think we can treat metrics retrieval as optional, like we do in mapreduce & tez. Attaching a patch (pig-5449-v01.patch). > TestEmptyInputDir failing on pig-on-spark3 > -- > > Key: PIG-5449 > URL: https://issues.apache.org/jira/browse/PIG-5449 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5449-v01.patch > > > TestEmptyInputDir failing on pig-on-spark3 with > {noformat:title=TestEmptyInputDir.testMergeJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141) > {noformat} > {noformat:title=TestEmptyInputDir.testGroupByFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297) > {noformat} > {noformat:title=TestEmptyInputDir.testFRJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) > {noformat} > -- This message was sent by Atlassian
Jira (v8.20.10#820010)
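The approach in the comment above — treating metrics retrieval as optional, as the MapReduce and Tez backends already do — might look like this. Data shapes are assumed for illustration; this is not the actual SparkJobStats code:

```python
# Sketch: degrade gracefully to empty stats when a job reports no task
# metrics, instead of raising "No task metrics available for jobId N".
def collect_stats(task_metrics, job_id):
    if not task_metrics:
        # mapreduce/tez-style behavior: warn and continue
        print("WARN: no task metrics available for jobId %s" % job_id)
        return {}
    return {
        "records_written": sum(m.get("records", 0) for m in task_metrics),
        "bytes_written": sum(m.get("bytes", 0) for m in task_metrics),
    }
```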
[jira] [Created] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
Koji Noguchi created PIG-5449: - Summary: TestEmptyInputDir failing on pig-on-spark3 Key: PIG-5449 URL: https://issues.apache.org/jira/browse/PIG-5449 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi TestEmptyInputDir failing on pig-on-spark3 with {noformat:title=TestEmptyInputDir.testMergeJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141) {noformat} {noformat:title=TestEmptyInputDir.testGroupByFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80) {noformat} {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297) {noformat} {noformat:title=TestEmptyInputDir.testFRJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171) {noformat} {noformat:title=TestEmptyInputDir.testBloomJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5448: -- Attachment: pig-5448-v01.patch {quote}No task metrics available for jobId 0 {quote} This is actually failing because Pig is succeeding without running anything. Looking further, I found that Spark is filtering out all input splits and reporting a successful empty job result with no metrics. Setting a flag so that Spark does not ignore a PigSplit that looks empty but still has (non-hdfs) inputs. (pig-5448-v01.patch) > All TestHBaseStorage tests failing on pig-on-spark3 > --- > > Key: PIG-5448 > URL: https://issues.apache.org/jira/browse/PIG-5448 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5448-v01.patch > > > For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are > failing with > {noformat} > org.apache.pig.PigException: ERROR 1002: Unable to store alias b > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at > org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) > at >
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.RuntimeException: No task metrics available for jobId 0 > at > org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5439: -- Attachment: pig-5439-v02.patch Adding missing spark-scala.version. (pig-5439-v02.patch) > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5439-v01.patch, pig-5439-v02.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
Koji Noguchi created PIG-5448: - Summary: All TestHBaseStorage tests failing on pig-on-spark3 Key: PIG-5448 URL: https://issues.apache.org/jira/browse/PIG-5448 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are failing with {noformat} org.apache.pig.PigException: ERROR 1002: Unable to store alias b at org.apache.pig.PigServer.storeEx(PigServer.java:1127) at org.apache.pig.PigServer.store(PigServer.java:1086) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator: at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) at org.apache.pig.PigServer.storeEx(PigServer.java:1123) Caused by: java.lang.RuntimeException: No task metrics available for jobId 0 at org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826791#comment-17826791 ] Rohini Palaniswamy commented on PIG-5446: - +1 > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5446-v01.patch > > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826790#comment-17826790 ] Rohini Palaniswamy commented on PIG-5416: - +1 > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826789#comment-17826789 ] Rohini Palaniswamy commented on PIG-5447: - +1 > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5447-v01.patch > > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5447: -- Attachment: pig-5447-v01.patch There is no simple way to implement hasNext() in this implementation. I think an iterator is not the right fit here, but I prefer not to touch the logic. Instead, this patch adds a hacked iterator that basically calls next() within hasNext() and caches the result. pig-5447-v01.patch > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5447-v01.patch > > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
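The caching workaround described in the comment above can be sketched roughly as follows. This is an illustrative sketch only, not the actual patch code; the class name `CachingIterator` is hypothetical. The key idea is that when a delegate's next() may consume more underlying elements than hasNext() reported, the only reliable hasNext() is one that eagerly fetches an element and caches it.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: wrap an unreliable iterator so hasNext() becomes
// trustworthy. hasNext() eagerly calls next() on the delegate and caches
// the result; next() hands back the cached element.
public class CachingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private T cached;
    private boolean hasCached;

    public CachingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        if (hasCached) {
            return true;
        }
        // The delegate's hasNext() cannot be trusted here, so actually
        // fetching an element is the only safe check.
        try {
            cached = delegate.next();
            hasCached = true;
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasCached = false;
        return cached;
    }
}
```

Repeated hasNext() calls are idempotent (the cached element is fetched once), which restores the standard Iterator contract without touching the delegate's logic.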
[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824136#comment-17824136 ] Koji Noguchi commented on PIG-5447: --- > However, inside {{{}next(){}}}, it sometimes recursively traverses the > delegated iterator by calling {{next()}} inside > This only happens when key is oversampled as described in PIG-4377. Maybe that's why we were not seeing the failure elsewhere. > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824097#comment-17824097 ] Koji Noguchi commented on PIG-5447: --- I don't see how this ever worked. The iterator under {{SkewedJoinConverter.ToValueFunction.Tuple2TransformIterable}} is NOT following the API contract. {{hasNext()}} simply returns the result of checking the delegated iterator. {quote}{{delegate.hasNext();}} {quote} However, inside {{{}next(){}}}, it sometimes recursively traverses the delegated iterator by calling {{next()}} internally. So even when {{hasNext()}} returns true, there are times when {{next()}} doesn't have an element to return, ending up with a {{{}NoSuchElementException{}}}. > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824093#comment-17824093 ] Koji Noguchi commented on PIG-5447: --- Full stack trace. {noformat} org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C. Backend error : Job aborted. at org.apache.pig.PigServer.openIterator(PigServer.java:1014) at org.apache.pig.test.TestSkewedJoin.testSkewedJoinOuter(TestSkewedJoin.java:386) Caused by: org.apache.spark.SparkException: Job aborted. at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:385) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1000) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:385) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:991) at 
org.apache.pig.backend.hadoop.executionengine.spark.converter.StoreConverter.convert(StoreConverter.java:104) at org.apache.pig.backend.hadoop.executionengine.spark.converter.StoreConverter.convert(StoreConverter.java:57) at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:292) at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:182) at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) at org.apache.pig.PigServer.storeEx(PigServer.java:1123) at org.apache.pig.PigServer.store(PigServer.java:1086) at org.apache.pig.PigServer.openIterator(PigServer.java:999) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 94.0 failed 4 times, most recent failure: Lost task 1.3 in stage 94.0 (TID 436, gsrd238n19.red.ygrid.yahoo.com, executor 2): org.apache.spark.SparkException: Task failed while writing rows at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:157) at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83) at 
org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.NoSuchElementException: next on empty iterator at scala.collection.Iterator$$anon$2.next(Iterator.scala:39) at scala.collection.Iterator$$anon$2.next(Iterator.scala:37) at scala.collection.Iterator$$anon$12.next(Iterator.scala:445) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.convert.Wrappers$IteratorWrapper.next
[jira] [Created] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
Koji Noguchi created PIG-5447: - Summary: Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException Key: PIG-5447 URL: https://issues.apache.org/jira/browse/PIG-5447 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer and full-outer joins. "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815770#comment-17815770 ] Koji Noguchi commented on PIG-5416: --- The issue seems to be on the Spark side. For now, I added a silly polling loop after "waitForJobToEnd" to double-check that the job is finished. > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > 
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5416: -- Attachment: pig-5416-v01.patch > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5446: -- Attachment: pig-5446-v01.patch > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5446-v01.patch > > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5446: -- It seems like "reporter.progress()" is a no-op in Tez Pig. The test started failing after upgrading the dependent Tez version. PIG-4700 only enabled progress reporting for Tez 0.8.5 and later. Attaching a patch that simply calls "context.notifyProgress" for every "reporter.progress()" call. > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. 
Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
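The "reporter.progress()" fix described in the update above can be sketched as a thin delegation layer. The types below are hypothetical stand-ins for PigStatusReporter and Tez's ProcessorContext, not the actual Pig or Tez classes:

{code:java}
// Hypothetical stand-in for the Tez ProcessorContext.notifyProgress() call.
interface ProgressNotifier {
    void notifyProgress();
}

// Counting implementation so the delegation is observable.
class CountingNotifier implements ProgressNotifier {
    int notified = 0;

    @Override
    public void notifyProgress() {
        notified++;
    }
}

// Sketch of the patched reporter: each progress() call is forwarded to the
// backend context so Tez sees a heartbeat, instead of being a no-op.
class StatusReporterSketch {
    private final ProgressNotifier context;

    StatusReporterSketch(ProgressNotifier context) {
        this.context = context;
    }

    void progress() {
        context.notifyProgress();
    }
}
{code}

With this wiring, a UDF that calls progress() during a long exec() would keep the Tez task attempt alive instead of tripping the task timeout.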
[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814970#comment-17814970 ] Koji Noguchi commented on PIG-5446: --- From
{code:title=testProgressReportingWithStatusMessage}
cluster.setProperty(MRConfiguration.TASK_TIMEOUT, "1");
...
pig.registerQuery("A = load 'a.txt' as (f1:chararray);");
pig.registerQuery("B = foreach A generate org.apache.pig.test.utils.ReportingUDF();");
{code}
{code:title=ReportingUDF()}
public class ReportingUDF extends EvalFunc<Integer> {
    @Override
    public Integer exec(Tuple input) throws IOException {
        try {
            Thread.sleep(7500);
            PigStatusReporter reporter = PigStatusReporter.getInstance();
            reporter.progress();
            Thread.sleep(7500);
        } catch (InterruptedException e) {
        }
        return 100;
    }
}
{code}
So even though Pig calls "reporter.progress()" after 7.5 seconds, the Tez task fails with "Attempt failed because it appears to make no progress for 1ms". > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. 
Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
Koji Noguchi created PIG-5446: - Summary: Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing Key: PIG-5446 URL: https://issues.apache.org/jira/browse/PIG-5446 Project: Pig Issue Type: Bug Components: tez Reporter: Koji Noguchi Assignee: Koji Noguchi {noformat} Unable to open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 at org.apache.pig.PigServer.openIterator(PigServer.java:1014) at org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) Caused by: org.apache.tez.dag.api.TezException: Vertex failed, vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to make no progress for 1ms]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1707216362777_0001_1_00 [scope-4] killed/failed due to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated
[ https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5445: -- Attachment: pig-5445-v01.patch I don't fully understand how cogroup is implemented, but checking MergeJoinIndexer.java:
{code:java}
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
        String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{

    loader = ...
    precedingPhyPlan = (PhysicalPlan)ObjectSerializer.deserialize(serializedPhyPlan);
    if(precedingPhyPlan != null){
        if(precedingPhyPlan.getLeaves().size() != 1 || precedingPhyPlan.getRoots().size() != 1){
            int errCode = 2168;
            String errMsg = "Expected physical plan with exactly one root and one leaf.";
            throw new ExecException(errMsg,errCode,PigException.BUG);
        }
        this.rightPipelineLeaf = precedingPhyPlan.getLeaves().get(0);
        this.rightPipelineRoot = precedingPhyPlan.getRoots().get(0);
        this.rightPipelineRoot.setInputs(null);  // <-- always overwrites "inputs" with null
    }
}
{code}
MergeJoinIndexer always overwrites "inputs" with null. This means "inputs" can be skipped at serialization time. Attaching the patch (pig-5445-v01.patch) which does that. The size of TEZC-MergeCogroup-1.gld was reduced by 5 with this patch since it no longer serializes PigContext and POLoad for MergeJoinIndexer. > TestTezCompiler.testMergeCogroup fails whenever config is updated > - > > Key: PIG-5445 > URL: https://issues.apache.org/jira/browse/PIG-5445 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.19.0 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5445-v01.patch > > > TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and > config that comes with it). 
> {noformat} > testMergeCogroupFailure > expected: > <|---a: > Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a > > pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...> > > but was: > <|---a: > Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a > > pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...> > at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472) > at > org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) > {noformat} > (edited the diff above a bit to make it easier to identify where the > difference was) > Basically 3rd argument to MergeJoinIndexer differed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
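The idea behind pig-5445-v01.patch can be illustrated in isolation: when the deserializing side always overwrites a field with null (as MergeJoinIndexer does via setInputs(null)), that field need not be serialized at all. The sketch below uses a transient field and made-up class names to show the effect; the actual patch may achieve the skip differently:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Made-up operator class: "input" mimics the POForeach "inputs" field that
// the patch stops serializing.
class OperatorSketch implements Serializable {
    String name;
    transient OperatorSketch input; // skipped at serialization time

    OperatorSketch(String name, OperatorSketch input) {
        this.name = name;
        this.input = input;
    }
}

final class PlanSerDe {
    private PlanSerDe() {}

    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
{code}

After a round trip the operator comes back with input == null, the same state MergeJoinIndexer would have forced anyway, and anything reachable only through that field stays out of the serialized bytes.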
[jira] [Commented] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated
[ https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814456#comment-17814456 ] Koji Noguchi commented on PIG-5445: --- {quote}Basically 3rd argument to MergeJoinIndexer differed. {quote} This is the serializedPhyPlan passed to the MergeJoinIndexer constructor.
{code:java}
/** @param funcSpec : Loader specification.
 *  @param innerPlan : This is serialized version of LR plan. We
 *  want to keep only keys in our index file and not the whole tuple. So, we need LR and thus its plan
 *  to get keys out of the sampled tuple.
 *  @param serializedPhyPlan Serialized physical plan on right side.
 *  @throws ExecException
 */
@SuppressWarnings("unchecked")
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
        String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{
{code}
When deserializing both strings and printing out the physical plans, they both showed the exact same physical plan:
{noformat}
#---
# Physical Plan:
#---
a: New For Each(false,false)[bag] - scope-30
|   |
|   Cast[int] - scope-27
|   |
|   |---Project[bytearray][0] - scope-26
|   |
|   Cast[int] - scope-29
|   |
|   |---Project[bytearray][1] - scope-28
{noformat}
Comparing the serialized strings and checking the memory dump, it turns out the difference came from the POForeach for "a: New For Each": it contains an "inputs" field pointing to a POLoad, which holds a "PigContext pc". This POLoad and PigContext were serialized as part of the MergeJoinIndexer, which caused the golden-file outputs to differ whenever anything changed in the config (which is stored in the PigContext). 
> TestTezCompiler.testMergeCogroup fails whenever config is updated > - > > Key: PIG-5445 > URL: https://issues.apache.org/jira/browse/PIG-5445 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.19.0 >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > > TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and > config that comes with it). > {noformat} > testMergeCogroupFailure > expected: > <|---a: > Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a > > pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...> > > but was: > <|---a: > Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a > > pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...> > at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472) > at > org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) > {noformat} > (edited the diff above a bit to make it easier to identify where the > difference was) > Basically 3rd argument to MergeJoinIndexer differed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
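The golden-file instability described in the comment above follows from how Java serialization walks the entire object graph: any config value stored in a reachable context object ends up in the serialized plan. A minimal illustration with made-up class names (not the real POLoad/PigContext):

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for PigContext: carries a config value.
class ContextSketch implements Serializable {
    String configValue;

    ContextSketch(String configValue) {
        this.configValue = configValue;
    }
}

// Stand-in for POLoad: reachable from the plan, so it drags the context along.
class LoadSketch implements Serializable {
    ContextSketch pc;

    LoadSketch(ContextSketch pc) {
        this.pc = pc;
    }
}

final class GraphSerDe {
    private GraphSerDe() {}

    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
{code}

Two structurally identical plans serialize to different bytes the moment the embedded config differs, which is why the base64 blob in the golden file changed on every config update until the reference was cut.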
[jira] [Created] (PIG-5445) TestTezCompiler.testMergeCogroup fails whenever config is updated
Koji Noguchi created PIG-5445: - Summary: TestTezCompiler.testMergeCogroup fails whenever config is updated Key: PIG-5445 URL: https://issues.apache.org/jira/browse/PIG-5445 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.19.0 Reporter: Koji Noguchi Assignee: Koji Noguchi TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and config that comes with it). {noformat} testMergeCogroupFailure expected: <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...> but was: <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.a pache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472) at org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) {noformat} (edited the diff above a bit to make it easier to identify where the difference was) Basically 3rd argument to MergeJoinIndexer differed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: pig-5444-v03.patch pig-5444-v02.patch fixed the failing test, but it would not pick up the Split in the right order if there were additional splittees. Attaching pig-5444-v03.patch, which traverses any parent Splits. > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v02.patch, pig-5444-v03.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: pig-5444-v02.patch > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v02.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: (was: pig-5444-v02.patch) > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: (was: pig-5444-v01.patch) > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v02.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: pig-5444-v02.patch > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v02.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attachment: pig-5444-v01.patch > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v01.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5444: -- Attaching a patch, which does two things. > [~rohini] , should I add a predecessor check to prevent this merge ? > (1) Added a check to make sure there is no overlap of predecessors among all the "tentativeSuccessors". (2) Introduced a new walker that considers the dependency order of nodes' successors. For the example from this jira, {noformat} B5(input) / \ -6 7 / / / A1(input) / \ / \ / 2 3-- \ / 4(shuffle->out) {noformat} 2 and 3 are Joins. The previous ReverseDependencyOrderWalker does not consider the dependency order of nodes' successors, so the visit order of A1 and B5 is nondeterministic. With the new ReverseSuccessorsDependencyOrderWalker, A1 will always get visited before B5. > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5444-v01.patch > > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. 
But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! 
> at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
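The walker idea described in the update above can be sketched in miniature. The block below is a hypothetical, self-contained simplification, not the code from the attached patch; the class name, the DAG encoding, and the edge reading of the ascii diagram (B5 -> {6, 7}, 6 -> 2, 7 -> 3, A1 -> {2, 3}, 2 -> 4, 3 -> 4) are all assumptions for illustration. A node is visited only after all of its successors, and among ready nodes the one whose successors were visited earliest goes first, which makes the visit order of independent inputs like A1 and B5 deterministic.

```java
import java.util.*;

// Hypothetical sketch of a successor-aware reverse walk; a simplification,
// not the actual ReverseSuccessorsDependencyOrderWalker from the patch.
class ReverseSuccessorsWalkSketch {

    // succ maps each node to its list of successors
    static List<String> reverseWalk(Map<String, List<String>> succ) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < succ.size()) {
            // ready = unvisited nodes whose successors have all been visited
            List<String> ready = new ArrayList<>();
            for (String n : succ.keySet())
                if (!done.contains(n) && done.containsAll(succ.get(n)))
                    ready.add(n);
            // prefer the node whose earliest-visited successor comes first
            ready.sort(Comparator.comparingInt(
                    (String n) -> earliestVisit(n, succ, order)));
            String pick = ready.get(0);
            order.add(pick);
            done.add(pick);
        }
        return order;
    }

    private static int earliestVisit(String n, Map<String, List<String>> succ,
                                     List<String> order) {
        int min = Integer.MAX_VALUE;
        for (String s : succ.get(n))
            min = Math.min(min, order.indexOf(s));
        return min;  // MAX_VALUE for sinks, which become ready first anyway
    }

    // One plausible reading of the DAG in the comment above.
    static Map<String, List<String>> exampleDag() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("4", List.of());
        succ.put("2", List.of("4"));
        succ.put("3", List.of("4"));
        succ.put("6", List.of("2"));
        succ.put("7", List.of("3"));
        succ.put("A1", List.of("2", "3"));
        succ.put("B5", List.of("6", "7"));
        return succ;
    }
}
```

On this DAG, A1's successors (the joins 2 and 3) are visited before B5's successors (6 and 7), so A1 always precedes B5 in the walk, matching the determinism the new walker is meant to provide.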
[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786019#comment-17786019 ] Koji Noguchi commented on PIG-5444: --- This is how it looks when testFRJoinOut8 is run by itself and MultiQueryOptimizerTez happens to work on A before B. Note that the POValueOutputTez on B is there to prevent the overlapping edges. {code:java} Tez vertex scope-48 # Plan on vertex B: Split - scope-61 | | | Local Rearrange[tuple]{int}(false) - scope-27 -> scope-44 | | | | | Project[int][0] - scope-23 | | | POValueOutputTez - scope-49 -> [scope-54] | |---B: New For Each(false,false)[bag] - scope-18 | | | Cast[int] - scope-13 | | | |---Project[bytearray][0] - scope-12 | | | Cast[int] - scope-16 | | | |---Project[bytearray][1] - scope-15 | |---B: Load(hdfs://localhost:38814/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage) - scope-11 Tez vertex scope-54 # Plan on vertex Local Rearrange[tuple]{int}(false) - scope-39 -> scope-44 | | | Project[int][1] - scope-35 | |---POValueInputTez - scope-55 <- scope-48 Tez vertex scope-44 # Plan on vertex A: Split - scope-60 | | | E: Store(hdfs://localhost:38814/tmp/temp-1966813510/tmp-652837441:org.apache.pig.impl.io.InterStorage) - scope-62 -> scope-43 | | | |---D: FRJoin[tuple] - scope-36 <- scope-54 | | | | | Project[int][1] - scope-34 | | | | | Project[int][1] - scope-35 | | | E: Store(hdfs://localhost:38814/tmp/temp-1966813510/tmp-652837441:org.apache.pig.impl.io.InterStorage) - scope-63 -> scope-43 | | | |---C: FRJoin[tuple] - scope-24 <- scope-48 | | | | | Project[int][0] - scope-22 | | | | | Project[int][0] - scope-23 | |---A: New For Each(false,false)[bag] - scope-7 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | |---A: Load(hdfs://localhost:38814/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage) - scope-0 {code} > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 
failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.a
[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785674#comment-17785674 ] Koji Noguchi commented on PIG-5444: --- [~rohini] , should I add a predecessor check to prevent this merge ? > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > With Tez, when testing individual tests (TestFRJoin.testFRJoinOut7 and > testFRJoinOut8) separately, they pass the tests. But when entire TestFRJoin > is run, these two tests on Tez are failing with > {noformat} > Unable to open iterator for alias E > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias E > at org.apache.pig.PigServer.openIterator(PigServer.java:1024) > at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) > Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at org.apache.pig.PigServer.openIterator(PigServer.java:999) > Caused by: > org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR > 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> > [scope-628 : > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ > BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> > PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> > NullEdgeManager }) already defined! > at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
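The predecessor check being asked about above could take a form like the following. This is a hypothetical sketch (the class and method names are made up, and this is not code from any attached patch): before merging a set of candidate vertices, verify that no two of them share a predecessor, since after the merge that shared predecessor would contribute two identical edges to the merged vertex, which is exactly what Tez's DAG.addEdge rejects.

```java
import java.util.*;

// Hypothetical sketch: reject a multi-query merge when two merge candidates
// share a predecessor vertex. After such a merge, the shared predecessor
// would need two edges to the merged vertex, and Tez's DAG.addEdge throws
// "already defined!" on the second one.
class PredecessorCheckSketch {

    // preds maps each candidate vertex to its set of predecessor vertices
    static boolean canMerge(Map<String, Set<String>> preds,
                            List<String> candidates) {
        Set<String> seen = new HashSet<>();
        for (String c : candidates)
            for (String p : preds.get(c))
                if (!seen.add(p))
                    return false;  // shared predecessor: merging would duplicate an edge
        return true;
    }
}
```

In the failing scenario, both scope-134 and scope-140 have scope-132 and scope-136 as predecessors, so a check like this would veto the merge.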
[jira] [Comment Edited] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785673#comment-17785673 ] Koji Noguchi edited comment on PIG-5444 at 11/13/23 9:44 PM: - Issue seems to be happening inside MultiQueryOptimizerTez.java. Before the change {noformat} Tez vertex scope-132 # Plan on vertex POValueOutputTez - scope-133-> [scope-134, scope-140] | |---A: New For Each(false,false)[bag] - scope-95 | | | Cast[int] - scope-90 | | | |---Project[bytearray][0] - scope-89 | | | Cast[int] - scope-93 | | | |---Project[bytearray][1] - scope-92 | |---A: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage) - scope-88 Tez vertex scope-136 # Plan on vertex B: Split - scope-148 | | | Local Rearrange[tuple]{int}(false) - scope-115 -> scope-134 | | | | | Project[int][0] - scope-111 | | | Local Rearrange[tuple]{int}(false) - scope-127 -> scope-140 | | | | | Project[int][1] - scope-123 | |---B: New For Each(false,false)[bag] - scope-106 | | | Cast[int] - scope-101 | | | |---Project[bytearray][0] - scope-100 | | | Cast[int] - scope-104 | | | |---Project[bytearray][1] - scope-103 | |---B: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage) - scope-99 Tez vertex scope-134 # Plan on vertex POValueOutputTez - scope-146-> [scope-144] | |---C: FRJoin[tuple] - scope-112<- scope-136 | | | Project[int][0] - scope-110 | | | Project[int][0] - scope-111 | |---POValueInputTez - scope-135 <- scope-132 Tez vertex scope-140 # Plan on vertex POValueOutputTez - scope-147-> [scope-144] | |---D: FRJoin[tuple] - scope-124<- scope-136 | | | Project[int][1] - scope-122 | | | Project[int][1] - scope-123 | |---POValueInputTez - scope-141 <- scope-132 Tez vertex scope-144 # Plan on vertex E: Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage) - scope-131 | |---POShuffledValueInputTez - scope-145 <- [scope-134, scope-140] {noformat} 
After MultiQueryOptimizerTez::visitTezOp {code} // If all other conditions were satisfied, but it had a successor union // with unsupported storefunc keep it in the tentative list. {code} and decides to merge scope-134 and scope-140, {noformat} Tez vertex scope-136 # Plan on vertex B: Split - scope-148 | | | Local Rearrange[tuple]{int}(false) - scope-115 -> scope-132 | | | | | Project[int][0] - scope-111 | | | Local Rearrange[tuple]{int}(false) - scope-127 -> scope-132 | | | | | Project[int][1] - scope-123 | |---B: New For Each(false,false)[bag] - scope-106 | | | Cast[int] - scope-101 | | | |---Project[bytearray][0] - scope-100 | | | Cast[int] - scope-104 | | | |---Project[bytearray][1] - scope-103 | |---B: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage) - scope-99 Tez vertex scope-132 # Plan on vertex POValueOutputTez - scope-133-> [] | |---A: New For Each(false,false)[bag] - scope-95 | | | Cast[int] - scope-90 | | | |---Project[bytearray][0] - scope-89 | | | Cast[int] - scope-93 | | | |---Project[bytearray][1] - scope-92 | |---A: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage) - scope-88 Tez vertex scope-144 # Plan on vertex E: Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage) - scope-131 | |---POShuffledValueInputTez - scope-145 <- [scope-132] {noformat} This later fails with {panel} Caused by: java.lang.IllegalArgumentException: Edge [scope-136 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-132 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] (\{ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined! {panel} was (Author: knoguchi): Issue seems to be happening inside MultiQueryOptimizerTez.java. 
[jira] [Commented] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
[ https://issues.apache.org/jira/browse/PIG-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785673#comment-17785673 ] Koji Noguchi commented on PIG-5444: --- Issue seems to be happening inside MultiQueryOptimizerTez.java. Before the change {noformat} Tez vertex scope-132 # Plan on vertex POValueOutputTez - scope-133-> [scope-134, scope-140] | |---A: New For Each(false,false)[bag] - scope-95 | | | Cast[int] - scope-90 | | | |---Project[bytearray][0] - scope-89 | | | Cast[int] - scope-93 | | | |---Project[bytearray][1] - scope-92 | |---A: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage) - scope-88 Tez vertex scope-136 # Plan on vertex B: Split - scope-148 | | | Local Rearrange[tuple]{int}(false) - scope-115 -> scope-134 | | | | | Project[int][0] - scope-111 | | | Local Rearrange[tuple]{int}(false) - scope-127 -> scope-140 | | | | | Project[int][1] - scope-123 | |---B: New For Each(false,false)[bag] - scope-106 | | | Cast[int] - scope-101 | | | |---Project[bytearray][0] - scope-100 | | | Cast[int] - scope-104 | | | |---Project[bytearray][1] - scope-103 | |---B: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage) - scope-99 Tez vertex scope-134 # Plan on vertex POValueOutputTez - scope-146-> [scope-144] | |---C: FRJoin[tuple] - scope-112<- scope-136 | | | Project[int][0] - scope-110 | | | Project[int][0] - scope-111 | |---POValueInputTez - scope-135 <- scope-132 Tez vertex scope-140 # Plan on vertex POValueOutputTez - scope-147-> [scope-144] | |---D: FRJoin[tuple] - scope-124<- scope-136 | | | Project[int][1] - scope-122 | | | Project[int][1] - scope-123 | |---POValueInputTez - scope-141 <- scope-132 Tez vertex scope-144 # Plan on vertex E: Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage) - scope-131 | |---POShuffledValueInputTez - scope-145 <- [scope-134, scope-140] {noformat} After 
MultiQueryOptimizerTez::visitTezOp {code} // If all other conditions were satisfied, but it had a successor union // with unsupported storefunc keep it in the tentative list. {code} and decides to merge scope-134 and scope-140, {noformat} Tez vertex scope-136 # Plan on vertex B: Split - scope-148 | | | Local Rearrange[tuple]{int}(false) - scope-115 -> scope-132 | | | | | Project[int][0] - scope-111 | | | Local Rearrange[tuple]{int}(false) - scope-127 -> scope-132 | | | | | Project[int][1] - scope-123 | |---B: New For Each(false,false)[bag] - scope-106 | | | Cast[int] - scope-101 | | | |---Project[bytearray][0] - scope-100 | | | Cast[int] - scope-104 | | | |---Project[bytearray][1] - scope-103 | |---B: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput2.txt:org.apache.pig.builtin.PigStorage) - scope-99 Tez vertex scope-132 # Plan on vertex POValueOutputTez - scope-133-> [] | |---A: New For Each(false,false)[bag] - scope-95 | | | Cast[int] - scope-90 | | | |---Project[bytearray][0] - scope-89 | | | Cast[int] - scope-93 | | | |---Project[bytearray][1] - scope-92 | |---A: Load(hdfs://localhost:39746/user/gtrain/testFrJoinInput.txt:org.apache.pig.builtin.PigStorage) - scope-88 Tez vertex scope-144 # Plan on vertex E: Store(hdfs://localhost:39746/tmp/temp906575730/tmp1776475591:org.apache.pig.impl.io.InterStorage) - scope-131 | |---POShuffledValueInputTez - scope-145 <- [scope-132] {noformat} This later fails with {noformat} Caused by: java.lang.IllegalArgumentException: Edge [scope-136 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-132 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined! 
{noformat} > TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already > defined error on Tez > --- > > Key: PIG-5444 > URL: https://issues.apache.org/jira/browse/PIG-5444 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > With T
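The failure mode traced above can be reproduced in miniature. The sketch below is an illustration only, mimicking the duplicate-edge check of org.apache.tez.dag.api.DAG.addEdge rather than reproducing its real implementation: once scope-134 and scope-140 collapse into one vertex, both Local Rearranges in scope-136 target the same destination, and the second edge definition is rejected.

```java
import java.util.*;

// Tiny stand-in mimicking Tez's DAG.addEdge duplicate check; this is an
// illustration, not the real org.apache.tez.dag.api.DAG.
class MiniDagSketch {
    private final Set<String> edges = new HashSet<>();

    void addEdge(String from, String to) {
        // reject a second definition of the same edge, like Tez does
        if (!edges.add(from + " -> " + to))
            throw new IllegalArgumentException(
                    "Edge [" + from + "] -> [" + to + "] already defined!");
    }
}
```

After the merge, the first addEdge("scope-136", "scope-132") succeeds and the second throws, matching the IllegalArgumentException in the stack trace.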
[jira] [Created] (PIG-5444) TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez
Koji Noguchi created PIG-5444: - Summary: TestFRJoin.testFRJoinOut7 and testFRJoinOut8 failing with Edge already defined error on Tez Key: PIG-5444 URL: https://issues.apache.org/jira/browse/PIG-5444 Project: Pig Issue Type: Bug Components: tez Reporter: Koji Noguchi Assignee: Koji Noguchi With Tez, TestFRJoin.testFRJoinOut7 and testFRJoinOut8 pass when run individually, but when the entire TestFRJoin suite is run, these two tests fail with {noformat} Unable to open iterator for alias E org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E at org.apache.pig.PigServer.openIterator(PigServer.java:1024) at org.apache.pig.test.TestFRJoin.testFRJoinOut7(TestFRJoin.java:409) Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias E at org.apache.pig.PigServer.storeEx(PigServer.java:1127) at org.apache.pig.PigServer.store(PigServer.java:1086) at org.apache.pig.PigServer.openIterator(PigServer.java:999) Caused by: org.apache.pig.backend.hadoop.executionengine.JobCreationException: ERROR 2017: Internal error creating job configuration. 
at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:153) at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.compile(TezJobCompiler.java:81) at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:200) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) at org.apache.pig.PigServer.storeEx(PigServer.java:1123) Caused by: java.lang.IllegalArgumentException: Edge [scope-632 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] -> [scope-628 : org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined! at org.apache.tez.dag.api.DAG.addEdge(DAG.java:296) at org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder.visitTezOp(TezDagBuilder.java:410) at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:265) at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:56) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.buildDAG(TezJobCompiler.java:69) at org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler.getJob(TezJobCompiler.java:120) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5437) Add lib and idea folder to .gitignore
[ https://issues.apache.org/jira/browse/PIG-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5437: Fix Version/s: 0.18.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) +1. Committed to trunk and branch-0.18. Thanks for the contribution [~maswin] > Add lib and idea folder to .gitignore > - > > Key: PIG-5437 > URL: https://issues.apache.org/jira/browse/PIG-5437 > Project: Pig > Issue Type: Improvement >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Minor > Fix For: 0.18.0 > > Attachments: PIG-5437-0.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
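The exact contents of PIG-5437-0.patch are not shown in this thread; based on the summary ("lib and idea folder"), the added ignore entries presumably take a form like the following:

```
# build-time dependency downloads
lib/
# IntelliJ IDEA project files
.idea/
```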
[jira] [Updated] (PIG-5420) Update accumulo dependency to 1.10.1
[ https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5420: Fix Version/s: 0.18.1 > Update accumulo dependency to 1.10.1 > > > Key: PIG-5420 > URL: https://issues.apache.org/jira/browse/PIG-5420 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.18.1 > > Attachments: pig-5420-v01.patch > > > Following owasp/cve report. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5419) Upgrade Joda time version
[ https://issues.apache.org/jira/browse/PIG-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5419: Fix Version/s: 0.18.1 (was: 0.18.0) Can you update to 2.12.5? > Upgrade Joda time version > - > > Key: PIG-5419 > URL: https://issues.apache.org/jira/browse/PIG-5419 > Project: Pig > Issue Type: Improvement >Reporter: Venkatasubrahmanian Narayanan >Assignee: Venkatasubrahmanian Narayanan >Priority: Minor > Fix For: 0.18.1 > > Attachments: PIG-5419.patch > > > Pig depends on an older version of Joda time, which can result in conflicts > with other versions in some workflows. Upgrading it to the latest > version (2.10.13) will resolve Pig's side of such issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
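Version bumps like this one are typically a one-line change to the pinned dependency version in Pig's Ivy configuration. Shown here as an assumed illustration only; the actual property name and file in the patch may differ:

```
# ivy/libraries.properties (assumed location)
# before
joda-time=2.10.13
# after, per the review comment above
joda-time=2.12.5
```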
[jira] [Resolved] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy resolved PIG-5440. - Fix Version/s: 0.18.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk and branch-0.18. Thanks [~knoguchi] > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.18.0 > > Attachments: pig-5440-v01.patch, pig-5440-v02.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2
[ https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5438: Fix Version/s: 0.19.0 > Update SparkCounter.Accumulator to AccumulatorV2 > > > Key: PIG-5438 > URL: https://issues.apache.org/jira/browse/PIG-5438 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5438-v01.patch > > > Original Accumulator is deprecated in Spark2 and gone in Spark3. > AccumulatorV2 is usable on both Spark2 and Spark3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
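A hedged sketch of the migration described above. The block mirrors the AccumulatorV2 contract (isZero/copy/reset/add/merge/value) with a local stand-in abstract class so it compiles without Spark on the classpath; in the actual patch, org.apache.spark.util.AccumulatorV2 would be extended instead, and the class name LongCounterSketch is made up for illustration.

```java
// Minimal stand-in mirroring the org.apache.spark.util.AccumulatorV2
// contract, so this sketch is self-contained; the real Spark class would
// be extended in SparkCounter.
abstract class AccumulatorV2<IN, OUT> {
    public abstract boolean isZero();
    public abstract AccumulatorV2<IN, OUT> copy();
    public abstract void reset();
    public abstract void add(IN v);
    public abstract void merge(AccumulatorV2<IN, OUT> other);
    public abstract OUT value();
}

// Hypothetical long-counter accumulator of the kind SparkCounter could use
// on both Spark 2 and Spark 3.
class LongCounterSketch extends AccumulatorV2<Long, Long> {
    private long sum = 0L;

    public boolean isZero() { return sum == 0L; }

    public AccumulatorV2<Long, Long> copy() {
        LongCounterSketch c = new LongCounterSketch();
        c.sum = this.sum;
        return c;
    }

    public void reset() { sum = 0L; }

    public void add(Long v) { sum += v; }

    // called by the driver to combine per-task accumulators
    public void merge(AccumulatorV2<Long, Long> other) { sum += other.value(); }

    public Long value() { return sum; }
}
```

The key behavioral difference from the deprecated Accumulator API is that merging of per-task values is expressed through merge() on the accumulator itself rather than an AccumulatorParam.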
[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5439: Fix Version/s: 0.19.0 > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5439-v01.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5414) Build failure on Linux ARM64 due to old Apache Avro
[ https://issues.apache.org/jira/browse/PIG-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5414: Fix Version/s: 0.18.1 > Build failure on Linux ARM64 due to old Apache Avro > --- > > Key: PIG-5414 > URL: https://issues.apache.org/jira/browse/PIG-5414 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.18.0 >Reporter: Martin Tzvetanov Grigorov >Assignee: Martin Tzvetanov Grigorov >Priority: Major > Fix For: 0.18.1 > > Attachments: 35.patch, > TEST-org.apache.pig.builtin.TestAvroStorage.txt, > TEST-org.apache.pig.builtin.TestOrcStorage.txt, > TEST-org.apache.pig.builtin.TestOrcStoragePushdown.txt > > > Trying to build Apache Pig on Ubuntu 20.04.3 ARM64 fails because of old > version of Snappy and Avro libraries: > > {code:java} > Testsuite: org.apache.pig.builtin.TestAvroStorage > Tests run: 0, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.1 sec > - Standard Output --- > 2021-10-12 14:43:35,483 [main] INFO > org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) > of size 1431830528 to monitor. collectionUsageThreshold = 1064828928, > usageThreshold = 1064828928 > 2021-10-12 14:43:35,489 [main] INFO org.apache.pig.ExecTypeProvider - > Trying ExecType : LOCAL > 2021-10-12 14:43:35,489 [main] INFO org.apache.pig.ExecTypeProvider - > Picked LOCAL as the ExecType > 2021-10-12 14:43:35,515 [main] WARN org.apache.hadoop.conf.Configuration - > DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml > is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml > to override properties of core-default.xml, mapred-default.xml and > hdfs-default.xml respectively > 2021-10-12 14:43:35,755 [main] INFO > org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is > deprecated. 
Instead, use mapreduce.jobtracker.address > 2021-10-12 14:43:35,899 [main] WARN org.apache.hadoop.util.NativeCodeLoader > - Unable to load native-hadoop library for your platform... using > builtin-java classes where applicable > 2021-10-12 14:43:35,916 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to hadoop file system at: file:/// > 2021-10-12 14:43:36,116 [main] INFO > org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is > deprecated. Instead, use dfs.bytes-per-checksum > 2021-10-12 14:43:36,137 [main] INFO org.apache.pig.PigServer - Pig Script > ID for the session: PIG-default-01426621-bc19-499f-981e-b13959fe0d84 > 2021-10-12 14:43:36,137 [main] WARN org.apache.pig.PigServer - ATS is > disabled since yarn.timeline-service.enabled set to false > 2021-10-12 14:43:36,150 [main] INFO org.apache.pig.builtin.TestAvroStorage > - creating > test/org/apache/pig/builtin/avro/data/avro/uncompressed/arraysAsOutputByPig.avro > 2021-10-12 14:43:36,502 [main] INFO org.apache.pig.builtin.TestAvroStorage > - Could not generate avro file: > test/org/apache/pig/builtin/avro/data/avro/uncompressed/arraysAsOutputByPig.avro > java.net.ConnectException: Call From martin/127.0.0.1 to localhost:40073 > failed on connection exception: java.net.ConnectException: Connection > refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732) > at org.apache.hadoop.ipc.Client.call(Client.java:1479) > at 
org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ... > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5418) Utils.parseSchema(String), parseConstant(String) leak memory
[ https://issues.apache.org/jira/browse/PIG-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5418: Fix Version/s: 0.18.1 > Utils.parseSchema(String), parseConstant(String) leak memory > > > Key: PIG-5418 > URL: https://issues.apache.org/jira/browse/PIG-5418 > Project: Pig > Issue Type: Improvement >Reporter: Jacob Tolar >Assignee: Jacob Tolar >Priority: Minor > Fix For: 0.18.1 > > Attachments: PIG-5418.patch > > > A minor issue: I noticed that Utils.parseSchema() and parseConstant() leak > memory. I noticed this while running a unit test for a UDF several thousand > times and checking the heap. > Links are to latest commit as of creating this ticket: > https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/util/Utils.java#L244-L256 > {{new PigContext()}} [creates a MapReduce > ExecutionEngine|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/impl/PigContext.java#L269]. > > This creates a > [MapReduceLauncher|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRExecutionEngine.java#L34]. > > This registers a [Hadoop shutdown > hook|https://github.com/apache/pig/blob/59ec4a326079c9f937a052194405415b1e3a2b06/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L104-L105] > which doesn't go away until the JVM dies. See: > https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/api/org/apache/hadoop/util/ShutdownHookManager.html > . > I will attach a proposed patch. From my reading of the code and running > tests, the existing schema parse APIs do not actually use anything from this > dummy PigContext, and with a minor tweak it can be passed in as NULL, > avoiding the creation of these extra resources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
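The leak mechanism described in PIG-5418 can be illustrated with a JDK-only sketch; the class and method names below are illustrative stand-ins, not Pig's actual code. Each "parse" registers a JVM shutdown hook that is never removed, so the hook threads stay reachable until the JVM exits:

```java
// Hypothetical sketch of the leak pattern in PIG-5418: every call registers
// a shutdown hook, and shutdown hooks are only released when the JVM dies,
// so a test loop calling this thousands of times accumulates live objects.
public class ShutdownHookLeak {

    // Stand-in for "new PigContext()" -> MapReduceLauncher registering a
    // Hadoop shutdown hook (names and structure are illustrative only).
    static int registerMany(int calls) {
        for (int i = 0; i < calls; i++) {
            // The Thread object stays reachable until JVM shutdown.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> { }));
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("hooks registered: " + registerMany(100));
    }
}
```

In Pig's case the hook is registered via Hadoop's ShutdownHookManager when the dummy PigContext creates a launcher; the attached patch sidesteps this by not creating the PigContext at all.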
[jira] [Updated] (PIG-5443) Add testcase for skew join for tez grace shuffle vertex manager
[ https://issues.apache.org/jira/browse/PIG-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5443: Description: Need to add test case for fix in https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the existing skewed join unit or e2e test cases by increasing mappers (split size) or adding PARALLEL 2 for right side data. Also check if one-one edges are affected by this part of the code. (was: Need to add test case for fix in https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the existing skewed join unit or e2e test cases by increasing mappers (split size) or adding PARALLEL 2 for right side data. ) > Add testcase for skew join for tez grace shuffle vertex manager > --- > > Key: PIG-5443 > URL: https://issues.apache.org/jira/browse/PIG-5443 > Project: Pig > Issue Type: Task >Reporter: Rohini Palaniswamy >Priority: Minor > > Need to add test case for fix in > https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the > existing skewed join unit or e2e test cases by increasing mappers (split > size) or adding PARALLEL 2 for right side data. Also check if one-one edges > are affected by this part of the code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5443) Add testcase for skew join for tez grace shuffle vertex manager
Rohini Palaniswamy created PIG-5443: --- Summary: Add testcase for skew join for tez grace shuffle vertex manager Key: PIG-5443 URL: https://issues.apache.org/jira/browse/PIG-5443 Project: Pig Issue Type: Task Reporter: Rohini Palaniswamy Need to add test case for fix in https://issues.apache.org/jira/browse/PIG-5441. Can just modify one of the existing skewed join unit or e2e test cases by increasing mappers (split size) or adding PARALLEL 2 for right side data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf
[ https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy resolved PIG-5442. - Fix Version/s: 0.18.0 Hadoop Flags: Reviewed Resolution: Fixed +1. Committed to branch-0.18. and trunk. Thanks for the contribution [~maswin] > Add only credentials from setStoreLocation to the Job Conf > -- > > Key: PIG-5442 > URL: https://issues.apache.org/jira/browse/PIG-5442 > Project: Pig > Issue Type: Bug >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5442-1.patch > > > While testing HCatStorer with Iceberg realized Pig calls setStoreLocation on > all Stores with the same Job object - > [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081] > Setting populated by one store is affecting the other stores. In my case the > "mapred.output.committer.class" is set as HiveIcebergCommitter by PigStore > that is used by the Iceberg table and the other stores which inserts data to > a non-iceberg tables also use that setting and trying to use > HiveIcebergCommitter. > > On checking with [~rohini] , it is called to get the credentials from all > stores since addCredentials API was added later and not all stores have > implemented it and some still set configuration in setLocation method (i.e, > HCatStorer). > > Fixed it by passing a separate copy of Job object to each store's setLocation > method and adding only the credential object from the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
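The fix described in PIG-5442 can be sketched with self-contained, hypothetical types (FakeJob and Store are illustrative stand-ins, not Pig or Hadoop classes): each store's setStoreLocation runs against its own copy of the job, and only credentials are merged back into the shared job, so conf keys like mapred.output.committer.class set by one store never leak into another store's output.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the PIG-5442 fix: copy the job per store,
// merge back credentials only, never the configuration entries.
public class CredentialIsolationSketch {
    static class FakeJob {
        Map<String, String> conf = new HashMap<>();   // job configuration
        Set<String> credentials = new HashSet<>();    // security tokens
        FakeJob copy() {
            FakeJob j = new FakeJob();
            j.conf.putAll(conf);
            j.credentials.addAll(credentials);
            return j;
        }
    }

    interface Store { void setStoreLocation(FakeJob job); }

    static FakeJob run() {
        FakeJob sharedJob = new FakeJob();
        // An Iceberg-backed store that overrides the output committer.
        Store icebergStore = job -> {
            job.conf.put("mapred.output.committer.class", "HiveIcebergCommitter");
            job.credentials.add("iceberg-token");
        };
        // A plain store that only contributes its own credentials.
        Store plainStore = job -> job.credentials.add("hdfs-token");

        for (Store store : new Store[] { icebergStore, plainStore }) {
            FakeJob scratch = sharedJob.copy();        // separate copy per store
            store.setStoreLocation(scratch);
            // Merge back only the credentials, never the conf entries.
            sharedJob.credentials.addAll(scratch.credentials);
        }
        return sharedJob;
    }

    public static void main(String[] args) {
        FakeJob job = run();
        // The committer override never reaches the shared job conf.
        System.out.println(job.conf.containsKey("mapred.output.committer.class"));
        System.out.println(job.credentials.size());
    }
}
```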
[jira] [Updated] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf
[ https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5442: Attachment: PIG-5442-1.patch > Add only credentials from setStoreLocation to the Job Conf > -- > > Key: PIG-5442 > URL: https://issues.apache.org/jira/browse/PIG-5442 > Project: Pig > Issue Type: Bug >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Major > Attachments: PIG-5442-1.patch > > > While testing HCatStorer with Iceberg realized Pig calls setStoreLocation on > all Stores with the same Job object - > [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081] > Setting populated by one store is affecting the other stores. In my case the > "mapred.output.committer.class" is set as HiveIcebergCommitter by PigStore > that is used by the Iceberg table and the other stores which inserts data to > a non-iceberg tables also use that setting and trying to use > HiveIcebergCommitter. > > On checking with [~rohini] , it is called to get the credentials from all > stores since addCredentials API was added later and not all stores have > implemented it and some still set configuration in setLocation method (i.e, > HCatStorer). > > Fixed it by passing a separate copy of Job object to each store's setLocation > method and adding only the credential object from the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5441: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to branch-0.18 and trunk. Thanks [~yigress] for the contribution. > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5441.patch > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
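The workaround mentioned in PIG-5441 can be applied per-script with Pig's SET command; the 2048-byte value below is purely illustrative, not a recommended setting, and the property can equally be set cluster-wide in tez-site.xml:

```
SET tez.runtime.transfer.data-via-events.max-size '2048';
```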
[jira] [Updated] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf
[ https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alagappan Maruthappan updated PIG-5442: --- External issue URL: https://github.com/apache/pig/pull/40 > Add only credentials from setStoreLocation to the Job Conf > -- > > Key: PIG-5442 > URL: https://issues.apache.org/jira/browse/PIG-5442 > Project: Pig > Issue Type: Bug >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Major > > While testing HCatStorer with Iceberg realized Pig calls setStoreLocation on > all Stores with the same Job object - > [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081] > Setting populated by one store is affecting the other stores. In my case the > "mapred.output.committer.class" is set as HiveIcebergCommitter by PigStore > that is used by the Iceberg table and the other stores which inserts data to > a non-iceberg tables also use that setting and trying to use > HiveIcebergCommitter. > > On checking with [~rohini] , it is called to get the credentials from all > stores since addCredentials API was added later and not all stores have > implemented it and some still set configuration in setLocation method (i.e, > HCatStorer). > > Fixed it by passing a separate copy of Job object to each store's setLocation > method and adding only the credential object from the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf
[ https://issues.apache.org/jira/browse/PIG-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alagappan Maruthappan reassigned PIG-5442: -- Assignee: Alagappan Maruthappan > Add only credentials from setStoreLocation to the Job Conf > -- > > Key: PIG-5442 > URL: https://issues.apache.org/jira/browse/PIG-5442 > Project: Pig > Issue Type: Bug >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Major > > While testing HCatStorer with Iceberg realized Pig calls setStoreLocation on > all Stores with the same Job object - > [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081] > Setting populated by one store is affecting the other stores. In my case the > "mapred.output.committer.class" is set as HiveIcebergCommitter by PigStore > that is used by the Iceberg table and the other stores which inserts data to > a non-iceberg tables also use that setting and trying to use > HiveIcebergCommitter. > > On checking with [~rohini] , it is called to get the credentials from all > stores since addCredentials API was added later and not all stores have > implemented it and some still set configuration in setLocation method (i.e, > HCatStorer). > > Fixed it by passing a separate copy of Job object to each store's setLocation > method and adding only the credential object from the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5442) Add only credentials from setStoreLocation to the Job Conf
Alagappan Maruthappan created PIG-5442: -- Summary: Add only credentials from setStoreLocation to the Job Conf Key: PIG-5442 URL: https://issues.apache.org/jira/browse/PIG-5442 Project: Pig Issue Type: Bug Reporter: Alagappan Maruthappan While testing HCatStorer with Iceberg, I realized Pig calls setStoreLocation on all stores with the same Job object - [https://github.com/apache/pig/blob/b050a33c66fc22d648370b5c6bda04e0e51d3aa3/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java#L1081] Settings populated by one store affect the other stores. In my case, "mapred.output.committer.class" is set to HiveIcebergCommitter by the PigStore used by the Iceberg table, and the other stores, which insert data into non-Iceberg tables, also pick up that setting and try to use HiveIcebergCommitter. On checking with [~rohini], it is called to get the credentials from all stores, since the addCredentials API was added later, not all stores have implemented it, and some still set configuration in the setLocation method (e.g., HCatStorer). Fixed by passing a separate copy of the Job object to each store's setLocation method and adding only the credentials from the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated PIG-5441: -- Attachment: PIG-5441.patch > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5441.patch > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726009#comment-17726009 ] Yi Zhang commented on PIG-5441: --- [~knoguchi] can you add unit test as separate jira? I am not actively working on Pig itself and don't have bandwidth right now. Thank you! > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725786#comment-17725786 ] Koji Noguchi commented on PIG-5441: --- It would be nice if you can add a unit test. (However, if you don't have bandwidth I understand. I can try to add the test later as a separate jira.) > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725781#comment-17725781 ] Rohini Palaniswamy commented on PIG-5441: - +1. Can you just attach the patch to jira ? > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
[ https://issues.apache.org/jira/browse/PIG-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-5441: Fix Version/s: 0.18.0 Assignee: Yi Zhang Status: Patch Available (was: Open) > Pig skew join tez grace reducer fails to find shuffle data > -- > > Key: PIG-5441 > URL: https://issues.apache.org/jira/browse/PIG-5441 > Project: Pig > Issue Type: Bug > Components: tez >Affects Versions: 0.17.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Fix For: 0.18.0 > > > User pig tez skew join encountered issue of not finding shuffle data from the > sampler aggregate vertex. The right side join has >1 reducers. > For workaround adjust tez.runtime.transfer.data-via-events.max-size to avoid > spill to disk for the sampler aggregation vertex. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725780#comment-17725780 ] Rohini Palaniswamy commented on PIG-5440: - +1 > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5440-v01.patch, pig-5440-v02.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5441) Pig skew join tez grace reducer fails to find shuffle data
Yi Zhang created PIG-5441: - Summary: Pig skew join tez grace reducer fails to find shuffle data Key: PIG-5441 URL: https://issues.apache.org/jira/browse/PIG-5441 Project: Pig Issue Type: Bug Components: tez Affects Versions: 0.17.0 Reporter: Yi Zhang A user's Pig-on-Tez skew join hit an issue where it could not find shuffle data from the sampler aggregate vertex. The right side of the join has more than one reducer. As a workaround, adjust tez.runtime.transfer.data-via-events.max-size to avoid spilling to disk for the sampler aggregation vertex. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722278#comment-17722278 ] Koji Noguchi commented on PIG-5440: --- bq. Can you add space between "orc-shims","aircompressor" before commit ? Attached pig-5440-v02.patch. > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5440-v01.patch, pig-5440-v02.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5440: -- Attachment: pig-5440-v02.patch > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5440-v01.patch, pig-5440-v02.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722276#comment-17722276 ] Rohini Palaniswamy commented on PIG-5440: - +1. Can you add space between "orc-shims","aircompressor" before commit ? > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5440-v01.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5440) Extra jars needed for hive3
[ https://issues.apache.org/jira/browse/PIG-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5440: -- Attachment: pig-5440-v01.patch > Extra jars needed for hive3 > --- > > Key: PIG-5440 > URL: https://issues.apache.org/jira/browse/PIG-5440 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5440-v01.patch > > > When testing Hive3, e2e tests were failing with > {{Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. > Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5440) Extra jars needed for hive3
Koji Noguchi created PIG-5440: - Summary: Extra jars needed for hive3 Key: PIG-5440 URL: https://issues.apache.org/jira/browse/PIG-5440 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi Assignee: Koji Noguchi When testing Hive3, e2e tests were failing with {{Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/llap/security/LlapSigner$Signable}} etc. Updating dependent classes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5437) Add lib and idea folder to .gitignore
[ https://issues.apache.org/jira/browse/PIG-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5437: - Assignee: Alagappan Maruthappan > Add lib and idea folder to .gitignore > - > > Key: PIG-5437 > URL: https://issues.apache.org/jira/browse/PIG-5437 > Project: Pig > Issue Type: Improvement >Reporter: Alagappan Maruthappan >Assignee: Alagappan Maruthappan >Priority: Minor > Attachments: PIG-5437-0.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5439: -- Attachment: pig-5439-v01.patch > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5439-v01.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5439) Support Spark 3 and drop SparkShim
Koji Noguchi created PIG-5439: - Summary: Support Spark 3 and drop SparkShim Key: PIG-5439 URL: https://issues.apache.org/jira/browse/PIG-5439 Project: Pig Issue Type: Improvement Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi Support Pig-on-Spark running on Spark 3. The initial version will only run up to Spark 3.2.4, not on 3.3 or 3.4, due to a log4j mismatch. After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. So far, not all unit/e2e tests pass with the proposed patch, but at least compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)