[jira] [Assigned] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl

2024-09-19 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5464:
-

Attachment: pig-5464-jackson_avro.patch
  Assignee: Koji Noguchi

This patch is not to be committed.  It only works for the hadoop3 version.  If we were 
to commit it, we would probably need a shim approach. 
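As a hedged illustration of what "a shim approach" could mean here (the shim shape and the class name `JacksonShim` are assumptions for illustration, not code from the patch; the two ObjectMapper class names are the real Jackson 1.x and 2.x ones), one classpath-probing sketch:

```java
// Sketch of a classpath-probing shim: prefer the Jackson 2 (fasterxml)
// ObjectMapper and fall back to the legacy Jackson 1 (codehaus ASL) class.
// The shim structure is an assumption, not the actual PIG-5464 patch.
public class JacksonShim {
    static Class<?> loadObjectMapper() throws ClassNotFoundException {
        try {
            return Class.forName("com.fasterxml.jackson.databind.ObjectMapper");
        } catch (ClassNotFoundException e) {
            return Class.forName("org.codehaus.jackson.map.ObjectMapper");
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("found " + loadObjectMapper().getName());
        } catch (ClassNotFoundException e) {
            System.out.println("no Jackson ObjectMapper on classpath");
        }
    }
}
```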

> Move off from jackson-mapper-asl and jackson-core-asl
> -
>
> Key: PIG-5464
> URL: https://issues.apache.org/jira/browse/PIG-5464
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5464-jackson_avro.patch
>
>
> Similar to HADOOP-15983 and SPARK-30466, we need to move off from  
> jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. 
> However, this is only possible for Hadoop3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl

2024-09-19 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883093#comment-17883093
 ] 

Koji Noguchi edited comment on PIG-5464 at 9/19/24 6:45 PM:


This patch is not to be committed.  It only works for the hadoop3 version.  If we were 
to commit it, we would probably need a shim approach. 


was (Author: knoguchi):
This patch is not to me committed.  Only works for hadoop3 version.  If we were 
to commit, we probably need a shim approach. 

> Move off from jackson-mapper-asl and jackson-core-asl
> -
>
> Key: PIG-5464
> URL: https://issues.apache.org/jira/browse/PIG-5464
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5464-jackson_avro.patch
>
>
> Similar to HADOOP-15983 and SPARK-30466, we need to move off from  
> jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. 
> However, this is only possible for Hadoop3.





[jira] [Created] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl

2024-09-19 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5464:
-

 Summary: Move off from jackson-mapper-asl and jackson-core-asl
 Key: PIG-5464
 URL: https://issues.apache.org/jira/browse/PIG-5464
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi


Similar to HADOOP-15983 and SPARK-30466, we need to move off from  
jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. 

However, this is only possible for Hadoop3.





[jira] [Commented] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882811#comment-17882811
 ] 

Rohini Palaniswamy commented on PIG-5459:
-

+1

> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
> 
>
> Key: PIG-5459
> URL: https://issues.apache.org/jira/browse/PIG-5459
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5459-v01.patch
>
>
> {noformat}
> turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
> from org.apache.hadoop.conf import *
> java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
> at java.lang.Class.privateGetPublicFields(Class.java:2614)
> at java.lang.Class.getFields(Class.java:1557)
> at org.python.core.PyJavaType.init(PyJavaType.java:419)
> at org.python.core.PyType.createType(PyType.java:1523)
> at org.python.core.PyType.addFromClass(PyType.java:1462)
> at org.python.core.PyType.fromClass(PyType.java:1551)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
> at 
> org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
> at org.python.core.Py.java2py(Py.java:2017)
> at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
> at 
> org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
> at 
> org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
> at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
> at org.python.core.imp.importAll(imp.java:1189)
> at org.python.core.imp.importAll(imp.java:1177)
> at 
> org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
> at 
> org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
> at org.python.core.PyTableCode.call(PyTableCode.java:171)
> at org.python.core.PyCode.call(PyCode.java:18)
> at org.python.core.Py.runCode(Py.java:1614)
> at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
> at org.apache.pig.Main.run(Main.java:584)
> at org.apache.pig.Main.main(Main.java:175)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
> Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 37 more
> java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: 
> Lorg/junit/rules/ExpectedException;
> {noformat}





[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882810#comment-17882810
 ] 

Rohini Palaniswamy commented on PIG-5451:
-

+1

> Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
> -
>
> Key: PIG-5451
> URL: https://issues.apache.org/jira/browse/PIG-5451
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-9-5451-v01.patch
>
>
> Test failing with
> "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
> cannot access its superclass org.threeten.extra.chrono.AbstractDate"





[jira] [Commented] (PIG-5420) Update accumulo dependency to 1.10.1

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882809#comment-17882809
 ] 

Rohini Palaniswamy commented on PIG-5420:
-

+1

> Update accumulo dependency to 1.10.1
> 
>
> Key: PIG-5420
> URL: https://issues.apache.org/jira/browse/PIG-5420
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.18.1
>
> Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch
>
>
> Following owasp/cve report. 





[jira] [Updated] (PIG-5420) Update accumulo dependency to 1.10.1

2024-09-18 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5420:
--
Attachment: pig-9-5420-v02.patch

> Update accumulo dependency to 1.10.1
> 
>
> Key: PIG-5420
> URL: https://issues.apache.org/jira/browse/PIG-5420
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.18.1
>
> Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch
>
>
> Following owasp/cve report. 





[jira] [Commented] (PIG-5420) Update accumulo dependency to 1.10.1

2024-09-18 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882808#comment-17882808
 ] 

Koji Noguchi commented on PIG-5420:
---

Uploaded pig-9-5420-v02.patch

> Update accumulo dependency to 1.10.1
> 
>
> Key: PIG-5420
> URL: https://issues.apache.org/jira/browse/PIG-5420
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.18.1
>
> Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch
>
>
> Following owasp/cve report. 





[jira] [Commented] (PIG-5460) Allow Tez to be launched from mapreduce job

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882806#comment-17882806
 ] 

Rohini Palaniswamy commented on PIG-5460:
-

Change should just be
{code:java}
String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
if (tokenFile != null && globalConf.get(MRConfiguration.JOB_CREDENTIALS_BINARY) == null) {
    globalConf.set(MRConfiguration.JOB_CREDENTIALS_BINARY, tokenFile);
    globalConf.set("tez.credentials.path", tokenFile);
}
{code}

SecurityHelper.populateTokenCache will take care of reading from that. It would 
be even better if you could put the above into a 
configureCredentialFile(Configuration conf) method in SecurityHelper instead of 
TezDAGBuilder and just call it from there, so that all related code is in one 
place. 
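A minimal sketch of the suggested configureCredentialFile(Configuration conf) helper. The Configuration class below is a stub standing in for org.apache.hadoop.conf.Configuration so the example runs on its own; the helper name comes from the comment, and the constant value for MRConfiguration.JOB_CREDENTIALS_BINARY is an assumption.

```java
// Stub standing in for org.apache.hadoop.conf.Configuration, only so this
// sketch compiles and runs on its own; Pig would use the real Hadoop class.
class Configuration {
    private final java.util.Map<String, String> props = new java.util.HashMap<>();
    String get(String key) { return props.get(key); }
    void set(String key, String value) { props.put(key, value); }
}

public class SecurityHelperSketch {
    // In Pig this would be MRConfiguration.JOB_CREDENTIALS_BINARY.
    static final String JOB_CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";

    // Sketch of the proposed SecurityHelper method: pick up the delegation
    // token file from the environment unless credentials are already configured.
    static void configureCredentialFile(Configuration conf) {
        String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null && conf.get(JOB_CREDENTIALS_BINARY) == null) {
            conf.set(JOB_CREDENTIALS_BINARY, tokenFile);
            conf.set("tez.credentials.path", tokenFile);
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set(JOB_CREDENTIALS_BINARY, "/existing/creds.bin");
        configureCredentialFile(conf);
        // An already-configured credentials file is never overwritten.
        System.out.println(conf.get(JOB_CREDENTIALS_BINARY));
    }
}
```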

> Allow Tez to be launched from mapreduce job
> ---
>
> Key: PIG-5460
> URL: https://issues.apache.org/jira/browse/PIG-5460
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5460-v01.patch
>
>
> It's like Oozie but without using the Oozie launcher. 
> I would like to be able to submit a Pig on Tez job from a mapper task.





[jira] [Commented] (PIG-5458) Update metrics-core.version

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882807#comment-17882807
 ] 

Rohini Palaniswamy commented on PIG-5458:
-

+1

> Update metrics-core.version 
> 
>
> Key: PIG-5458
> URL: https://issues.apache.org/jira/browse/PIG-5458
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5458-v01.patch
>
>
> Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics
> and
> Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics.
> I believe the one from com.yammer.metrics (2.1.2) can be dropped.





[jira] [Commented] (PIG-5461) E2E environment variables ignored

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882803#comment-17882803
 ] 

Rohini Palaniswamy commented on PIG-5461:
-

+1

> E2E environment variables ignored
> -
>
> Key: PIG-5461
> URL: https://issues.apache.org/jira/browse/PIG-5461
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5461-v01.patch
>
>
> When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I 
> was confused why environment variables like OLD_HADOOP_HOME were ignored.





[jira] [Commented] (PIG-5462) Always update Owasp version to latest

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882802#comment-17882802
 ] 

Rohini Palaniswamy commented on PIG-5462:
-

+1

> Always update Owasp version to latest 
> --
>
> Key: PIG-5462
> URL: https://issues.apache.org/jira/browse/PIG-5462
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5462-v01.patch, pig-5462-v02.patch
>
>
> While looking at the owasp report, a lot of the findings were completely off.  
> (Like hadoop-shims-0.10.3 being reported as vulnerable.)
> Using the latest org.owasp/dependency-check-ant 
> (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant)
> seems to help cut down the false positives. 





[jira] [Commented] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882801#comment-17882801
 ] 

Rohini Palaniswamy commented on PIG-5457:
-

+1

> Upgrade Zookeeper to 3.7.2 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5457-v01.patch, pig-5457-v02.patch
>
>
> As mentioned in PIG-5456, the zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar, which we want to avoid.  Updating to 3.6.4 makes it the same as 
> the dependency from hadoop 3.3.6.





[jira] [Commented] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10

2024-09-18 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882798#comment-17882798
 ] 

Rohini Palaniswamy commented on PIG-5463:
-

Can you just rename TestLocalDateTime.java to TestDateTimeLocal.java so that 
both files appear next to each other?

> Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
> --
>
> Key: PIG-5463
> URL: https://issues.apache.org/jira/browse/PIG-5463
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5463-v01.patch
>
>
> Somehow TestDateTime  testLocalExecution started failing on Pig on Tez with 
> hadoop3. 
> {noformat}
> 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN  
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor  - Invalid 
> resource ask by application appattempt_1726051802536_0001_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is less 
> than 0! Requested resource type=[memory-mb], Requested resource= vCores:1>
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
> {noformat}
> Weird part is, it passes when tested alone or tested twice (with copy&paste). 





[jira] [Assigned] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10

2024-09-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5463:
-

Assignee: Koji Noguchi

> Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
> --
>
> Key: PIG-5463
> URL: https://issues.apache.org/jira/browse/PIG-5463
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5463-v01.patch
>
>
> Somehow TestDateTime  testLocalExecution started failing on Pig on Tez with 
> hadoop3. 
> {noformat}
> 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN  
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor  - Invalid 
> resource ask by application appattempt_1726051802536_0001_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is less 
> than 0! Requested resource type=[memory-mb], Requested resource= vCores:1>
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
> {noformat}
> Weird part is, it passes when tested alone or tested twice (with copy&paste). 





[jira] [Updated] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10

2024-09-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5463:
--
   Attachment: pig-5463-v01.patch
Fix Version/s: 0.19.0

I believe this has something to do with having both 
{code}
pigServer = new PigServer(cluster.getExecType(), cluster.getProperties());
pigServerLocal = new PigServer(Util.getLocalTestMode(), new Properties());
{code}
Initialization of pigServer adds HDFS config, etc. 

For now, splitting the test file into two to stabilize the test. 
Uploaded pig-5463-v01.patch.
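As a toy model of the suspected interaction (a static map stands in for JVM-global Hadoop configuration; the actual PigServer/MiniCluster mechanics are an assumption here, not taken from the Pig source):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the suspected leak: state written during the cluster
// PigServer's setup is JVM-global, so the later local-mode run sees it too.
public class SharedConfigLeak {
    static final Map<String, String> jvmGlobalConf = new HashMap<>();

    static void initClusterServer() {
        // Cluster setup injects hdfs-style configuration into global state.
        jvmGlobalConf.put("fs.defaultFS", "hdfs://minicluster:8020");
    }

    static String initLocalServer() {
        // The local-mode server expects file:/// but reads the polluted global.
        return jvmGlobalConf.getOrDefault("fs.defaultFS", "file:///");
    }

    public static void main(String[] args) {
        initClusterServer();                    // happens first in the shared test file
        System.out.println(initLocalServer());  // the local run sees the cluster URI
    }
}
```

Splitting the tests into separate files gives each its own JVM (with forked test runners), which matches why the test passes when run alone.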


> Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
> --
>
> Key: PIG-5463
> URL: https://issues.apache.org/jira/browse/PIG-5463
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5463-v01.patch
>
>
> Somehow TestDateTime  testLocalExecution started failing on Pig on Tez with 
> hadoop3. 
> {noformat}
> 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN  
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor  - Invalid 
> resource ask by application appattempt_1726051802536_0001_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is less 
> than 0! Requested resource type=[memory-mb], Requested resource= vCores:1>
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
> {noformat}
> Weird part is, it passes when tested alone or tested twice (with copy&paste). 





[jira] [Created] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10

2024-09-12 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5463:
-

 Summary: Pig on Tez TestDateTime.testLocalExecution failing on 
hadoop3/tez-0.10
 Key: PIG-5463
 URL: https://issues.apache.org/jira/browse/PIG-5463
 Project: Pig
  Issue Type: Test
Reporter: Koji Noguchi


Somehow TestDateTime testLocalExecution started failing on Pig on Tez with 
hadoop3. 
{noformat}
2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN  
org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor  - Invalid 
resource ask by application appattempt_1726051802536_0001_01
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is less than 
0! Requested resource type=[memory-mb], Requested resource=
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268)
at 
org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
at 
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
{noformat}

Weird part is, it passes when tested alone or tested twice (with copy&paste). 





[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-08-10 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5454:
--
Attachment: pig-5454-v03.patch

v02 still didn't work for Spark.  It turns out Spark also needed the PigContext 
properties to be updated.  v03 uploaded.
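For reference, forcing ParallelGC on task JVMs from a Pig script can look like the following (the property names are the standard MapReduce/Tez ones; whether the patch wires the new default exactly this way is an assumption):

```
-- request ParallelGC for MapReduce child tasks
-- note: SET replaces, not appends to, any existing java.opts value
set mapreduce.map.java.opts '-XX:+UseParallelGC';
set mapreduce.reduce.java.opts '-XX:+UseParallelGC';
-- and for Tez containers
set tez.task.launch.cmd-opts '-XX:+UseParallelGC';
```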

> Make ParallelGC the default Garbage Collection
> --
>
> Key: PIG-5454
> URL: https://issues.apache.org/jira/browse/PIG-5454
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5454-v01.patch, pig-5454-v02.patch, 
> pig-5454-v03.patch
>
>
> From JDK 9 onward, G1GC became the default GC. 
> I've seen our users hitting OOMs after migrating to a recent JDK, with the issue 
> going away after reverting back to ParallelGC.  
> Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.





[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-08-10 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5454:
--
Attachment: pig-5454-v02.patch

The initial patch didn't work for Tez.  Properties inside PigContext also needed to 
be updated.  Uploading the v02 patch.

> Make ParallelGC the default Garbage Collection
> --
>
> Key: PIG-5454
> URL: https://issues.apache.org/jira/browse/PIG-5454
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5454-v01.patch, pig-5454-v02.patch
>
>
> From JDK 9 onward, G1GC became the default GC. 
> I've seen our users hitting OOMs after migrating to a recent JDK, with the issue 
> going away after reverting back to ParallelGC.  
> Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.





[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)

2024-08-09 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5457:
--
Attachment: pig-5457-v02.patch

> Upgrade Zookeeper to 3.7.2 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5457-v01.patch, pig-5457-v02.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.





[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)

2024-08-09 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5457:
--
Attachment: (was: pig-5457-zookeeper.patch)

> Upgrade Zookeeper to 3.7.2 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5457-v01.patch, pig-5457-v02.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.





[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)

2024-08-09 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5457:
--
Attachment: pig-5457-zookeeper.patch
   Summary: Upgrade Zookeeper to 3.7.2 (from 3.5.7)  (was: Upgrade 
Zookeeper to 3.6.4 (from 3.5.7))

Instead of 3.6, upgrading to 3.7.  I also tried 3.8, but it made the tests 
unstable.  Will revisit in the future.  

Also, Spark is pulling in ZooKeeper 3.6 jars.  Skipping them. 

> Upgrade Zookeeper to 3.7.2 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5457-v01.patch, pig-5457-zookeeper.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.





[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-08-09 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5454:
--
Attachment: pig-5454-v01.patch

This was not as simple as I had hoped. 
I was incorrectly assuming that when multiple GCs are specified, the JVM will 
pick the last one.  Instead, the JVM fails to start with 
bq. Conflicting collector combinations in option list; please refer to the 
release notes for the combinations allowed

Attaching a patch that inspects the specified options and only adds 
"-XX:+UseParallelGC" when no other GC is specified. 
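The idea above can be sketched as follows.  This is a minimal illustration, not 
the actual patch; the class name, method name, and the (non-exhaustive) list of 
collector flags are all assumptions for the sake of the example.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: only append -XX:+UseParallelGC when no collector is already
// specified, avoiding the JVM's "Conflicting collector combinations"
// startup failure described above.
public class GcDefaults {
    // Collector flags that would conflict with UseParallelGC (not exhaustive)
    private static final List<String> GC_FLAGS = Arrays.asList(
            "-XX:+UseG1GC", "-XX:+UseSerialGC", "-XX:+UseConcMarkSweepGC",
            "-XX:+UseZGC", "-XX:+UseShenandoahGC", "-XX:+UseParallelGC");

    static String withDefaultGc(String javaOpts) {
        for (String flag : GC_FLAGS) {
            if (javaOpts.contains(flag)) {
                return javaOpts;  // a collector is already chosen; leave as-is
            }
        }
        return javaOpts.isEmpty() ? "-XX:+UseParallelGC"
                                  : javaOpts + " -XX:+UseParallelGC";
    }

    public static void main(String[] args) {
        System.out.println(withDefaultGc("-Xmx1g"));              // ParallelGC appended
        System.out.println(withDefaultGc("-Xmx1g -XX:+UseG1GC")); // left untouched
    }
}
```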

> Make ParallelGC the default Garbage Collection
> --
>
> Key: PIG-5454
> URL: https://issues.apache.org/jira/browse/PIG-5454
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5454-v01.patch
>
>
> From JDK9 and beyond, G1GC became the default GC. 
> I've seen our users hitting OOM after migrating to recent jdk and the issue 
> going away after reverting back to ParallelGC.  
> Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.





[jira] [Assigned] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-08-01 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5454:
-

Assignee: Koji Noguchi

> Make ParallelGC the default Garbage Collection
> --
>
> Key: PIG-5454
> URL: https://issues.apache.org/jira/browse/PIG-5454
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
>
> From JDK9 and beyond, G1GC became the default GC. 
> I've seen our users hitting OOM after migrating to recent jdk and the issue 
> going away after reverting back to ParallelGC.  
> Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.





[jira] [Updated] (PIG-5462) Always update Owasp version to latest

2024-07-31 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5462:
--
Attachment: pig-5462-v02.patch
   Summary: Always update Owasp version to latest   (was: Update Owasp 
version to latest (10.0.3) )

Instead of hard-coding the latest version, this will always pull the latest 
available release.  Uploaded the v02 patch.

bq. Like hadoop-shims-0.10.3 being reported as vulnerable.
Unfortunately, this false positive remained. 
Reading 
https://nvd.nist.gov/vuln/search/results?form_type=Advanced&results_type=overview&search_type=all&cpe_vendor=cpe%3A%2F%3Aapache&cpe_product=cpe%3A%2F%3Aapache%3Ahadoop&cpe_version=cpe%3A%2F%3Aapache%3Ahadoop%3A0.10.3
it seems it is reporting vulnerabilities for Hadoop version 0.10, which is 
completely unrelated here.  I'll write a separate patch to ignore those 
false positives. 
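For reference, Ivy supports dynamic revisions, which is how a build can resolve 
the newest published release instead of a pinned number.  The `latest.release` 
keyword is a real Ivy feature; whether Pig's ivy.xml uses this exact dependency 
line and conf mapping is an assumption here.

```xml
<!-- ivy.xml sketch: "latest.release" is an Ivy dynamic revision that
     resolves to the newest released version at resolve time -->
<dependency org="org.owasp" name="dependency-check-ant" rev="latest.release"/>
```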

> Always update Owasp version to latest 
> --
>
> Key: PIG-5462
>     URL: https://issues.apache.org/jira/browse/PIG-5462
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5462-v01.patch, pig-5462-v02.patch
>
>
> While looking at owasp report, a lot of them were completely off.  
> (Like hadoop-shims-0.10.3 being reported as vulnerable.)
> Using latest org.owasp/dependency-check-ant 
> (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant)
> seems to help cut down the false positives. 





[jira] [Updated] (PIG-5462) Update Owasp version to latest (10.0.3)

2024-07-31 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5462:
--
Attachment: pig-5462-v01.patch

> Update Owasp version to latest (10.0.3) 
> 
>
> Key: PIG-5462
> URL: https://issues.apache.org/jira/browse/PIG-5462
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5462-v01.patch
>
>
> While looking at owasp report, a lot of them were completely off.  
> (Like hadoop-shims-0.10.3 being reported as vulnerable.)
> Using latest org.owasp/dependency-check-ant 
> (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant)
> seems to help cut down the false positives. 





[jira] [Created] (PIG-5462) Update Owasp version to latest (10.0.3)

2024-07-31 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5462:
-

 Summary: Update Owasp version to latest (10.0.3) 
 Key: PIG-5462
 URL: https://issues.apache.org/jira/browse/PIG-5462
 Project: Pig
  Issue Type: Test
Reporter: Koji Noguchi
Assignee: Koji Noguchi


While looking at owasp report, a lot of them were completely off.  
(Like hadoop-shims-0.10.3 being reported as vulnerable.)

Using latest org.owasp/dependency-check-ant 
(https://mvnrepository.com/artifact/org.owasp/dependency-check-ant)
seems to help cut down the false positives. 







[jira] [Updated] (PIG-5461) E2E environment variables ignored

2024-07-31 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5461:
--
Attachment: pig-5461-v01.patch

> E2E environment variables ignored
> -
>
> Key: PIG-5461
> URL: https://issues.apache.org/jira/browse/PIG-5461
> Project: Pig
>  Issue Type: Test
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5461-v01.patch
>
>
> When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I 
> was confused why environment variables like OLD_HADOOP_HOME were ignored.





[jira] [Created] (PIG-5461) E2E environment variables ignored

2024-07-31 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5461:
-

 Summary: E2E environment variables ignored
 Key: PIG-5461
 URL: https://issues.apache.org/jira/browse/PIG-5461
 Project: Pig
  Issue Type: Test
Reporter: Koji Noguchi
Assignee: Koji Noguchi


When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I 
was confused why environment variables like OLD_HADOOP_HOME were ignored.





[jira] [Updated] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-07-30 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5459:
--
Attachment: pig-5459-v01.patch

> Second option is to give it up and add the required junit jars to lib dir.
>
Attaching a patch which does this.

> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
> 
>
> Key: PIG-5459
> URL: https://issues.apache.org/jira/browse/PIG-5459
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5459-v01.patch
>
>
> {noformat}
> turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
> from org.apache.hadoop.conf import *
> java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
> at java.lang.Class.privateGetPublicFields(Class.java:2614)
> at java.lang.Class.getFields(Class.java:1557)
> at org.python.core.PyJavaType.init(PyJavaType.java:419)
> at org.python.core.PyType.createType(PyType.java:1523)
> at org.python.core.PyType.addFromClass(PyType.java:1462)
> at org.python.core.PyType.fromClass(PyType.java:1551)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
> at 
> org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
> at org.python.core.Py.java2py(Py.java:2017)
> at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
> at 
> org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
> at 
> org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
> at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
> at org.python.core.imp.importAll(imp.java:1189)
> at org.python.core.imp.importAll(imp.java:1177)
> at 
> org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
> at 
> org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
> at org.python.core.PyTableCode.call(PyTableCode.java:171)
> at org.python.core.PyCode.call(PyCode.java:18)
> at org.python.core.Py.runCode(Py.java:1614)
> at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
> at org.apache.pig.Main.run(Main.java:584)
> at org.apache.pig.Main.main(Main.java:175)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
> Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 37 more
> java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: 
> Lorg/junit/rules/ExpectedException;
> {noformat}





[jira] [Updated] (PIG-5460) Allow Tez to be launched from mapreduce job

2024-07-30 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5460:
--
Attachment: pig-5460-v01.patch

> Allow Tez to be launched from mapreduce job
> ---
>
> Key: PIG-5460
> URL: https://issues.apache.org/jira/browse/PIG-5460
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-5460-v01.patch
>
>
> It's like Oozie but not using Oozie launcher. 
> I would like to be able to submit Pig on Tez job from the mapper task.





[jira] [Assigned] (PIG-5460) Allow Tez to be launched from mapreduce job

2024-07-30 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5460:
-

Assignee: Koji Noguchi

> Allow Tez to be launched from mapreduce job
> ---
>
> Key: PIG-5460
> URL: https://issues.apache.org/jira/browse/PIG-5460
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5460-v01.patch
>
>
> It's like Oozie but not using Oozie launcher. 
> I would like to be able to submit Pig on Tez job from the mapper task.





[jira] [Created] (PIG-5460) Allow Tez to be launched from mapreduce job

2024-07-30 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5460:
-

 Summary: Allow Tez to be launched from mapreduce job
 Key: PIG-5460
 URL: https://issues.apache.org/jira/browse/PIG-5460
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi


It's like Oozie but not using Oozie launcher. 
I would like to be able to submit Pig on Tez job from the mapper task.





[jira] [Commented] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-07-30 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869647#comment-17869647
 ] 

Koji Noguchi commented on PIG-5459:
---

It confused me why a regular (e2e) run requires the junit jar. 
It turns out the 
"from org.apache.hadoop.conf import *" 
line matches classes from the test jars that Hadoop3 ships as part of its 
regular lib.
For example 
{noformat}
/tmp/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6-tests.jar ===
 0 Sun Jun 18 08:22:40 UTC 2023 org/apache/hadoop/conf/
  2151 Sun Jun 18 08:22:38 UTC 2023 
org/apache/hadoop/conf/TestConfigurationDeprecation$1.class
   522 Sun Jun 18 08:22:38 UTC 2023 
org/apache/hadoop/conf/TestGetInstances$SampleClass.class
  2291 Sun Jun 18 08:22:38 UTC 2023 
org/apache/hadoop/conf/TestConfigurationDeprecation$2.class
   333 Sun Jun 18 08:22:38 UTC 2023 
org/apache/hadoop/conf/TestGetInstances$ChildInterface.class
  2203 Sun Jun 18 08:22:38 UTC 2023 
org/apache/hadoop/conf/TestGetInstances.class
  2358 Sun Jun 18 08:22:36 UTC 2023 
org/apache/hadoop/conf/TestConfigurationSubclass.class
  3335 Sun Jun 18 08:22:36 UTC 2023 
org/apache/hadoop/conf/TestDeprecatedKeys.class
 71538 Sun Jun 18 08:22:36 UTC 2023 
org/apache/hadoop/conf/TestConfiguration.class
...
/tmp/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.6-tests.jar
 ===
 0 Sun Jun 18 08:42:34 UTC 2023 org/apache/hadoop/conf/
  4469 Sun Jun 18 08:42:34 UTC 2023 
org/apache/hadoop/conf/TestNoDefaultsJobConf.class
...
{noformat}

Now, these classes require junit.  
One option is to skip these test jars, but that requires changes on the Hadoop 
side (since Pig calls the hadoop command line to start up Pig).
A second option is to give up and add the required junit jars to the lib dir.
A third option is to skip this test and let users add junit jars themselves if 
they really need to call 
"from org.apache.hadoop.conf import *", but it is pretty tough for users to 
understand what is happening when they hit this.
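To see which jar actually supplied a class at runtime (for example, to confirm 
that an org.apache.hadoop.conf class came from a -tests jar), a small 
diagnostic like the following can help.  This is a generic sketch, not part of 
any patch; the Hadoop test class mentioned in the comment is only an 
illustration.

```java
// Sketch: report where a class was loaded from, to spot classes coming
// from unexpected jars (e.g. hadoop-common-*-tests.jar).
public class WhichJar {
    static String locationOf(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // Bootstrap-loaded classes (java.lang.*, etc.) have no code source
        return src == null ? "<bootstrap>" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(String.class));  // prints <bootstrap>
        // For the failing import, one would check something like:
        //   locationOf(Class.forName("org.apache.hadoop.conf.TestConfiguration"))
        System.out.println(locationOf(WhichJar.class));
    }
}
```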


> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
> 
>
> Key: PIG-5459
>     URL: https://issues.apache.org/jira/browse/PIG-5459
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
>
> {noformat}
> turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
> from org.apache.hadoop.conf import *
> java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
> at java.lang.Class.privateGetPublicFields(Class.java:2614)
> at java.lang.Class.getFields(Class.java:1557)
> at org.python.core.PyJavaType.init(PyJavaType.java:419)
> at org.python.core.PyType.createType(PyType.java:1523)
> at org.python.core.PyType.addFromClass(PyType.java:1462)
> at org.python.core.PyType.fromClass(PyType.java:1551)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
> at 
> org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
> at org.python.core.Py.java2py(Py.java:2017)
> at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
> at 
> org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
> at 
> org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
> at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
> at org.python.core.imp.importAll(imp.java:1189)
> at org.python.core.imp.importAll(imp.java:1177)
> at 
> org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
> at 
> org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
> at org.python.core.PyTableCode.call(PyTableCode.java:171)
> at org.python.core.PyCode.call(PyCode.java:18)
> at org.python.core.Py.runCode(Py.java:1614)
> at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
>

[jira] [Assigned] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-07-30 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5459:
-

Assignee: Koji Noguchi

> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
> 
>
> Key: PIG-5459
> URL: https://issues.apache.org/jira/browse/PIG-5459
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
>
> {noformat}
> turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
> from org.apache.hadoop.conf import *
> java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
> at java.lang.Class.privateGetPublicFields(Class.java:2614)
> at java.lang.Class.getFields(Class.java:1557)
> at org.python.core.PyJavaType.init(PyJavaType.java:419)
> at org.python.core.PyType.createType(PyType.java:1523)
> at org.python.core.PyType.addFromClass(PyType.java:1462)
> at org.python.core.PyType.fromClass(PyType.java:1551)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
> at 
> org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
> at org.python.core.Py.java2py(Py.java:2017)
> at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
> at 
> org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
> at 
> org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
> at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
> at org.python.core.imp.importAll(imp.java:1189)
> at org.python.core.imp.importAll(imp.java:1177)
> at 
> org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
> at 
> org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
> at org.python.core.PyTableCode.call(PyTableCode.java:171)
> at org.python.core.PyCode.call(PyCode.java:18)
> at org.python.core.Py.runCode(Py.java:1614)
> at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
> at org.apache.pig.Main.run(Main.java:584)
> at org.apache.pig.Main.main(Main.java:175)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
> Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 37 more
> java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: 
> Lorg/junit/rules/ExpectedException;
> {noformat}





[jira] [Created] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-07-30 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5459:
-

 Summary: Jython_Checkin_3 e2e failing with NoClassDefFoundError 
(hadoop3)
 Key: PIG-5459
 URL: https://issues.apache.org/jira/browse/PIG-5459
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi


{noformat}
turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
from org.apache.hadoop.conf import *
java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.privateGetPublicFields(Class.java:2614)
at java.lang.Class.getFields(Class.java:1557)
at org.python.core.PyJavaType.init(PyJavaType.java:419)
at org.python.core.PyType.createType(PyType.java:1523)
at org.python.core.PyType.addFromClass(PyType.java:1462)
at org.python.core.PyType.fromClass(PyType.java:1551)
at 
org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
at 
org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
at 
org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
at org.python.core.Py.java2py(Py.java:2017)
at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
at 
org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
at 
org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
at org.python.core.imp.importAll(imp.java:1189)
at org.python.core.imp.importAll(imp.java:1177)
at 
org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
at 
org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1614)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at 
org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
at 
org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
at 
org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
at org.apache.pig.Main.run(Main.java:584)
at org.apache.pig.Main.main(Main.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 37 more
java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: 
Lorg/junit/rules/ExpectedException;
{noformat}






[jira] [Updated] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)

2024-07-30 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5459:
--
Priority: Minor  (was: Major)

> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
> 
>
> Key: PIG-5459
> URL: https://issues.apache.org/jira/browse/PIG-5459
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Priority: Minor
>
> {noformat}
> turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_
> from org.apache.hadoop.conf import *
> java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException;
> at java.lang.Class.getDeclaredFields0(Native Method)
> at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
> at java.lang.Class.privateGetPublicFields(Class.java:2614)
> at java.lang.Class.getFields(Class.java:1557)
> at org.python.core.PyJavaType.init(PyJavaType.java:419)
> at org.python.core.PyType.createType(PyType.java:1523)
> at org.python.core.PyType.addFromClass(PyType.java:1462)
> at org.python.core.PyType.fromClass(PyType.java:1551)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77)
> at 
> org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44)
> at 
> org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131)
> at org.python.core.Py.java2py(Py.java:2017)
> at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86)
> at 
> org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113)
> at 
> org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148)
> at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120)
> at org.python.core.imp.importAll(imp.java:1189)
> at org.python.core.imp.importAll(imp.java:1177)
> at 
> org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8)
> at 
> org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig)
> at org.python.core.PyTableCode.call(PyTableCode.java:171)
> at org.python.core.PyCode.call(PyCode.java:18)
> at org.python.core.Py.runCode(Py.java:1614)
> at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440)
> at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:1096)
> at org.apache.pig.Main.run(Main.java:584)
> at org.apache.pig.Main.main(Main.java:175)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
> Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 37 more
> java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: 
> Lorg/junit/rules/ExpectedException;
> {noformat}





[jira] [Commented] (PIG-5458) Update metrics-core.version

2024-07-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869453#comment-17869453
 ] 

Koji Noguchi commented on PIG-5458:
---

Forgot to mention: after the change in PIG-5456, I noticed Pig on MR/Tez jobs 
were relying on the metrics jar from Spark. Thus this patch.

> Update metrics-core.version 
> 
>
> Key: PIG-5458
> URL: https://issues.apache.org/jira/browse/PIG-5458
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5458-v01.patch
>
>
> Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics
> and
> Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics.
> I believe one from com.yammer.metrics (2.1.2) can be dropped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5456) Upgrade Spark to 3.4.3

2024-07-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869452#comment-17869452
 ] 

Koji Noguchi commented on PIG-5456:
---

In summary, the classloading changes for bin/pig and unit tests are:
* MR/Tez jobs will stop using jars from the spark directory.
* For Spark3, Pig stops using reload4j (and orc-core, after PIG-5457).

The former led to PIG-5458, where I noticed Pig on MR/Tez was relying on the 
metrics jar from Spark.

> Upgrade Spark to 3.4.3
> --
>
> Key: PIG-5456
> URL: https://issues.apache.org/jira/browse/PIG-5456
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5456-v01.patch, pig-5456-v02.patch
>
>
> Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. 
> Simple upgrade failing a lot of tests with  
> {noformat}
> java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter 
> overrides final method getTimeStamp.()J {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5458) Update metrics-core.version

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5458:
--
Attachment: pig-5458-v01.patch

> Update metrics-core.version 
> 
>
> Key: PIG-5458
> URL: https://issues.apache.org/jira/browse/PIG-5458
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5458-v01.patch
>
>
> Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics
> and
> Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics.
> I believe one from com.yammer.metrics (2.1.2) can be dropped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (PIG-5458) Update metrics-core.version

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5458:
-

Assignee: Koji Noguchi

> Update metrics-core.version 
> 
>
> Key: PIG-5458
> URL: https://issues.apache.org/jira/browse/PIG-5458
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5458-v01.patch
>
>
> Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics
> and
> Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics.
> I believe one from com.yammer.metrics (2.1.2) can be dropped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5458) Update metrics-core.version

2024-07-23 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5458:
-

 Summary: Update metrics-core.version 
 Key: PIG-5458
 URL: https://issues.apache.org/jira/browse/PIG-5458
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi


Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics
and
Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics.

I believe one from com.yammer.metrics (2.1.2) can be dropped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5451:
--
Attachment: pig-9-5451-v01.patch

{quote}
This was caused by conflict of orc.version.  
./build/ivy/lib/Pig/orc-core-1.5.6.jar
./lib/h3/orc-core-1.5.6.jar
and
spark/jars/orc-core-1.6.14.jar
{quote}
After upgrading Spark to 3.4.3 in PIG-5456, the conflict changed a bit.

When downloading spark-core 3.4.3 through Ivy, there is no orc-core dependency.

But when downloading spark-3.4.3-bin-without-hadoop.tgz from Apache, it contains 
orc-core-1.8.7-shaded-protobuf.jar and orc-mapreduce-1.8.7-shaded-protobuf.jar. 

To make them consistent, I am adding extra pulls and adding steps to skip the 
orc-1.5.6 jars for Spark3 (just as we do with the reload4j jars in PIG-5456).

(pig-9-5451-v01.patch)
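A quick way to spot this kind of mismatch (a hypothetical helper, not part of the patch) is to extract the orc-core versions from the candidate jar paths and check whether more than one survives:

```shell
# Hypothetical helper: print the distinct orc-core versions among the given
# jar paths; more than one line of output means a version conflict.
list_orc_versions() {
  for jar in "$@"; do
    # Strip the directory, then pull the version out of orc-core-<ver>*.jar
    basename "$jar" | sed -n 's/^orc-core-\([0-9.]*\).*\.jar$/\1/p'
  done | sort -u
}
```

For example, the jar paths quoted above would yield two distinct versions (1.5.6 and 1.6.14), flagging the conflict.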
 

> Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
> -
>
> Key: PIG-5451
> URL: https://issues.apache.org/jira/browse/PIG-5451
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-9-5451-v01.patch
>
>
> Test failing with
> "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
> cannot access its superclass org.threeten.extra.chrono.AbstractDate"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5456) Upgrade Spark to 3.4.3

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5456:
--
Attachment: pig-5456-v02.patch

> log4j-1.2.17.jar was coming from stale zookeeper.  Will create a new Jira to 
> update the dependency.

Created PIG-5457

> As for how to skip reload4j
>
One option I considered was to move reload4j to a different directory and only 
pick it up for non-Spark3 jobs.   This would work if the only ways to start up 
Pig were bin/pig or the unit/e2e tests.   However, since we don't know whether 
users have custom startup script(s), I am taking another approach: leaving the 
reload4j jar in the same location but explicitly skipping it from bin/pig and 
the build.xml (unit) tests.  This way, only Pig-on-Spark jobs are affected, 
leaving the rest untouched. (pig-5456-v02.patch)
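As a rough sketch of the "skip it from bin/pig" idea (function name and directory layout here are hypothetical, not taken from the patch), the classpath construction could filter the jar per execution type:

```shell
# Hypothetical sketch: the reload4j jar stays on disk, but classpath
# construction skips it for Spark3 runs, leaving MR/Tez untouched.
build_classpath() {
  libdir="$1"; exec_type="$2"; cp=""
  for jar in "$libdir"/*.jar; do
    case "$(basename "$jar")" in
      reload4j-*)
        # Only Pig-on-Spark3 jobs drop reload4j from the classpath.
        if [ "$exec_type" = "spark3" ]; then continue; fi
        ;;
    esac
    cp="${cp:+$cp:}$jar"
  done
  echo "$cp"
}
```

The point of the design is that nothing moves on disk, so any custom startup scripts that scan the lib directory keep their current behavior.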

> Upgrade Spark to 3.4.3
> --
>
> Key: PIG-5456
> URL: https://issues.apache.org/jira/browse/PIG-5456
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5456-v01.patch, pig-5456-v02.patch
>
>
> Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. 
> Simple upgrade failing a lot of tests with  
> {noformat}
> java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter 
> overrides final method getTimeStamp.()J {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5457:
--
Fix Version/s: 0.19.0

> Upgrade Zookeeper to 3.6.4 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5457-v01.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5457:
--
Attachment: pig-5457-v01.patch

> Upgrade Zookeeper to 3.6.4 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5457-v01.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5457:
-

Assignee: Koji Noguchi

> Upgrade Zookeeper to 3.6.4 (from 3.5.7)
> ---
>
> Key: PIG-5457
> URL: https://issues.apache.org/jira/browse/PIG-5457
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Attachments: pig-5457-v01.patch
>
>
> As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in 
> log4j-1.2.17.jar that we want to avoid.  Updating to 3.6.4, making it same as 
> the dependency from hadoop 3.3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)

2024-07-23 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5457:
-

 Summary: Upgrade Zookeeper to 3.6.4 (from 3.5.7)
 Key: PIG-5457
 URL: https://issues.apache.org/jira/browse/PIG-5457
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi


As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in log4j-1.2.17.jar 
that we want to avoid.  Updating to 3.6.4, making it same as the dependency 
from hadoop 3.3.6.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-23 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5455.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review Rohini! 

Committed to trunk.

> Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
> -
>
> Key: PIG-5455
> URL: https://issues.apache.org/jira/browse/PIG-5455
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5455-v01.patch
>
>
> Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
> and simple upgrade of Hadoop failing the tests with 
> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5428) Update hadoop2,3 and tez to recent versions

2024-07-08 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863944#comment-17863944
 ] 

Koji Noguchi commented on PIG-5428:
---

> Setting tez.runtime.transfer.data-via-events.enabled to false helped but not 
> sure where 
> the problem is on. Pig? Tez?
>
It was due to the way Pig uses Tez, which differs from Hive. 
Hopefully handled in https://issues.apache.org/jira/browse/TEZ-4570.

> Update hadoop2,3 and tez to recent versions
> ---
>
> Key: PIG-5428
> URL: https://issues.apache.org/jira/browse/PIG-5428
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: pig-5428-v01.patch
>
>
> PIG-5253 hadoop3 patch is committed. 
> Now, updating hadoop2&3, tez and other dependent library versions. 
> Only testing using two different parameters. 
> * -Dhbaseversion=2 -Dhadoopversion=2 -Dhiveversion=1 -Dsparkversion=2
> and
> * -Dhbaseversion=2 -Dhadoopversion=3 -Dhiveversion=3 -Dsparkversion=2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-08 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863942#comment-17863942
 ] 

Koji Noguchi commented on PIG-5455:
---

Forgot to mention: I learned that the disabling of 
tez.runtime.transfer.data-via-events.enabled done in PIG-5428 was necessary due 
to a bug reported in https://issues.apache.org/jira/browse/TEZ-4570.   


But somehow the e2e tests were still not setting this flag.  I moved the 
disabling of tez.runtime.transfer.data-via-events.enabled from TezLauncher and 
TezMiniCluster to TezDagBuilder to enforce this configuration.

> Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
> -
>
> Key: PIG-5455
> URL: https://issues.apache.org/jira/browse/PIG-5455
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5455-v01.patch
>
>
> Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
> and simple upgrade of Hadoop failing the tests with 
> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5456) Upgrade Spark to 3.4.3

2024-07-08 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5456:
--
Attachment: pig-5456-v01.patch

It turns out log4j2 already provides log4j1 compatibility through the 
"Log4j 1.x bridge" from 
[https://logging.apache.org/log4j/2.x/manual/migration.html] 

But this spark/log4j-1.2-api-2.19.0.jar conflicted with reload4j-1.2.24.jar, 
resulting in the error shown in the description.  So far, the workaround for 
running tests was to delete reload4j-1.2.24.jar (and log4j-1.2.17.jar).

log4j-1.2.17.jar was coming from a stale Zookeeper dependency.  Will create a 
new Jira to update the dependency.

As for how to skip reload4j, [~rohini], do you have any suggestions? 

In addition, this log4j-1.2-api-2.19.0.jar implementation seems to have a bug 
in its use of SimpleLayout.    Reported at 
[https://github.com/apache/logging-log4j2/issues/2722]  For now, replacing 
them with PatternLayout.
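For reference, a hypothetical log4j.properties fragment showing the swap (appender name and pattern are illustrative, not taken from the patch); the PatternLayout pattern below approximates SimpleLayout's "LEVEL - message" output:

```
# Before (hits the SimpleLayout bug in the log4j 1.x bridge):
# log4j.appender.console.layout=org.apache.log4j.SimpleLayout

# After: PatternLayout mimicking SimpleLayout's output
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%p - %m%n
```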

Also, log4j-api-2.19.0.jar from Spark somehow has a bug.  Updating to 
log4j-1.2-api-2.23.1.jar worked. 

Patch uploaded.

> Upgrade Spark to 3.4.3
> --
>
> Key: PIG-5456
> URL: https://issues.apache.org/jira/browse/PIG-5456
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5456-v01.patch
>
>
> Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. 
> Simple upgrade failing a lot of tests with  
> {noformat}
> java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter 
> overrides final method getTimeStamp.()J {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5456) Upgrade Spark to 3.4.3

2024-07-08 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5456:
-

 Summary: Upgrade Spark to 3.4.3
 Key: PIG-5456
 URL: https://issues.apache.org/jira/browse/PIG-5456
 Project: Pig
  Issue Type: Improvement
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi
 Fix For: 0.19.0


Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. 
Simple upgrade failing a lot of tests with  
{noformat}
java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter overrides 
final method getTimeStamp.()J {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-08 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863929#comment-17863929
 ] 

Rohini Palaniswamy commented on PIG-5455:
-

+1

> Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
> -
>
> Key: PIG-5455
> URL: https://issues.apache.org/jira/browse/PIG-5455
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5455-v01.patch
>
>
> Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
> and simple upgrade of Hadoop failing the tests with 
> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-08 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863927#comment-17863927
 ] 

Koji Noguchi commented on PIG-5455:
---

> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>

This error was coming from an incompatible Mockito version: 
Pig uses mockito 1.8.4, while Hadoop 3.3.6 uses mockito 2.28.2.  

Uploading the patch, which upgrades Hadoop and Tez, pulls the new dependencies, 
and updates the test that used the alternative Whitebox implementation provided 
in Hadoop, which went away as part of the Mockito upgrade. 

> Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
> -
>
> Key: PIG-5455
> URL: https://issues.apache.org/jira/browse/PIG-5455
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5455-v01.patch
>
>
> Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
> and simple upgrade of Hadoop failing the tests with 
> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-08 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5455:
--
Attachment: pig-5455-v01.patch

> Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
> -
>
> Key: PIG-5455
> URL: https://issues.apache.org/jira/browse/PIG-5455
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5455-v01.patch
>
>
> Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
> and simple upgrade of Hadoop failing the tests with 
> "Implementing class java.lang.IncompatibleClassChangeError: Implementing 
> class" 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3

2024-07-08 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5455:
-

 Summary: Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
 Key: PIG-5455
 URL: https://issues.apache.org/jira/browse/PIG-5455
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi
 Fix For: 0.19.0


Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later 
and simple upgrade of Hadoop failing the tests with 
"Implementing class java.lang.IncompatibleClassChangeError: Implementing class" 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-17 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reopened PIG-5453:
---

While tracking multiple Jiras, I missed that this patch was not put through the 
full unit/e2e tests (thus the previous syntax error). 

After fixing the simple syntax error, I saw a couple of regression test 
failures.  At this point, I am reverting the patch while I debug and come up 
with a new one. 

So sorry.

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Daniel Dai (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847097#comment-17847097
 ] 

Daniel Dai commented on PIG-5453:
-

+1

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847095#comment-17847095
 ] 

Koji Noguchi commented on PIG-5453:
---

Sorry, the original patch had an extra comma causing a compile error in 
TestFlatten.java. 
Uploaded pig-5453-v02.patch.   To fix the broken trunk, I pushed the change.

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-16 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5453:
--
Attachment: pig-5453-v02.patch

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5453) FLATTEN shifting fields incorrectly

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5453.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Daniel!
Committed to trunk.

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5453-v01.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5452.
---
Fix Version/s: 0.19.0
   Resolution: Fixed

Thanks for the review Daniel! 
Committed to trunk.

> Null handling of FLATTEN with user defined schema (as clause)
> -
>
> Key: PIG-5452
> URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces wrong number of null and the output is shifted incorrectly. 
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of 
> tuple() with empty inner fields but with user defined schema of "as 
> (A1:chararray, A2:chararray)". 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5450.
---
Fix Version/s: 0.19.0
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> --
>
> Key: PIG-5450
> URL: https://issues.apache.org/jira/browse/PIG-5450
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
>  @117: areturn
> Reason:
> Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5446.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}





[jira] [Resolved] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5448.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Resolved] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5447.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"





[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5447:
--
Fix Version/s: 0.19.0

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ---
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"





[jira] [Resolved] (PIG-5439) Support Spark 3 and drop SparkShim

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5439.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review Rohini!
Committed to trunk

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark to run on spark3. 
> Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. 
> This is due to log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.





[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5438:
--
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Thanks for the review Rohini! 
Committed to trunk.

> Update SparkCounter.Accumulator to AccumulatorV2
> 
>
> Key: PIG-5438
> URL: https://issues.apache.org/jira/browse/PIG-5438
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5438-v01.patch
>
>
> Original Accumulator is deprecated in Spark2 and gone in Spark3.  
> AccumulatorV2 is usable on both Spark2 and Spark3. 
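The V2 contract referenced above can be sketched language-agnostically. Below is a minimal Python stand-in for the AccumulatorV2 shape (isZero/copy/reset/add/merge/value); it is illustrative only — the actual SparkCounter change extends org.apache.spark.util.AccumulatorV2 in Java:

```python
class LongAccumulator:
    """Minimal simulation of Spark's AccumulatorV2 contract for a long counter."""

    def __init__(self, initial=0):
        self._sum = initial

    def is_zero(self):
        return self._sum == 0

    def copy(self):
        return LongAccumulator(self._sum)

    def reset(self):
        self._sum = 0

    def add(self, v):
        # Called per record on each task's local copy.
        self._sum += v

    def merge(self, other):
        # The driver merges the per-task copies back together.
        self._sum += other._sum

    @property
    def value(self):
        return self._sum


# Two task-local copies merged on the "driver":
a, b = LongAccumulator(), LongAccumulator()
a.add(3)
b.add(4)
total = LongAccumulator()
total.merge(a)
total.merge(b)
print(total.value)  # 7
```

The key difference from the deprecated V1 Accumulator is that accumulation (add) and aggregation (merge) are separate methods on copies of the accumulator, rather than a single driver-side `+=`.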





[jira] [Resolved] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-05-14 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-5416.
---
Fix Version/s: 0.19.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Thanks for the review Rohini! 
Committed to trunk.

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Fix For: 0.19.0
>
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Commented] (PIG-5439) Support Spark 3 and drop SparkShim

2024-05-07 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844427#comment-17844427
 ] 

Rohini Palaniswamy commented on PIG-5439:
-

+1

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark to run on spark3. 
> Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. 
> This is due to log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.





[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-18 Thread Daniel Dai (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838865#comment-17838865
 ] 

Daniel Dai commented on PIG-5453:
-

+1

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5453-v01.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  





[jira] [Created] (PIG-5454) Make ParallelGC the default Garbage Collection

2024-04-18 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5454:
-

 Summary: Make ParallelGC the default Garbage Collection
 Key: PIG-5454
 URL: https://issues.apache.org/jira/browse/PIG-5454
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Koji Noguchi


From JDK 9 onward, G1GC is the default GC.
I've seen our users hit OOMs after migrating to a recent JDK, with the issue 
going away after reverting to ParallelGC.

Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC.
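Until any default changes, a workaround is to request ParallelGC explicitly. A hedged sketch, assuming the standard bin/pig launcher (which reads PIG_OPTS for client-side JVM options); backend task JVMs would need the analogous engine-specific java.opts setting:

```shell
# Client-side Pig JVM: switch back to the throughput collector
export PIG_OPTS="$PIG_OPTS -XX:+UseParallelGC"
```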





[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-18 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5453:
--
Attachment: pig-5453-v01.patch

Uploading a patch that uses the new field introduced as part of PIG-5201 and 
PIG-5452. If the number of fields is less than expected, it now fills the rest 
with nulls; if it is more, it now fills only up to the expected number of 
fields. (pig-5453-v01.patch) 
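The pad/truncate behavior described above can be sketched as follows (plain Python standing in for the Java change; the function name is illustrative, not from the actual patch):

```python
def flatten_with_schema(tup, expected_len):
    """Pad a flattened tuple with None up to the expected number of fields,
    or truncate extra fields, so the fields after FLATTEN(...) don't shift."""
    fields = list(tup)[:expected_len]                # drop extra fields
    fields += [None] * (expected_len - len(fields))  # pad missing ones with null
    return fields

# Y carried (a,b) but the schema expects 3 fields; Z carried 6.
row_y = ["Y"] + flatten_with_schema(("a", "b"), 3) + ["Y"]
row_z = ["Z"] + flatten_with_schema(("a", "b", "c", "d", "e", "f"), 3) + ["Z"]
print(row_y)  # ['Y', 'a', 'b', None, 'Y']
print(row_z)  # ['Z', 'a', 'b', 'c', 'Z']
```

This reproduces the expected outputs (Y,a,b,,Y) and (Z,a,b,c,Z) from the issue description.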

> FLATTEN shifting fields incorrectly
> ---
>
> Key: PIG-5453
> URL: https://issues.apache.org/jira/browse/PIG-5453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5453-v01.patch
>
>
> Follow up from PIG-5201, PIG-5452.  
> When flatten-ed tuple has less or more fields than specified, entire fields 
> shift incorrectly. 
> Input
> {noformat}
> A       (a,b,c)
> B       (a,b,c)
> C       (a,b,c)
> Y       (a,b)
> Z       (a,b,c,d,e,f)
> E{noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as 
> (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B; {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (EE){noformat}
> E is correct.  It's fixed as part of PIG-5201, PIG-5452.
> Y has shifted a4(Y) to the left incorrectly.  
> Should have been (Y,a,b,,Y)
> Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2).
> Should have been (Z,a,b,c,Z).
>  





[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837863#comment-17837863
 ] 

Rohini Palaniswamy commented on PIG-5450:
-

+1

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> --
>
> Key: PIG-5450
> URL: https://issues.apache.org/jira/browse/PIG-5450
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
>  @117: areturn
> Reason:
> Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
>  {noformat}





[jira] [Commented] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837862#comment-17837862
 ] 

Rohini Palaniswamy commented on PIG-5449:
-

+1

> TestEmptyInputDir failing on pig-on-spark3
> --
>
> Key: PIG-5449
> URL: https://issues.apache.org/jira/browse/PIG-5449
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5449-v01.patch
>
>
> TestEmptyInputDir failing on pig-on-spark3 with 
> {noformat:title=TestEmptyInputDir.testMergeJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
> {noformat}
> {noformat:title=TestEmptyInputDir.testGroupByFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
> {noformat}
> {noformat:title=TestEmptyInputDir.testFRJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267)
>  {noformat}
>  





[jira] [Commented] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837861#comment-17837861
 ] 

Rohini Palaniswamy commented on PIG-5448:
-

+1

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Commented] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2

2024-04-16 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837860#comment-17837860
 ] 

Rohini Palaniswamy commented on PIG-5438:
-

+1

> Update SparkCounter.Accumulator to AccumulatorV2
> 
>
> Key: PIG-5438
> URL: https://issues.apache.org/jira/browse/PIG-5438
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 0.19.0
>
> Attachments: pig-5438-v01.patch
>
>
> Original Accumulator is deprecated in Spark2 and gone in Spark3.  
> AccumulatorV2 is usable on both Spark2 and Spark3. 





[jira] [Created] (PIG-5453) FLATTEN shifting fields incorrectly

2024-04-15 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5453:
-

 Summary: FLATTEN shifting fields incorrectly
 Key: PIG-5453
 URL: https://issues.apache.org/jira/browse/PIG-5453
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Follow up from PIG-5201, PIG-5452.  

When a flattened tuple has fewer or more fields than specified, the entire 
field list shifts incorrectly. 

Input
{noformat}
A       (a,b,c)
B       (a,b,c)
C       (a,b,c)
Y       (a,b)
Z       (a,b,c,d,e,f)
E{noformat}
Script
{code:java}
A = load 'input.txt' as (a1:chararray, a2:tuple());
B = FOREACH A GENERATE a1, FLATTEN(a2) as 
(b1:chararray,b2:chararray,b3:chararray), a1 as a4;
dump B; {code}
Incorrect results
{noformat}
(A,a,b,c,A)
(B,a,b,c,B)
(C,a,b,c,C)
(Y,a,b,Y,)
(Z,a,b,c,d)
(EE){noformat}

E is correct; it was fixed as part of PIG-5201, PIG-5452.
Y shifted a4 (Y) to the left incorrectly; it should have been (Y,a,b,,Y).
Z dropped a4 (Z) and overwrote it with the content of FLATTEN(a2); it should 
have been (Z,a,b,c,Z).



 





[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-15 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
--
Description: 
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.
{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter, a2 in "FLATTEN(a2)" only has a schema of 
tuple() with empty inner fields, with the field names coming from the 
user-defined schema "as (A1:chararray, A2:chararray)". 

 

  was:
Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces right number of nulls.


{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces wrong number of null and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of 
tuple() with empty inner fields.

 


> Null handling of FLATTEN with user defined schema (as clause)
> -
>
>     Key: PIG-5452
> URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces wrong number of null and the output is shifted incorrectly. 
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of 
> tuple() with empty inner fields but with user defined schema of "as 
> (A1:chararray, A2:chararray)". 
>  





[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5452:
--
Attachment: pig-5452-v01.patch

Instead of relying on the inner-field schema, the patch uses the output schema, 
which combines the schema of the data with the user-defined schema.
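A rough illustration of that idea (a Python stand-in; the actual change is in Pig's Java planner, and these helper names are hypothetical): take the field count from the user-defined "as" clause when the data-side tuple schema is empty, so a null tuple still flattens to the declared number of nulls.

```python
def combined_field_count(data_schema, user_schema):
    """Output field count for FLATTEN: prefer the user-defined
    ('as' clause) schema when the data-side tuple schema is empty."""
    return len(user_schema) if user_schema else len(data_schema)

def flatten_null(n):
    # A null tuple flattens to n nulls, keeping later fields in place.
    return [None] * n

# a2 is tuple() (no inner fields) but "as (A1, A2)" declares two fields:
n = combined_field_count([], ["A1", "A2"])
row = ["a"] + flatten_null(n) + ["a"]
print(row)  # ['a', None, None, 'a']
```

This matches the expected output (a,,,a) rather than the shifted (a,,a,).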

> Null handling of FLATTEN with user defined schema (as clause)
> -
>
> Key: PIG-5452
> URL: https://issues.apache.org/jira/browse/PIG-5452
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5452-v01.patch
>
>
> Follow up from PIG-5201, 
> {code:java}
> A = load 'input' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 
> as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
> dump C;{code}
> This produces right number of nulls.
> {code:java}
> (a,,,a)
> (b,,,b)
> (c,,,c)
> (d,,,d)
> (f,,,f) {code}
>  
> However, 
> {code:java}
> A = load 'input.txt' as (a1:chararray);
> B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
> C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
> dump C;{code}
> This produces wrong number of null and the output is shifted incorrectly. 
> {code:java}
> (a,,a,)
> (b,,b,)
> (c,,c,)
> (d,,d,)
> (f,,f,) {code}
> Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of 
> tuple() with empty inner fields.
>  





[jira] [Created] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)

2024-04-12 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5452:
-

 Summary: Null handling of FLATTEN with user defined schema (as 
clause)
 Key: PIG-5452
 URL: https://issues.apache.org/jira/browse/PIG-5452
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Follow up from PIG-5201, 
{code:java}
A = load 'input' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as 
a3;
C = FOREACH B GENERATE a1, FLATTEN(a2), a3;
dump C;{code}
This produces the right number of nulls.


{code:java}
(a,,,a)
(b,,,b)
(c,,,c)
(d,,,d)
(f,,,f) {code}
 

However, 
{code:java}
A = load 'input.txt' as (a1:chararray);
B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3;
C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3;
dump C;{code}
This produces the wrong number of nulls, and the output is shifted incorrectly. 
{code:java}
(a,,a,)
(b,,b,)
(c,,c,)
(d,,d,)
(f,,f,) {code}
The difference is that, in the latter, a2 in "FLATTEN(a2)" only has a schema of 
tuple() with empty inner fields.

 





[jira] [Assigned] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-04-12 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi reassigned PIG-5416:
-

Assignee: Koji Noguchi

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> 
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832323#comment-17832323
 ] 

Koji Noguchi commented on PIG-5451:
---

This was caused by a version conflict of orc.version between

./build/ivy/lib/Pig/orc-core-1.5.6.jar
./lib/h3/orc-core-1.5.6.jar

and

spark/jars/orc-core-1.6.14.jar
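A quick way to spot this class of problem is to scan the combined classpath for the same artifact present at more than one version. A small illustrative helper (the regex and jar names are assumptions for the example, not part of any patch):

```python
import re
from collections import defaultdict

def find_version_conflicts(jar_names):
    """Group jar file names by artifact and report artifacts present at
    more than one version (illustrative helper, not Pig code)."""
    versions = defaultdict(set)
    for name in jar_names:
        # artifact-<version>.jar, where the version starts with a digit
        m = re.match(r"(.+?)-(\d[\w.]*)\.jar$", name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

jars = ["orc-core-1.5.6.jar", "orc-core-1.6.14.jar", "guava-27.0-jre.jar"]
print(find_version_conflicts(jars))  # {'orc-core': ['1.5.6', '1.6.14']}
```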

> Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
> -
>
> Key: PIG-5451
> URL: https://issues.apache.org/jira/browse/PIG-5451
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
>
> Test failing with
> "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
> cannot access its superclass org.threeten.extra.chrono.AbstractDate"





[jira] [Created] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5451:
-

 Summary: Pig-on-Spark3 E2E Orc_Pushdown_5 failing 
 Key: PIG-5451
 URL: https://issues.apache.org/jira/browse/PIG-5451
 Project: Pig
  Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi


Test failing with
"java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate 
cannot access its superclass org.threeten.extra.chrono.AbstractDate"








[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832320#comment-17832320
 ] 

Koji Noguchi commented on PIG-5451:
---

Full stack trace.
{noformat}
2024-03-29 10:57:31,787 [dag-scheduler-event-loop] INFO 
org.apache.spark.scheduler.DAGScheduler - ResultStage 3 (runJob at 
SparkHadoopWriter.scala:83) failed in 36.126 s due to Job aborted due to stage 
failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 
in stage 3.0 (TID 8) (gsrd479n10.red.ygrid.yahoo.com executor 4): 
java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot 
access its superclass org.threeten.extra.chrono.AbstractDate
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at 
org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:235)
at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:88)
at java.time.chrono.AbstractChronology.resolveYMD(AbstractChronology.java:563)
at java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
at 
org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:452)
at 
org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:88)
at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
at java.time.format.Parsed.resolveFields(Parsed.java:257)
at java.time.format.Parsed.resolve(Parsed.java:244)
at 
java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
at 
java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777)
at org.apache.orc.impl.DateUtils.<clinit>(DateUtils.java:74)
at 
org.apache.orc.impl.ColumnStatisticsImpl$TimestampStatisticsImpl.<init>(ColumnStatisticsImpl.java:1683)
at 
org.apache.orc.impl.ColumnStatisticsImpl.deserialize(ColumnStatisticsImpl.java:2131)
at 
org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:522)
at 
org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:1045)
at 
org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1117)
at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1137)
at 
org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1187)
at 
org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1222)
at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:254)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:67)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:83)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:337)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader.<init>(OrcNewInputFormat.java:72)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.createRecordReader(OrcNewInputFormat.java:57)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:255)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.<init>(PigRecordReader.java:126)
at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkPigRecordReader.<init>(SparkPigRecordReader.java:44)
at 
org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.createRecordReader(PigInputFormatSpark.java:131)
at 
org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:71)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:215)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:213)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iter

[jira] [Updated] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5450:
--
Attachment: pig-5450-v01.patch

It turns out the error was coming from conflicting jars: 
{{./build/ivy/lib/Pig/hive-storage-api-2.7.0.jar}}
and
{{spark/spark/jars/hive-storage-api-2.7.2.jar}}

Uploading a patch updating hive-storage-api version.
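The failure mode here comes from classpath shadowing: the first jar that provides a class wins, so a stale jar earlier on the path supplies an old class to code compiled against the newer API. A toy model in Python (all names and jar contents invented for illustration):

```python
def resolve_class(classpath, cls):
    """Toy model of JVM classpath resolution (illustrative only): the
    first jar containing the class wins, so a stale jar earlier on the
    path shadows the newer one and can cause VerifyError-style
    mismatches in code compiled against the newer API."""
    for jar, classes in classpath:
        if cls in classes:
            return jar
    raise KeyError(cls)

classpath = [
    ("hive-storage-api-2.7.0.jar", {"ColumnVector"}),  # stale, searched first
    ("hive-storage-api-2.7.2.jar", {"ColumnVector", "DateColumnVector"}),
]
# The old jar supplies ColumnVector even though a newer copy exists later.
print(resolve_class(classpath, "ColumnVector"))  # hive-storage-api-2.7.0.jar
```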

> Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
> --
>
> Key: PIG-5450
> URL: https://issues.apache.org/jira/browse/PIG-5450
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5450-v01.patch
>
>
> {noformat}
> Caused by: java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
>  @117: areturn
> Reason:
> Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
>  {noformat}





[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832318#comment-17832318
 ] 

Koji Noguchi commented on PIG-5450:
---

Weird full trace.
{noformat}
024-03-27 10:50:40,088 [task-result-getter-0] WARN 
org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) 
(gsrd238n05.red.ygrid.yahoo.com executor 1): org.apache.spark.SparkException: 
Task failed while writing rows
at 
org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163)
at 
org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
 @117: areturn
Reason:
Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
stack[0]) is not assignable to 
'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
Current Frame:
bci: @117
flags: { }
locals: { 'org/apache/orc/TypeDescription', 
'org/apache/orc/TypeDescription$RowBatchVersion', integer }
stack: { 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' }
Bytecode:
0x000: b200 022a b600 03b6 0004 2eaa  0181
0x010:  0001  0013  0059  0059
0x020:  0059  0059  0059  0062
0x030:  006b  006b  0074  0074
0x040:  007d  00ad  00ad  00ad
0x050:  00ad  00b6  00f7  0138
0x060:  0155 bb00 0559 1cb7 0006 b0bb 0007
0x070: 591c b700 08b0 bb00 0959 1cb7 000a b0bb
0x080: 000b 591c b700 0cb0 2ab6 000d 3e2a b600
0x090: 0e36 042b b200 0fa5 0009 1d10 12a4 000f
0x0a0: bb00 1159 1c1d 1504 b700 12b0 bb00 1359
0x0b0: 1c1d 1504 b700 14b0 bb00 1559 1cb7 0016
0x0c0: b02a b600 174e 2db9 0018 0100 bd00 193a
0x0d0: 0403 3605 1505 1904 bea2 001e 1904 1505
0x0e0: 2d15 05b9 001a 0200 c000 102b 1cb8 001b
0x0f0: 5384 0501 a7ff e0bb 001c 591c 1904 b700
0x100: 1db0 2ab6 0017 4e2d b900 1801 00bd 0019
0x110: 3a04 0336 0515 0519 04be a200 1e19 0415
0x120: 052d 1505 b900 1a02 00c0 0010 2b1c b800
0x130: 1b53 8405 01a7 ffe0 bb00 1e59 1c19 04b7
0x140: 001f b02a b600 174e bb00 2059 1c2d 03b9
0x150: 001a 0200 c000 102b 1cb8 001b b700 21b0
0x160: 2ab6 0017 4ebb 0022 591c 2d03 b900 1a02
0x170: 00c0 0010 2b1c b800 1b2d 04b9 001a 0200
0x180: c000 102b 1cb8 001b b700 23b0 bb00 2459
0x190: bb00 2559 b700 2612 27b6 0028 2ab6 0003
0x1a0: b600 29b6 002a b700 2bbf
Stackmap Table:
same_frame_extended(@100)
same_frame(@109)
same_frame(@118)
same_frame(@127)
same_frame(@136)
append_frame(@160,Integer,Integer)
same_frame(@172)
chop_frame(@184,2)
same_frame(@193)
append_frame(@212,Object[_75],Object[_76],Integer)
chop_frame(@247,1)
chop_frame(@258,2)
append_frame(@277,Object[_75],Object[_76],Integer)
chop_frame(@312,1)
chop_frame(@323,2)
same_frame(@352)
same_frame(@396)

at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:483)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:100)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:51)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:37)
at org.apache.pig.builtin.OrcStorage.putNext(OrcStorage.java:249)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:146)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at 
org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:368)
at 
org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
at 
org.a

[jira] [Created] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type

2024-03-29 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5450:
-

 Summary: Pig-on-Spark3 E2E ORC test failing with 
java.lang.VerifyError: Bad return type
 Key: PIG-5450
 URL: https://issues.apache.org/jira/browse/PIG-5450
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


{noformat}
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector;
 @117: areturn
Reason:
Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, 
stack[0]) is not assignable to 
'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature)
 {noformat}






[jira] [Updated] (PIG-5410) Support Python 3 for streaming_python

2024-03-29 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5410:
--
Attachment: pig-5410-v02.patch

Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
It seems the patch was missing a '+'. Uploading a new patch with the '+' added.
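This failure mode is easy to reproduce: dropping a '+' between two operands of a multi-line expression leaves the continuation line dangling, which Python rejects at compile time. A minimal reproduction (the string contents are invented; only `WRAPPED_MAP_END` comes from the error above):

```python
# Without the '+', a string literal is followed directly by a name,
# which Python rejects with "SyntaxError: invalid syntax".
broken = """
result = ("payload"
          WRAPPED_MAP_END)
"""
# With the '+' restored, the expression parses as string concatenation.
fixed = """
WRAPPED_MAP_END = "}"
result = ("payload" +
          WRAPPED_MAP_END)
"""
try:
    compile(broken, "<controller>", "exec")
except SyntaxError as e:
    print("SyntaxError, as in the job log:", e.lineno)
compile(fixed, "<controller>", "exec")  # compiles once the '+' is back
```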



> Support Python 3 for streaming_python
> -
>
> Key: PIG-5410
>     URL: https://issues.apache.org/jira/browse/PIG-5410
> Project: Pig
>  Issue Type: New Feature
>Reporter: Rohini Palaniswamy
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5410.patch, pig-5410-v02.patch
>
>
> Python 3 is incompatible with Python 2. We need to make it work with both. 





[jira] [Comment Edited] (PIG-5410) Support Python 3 for streaming_python

2024-03-29 Thread Koji Noguchi (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832317#comment-17832317
 ] 

Koji Noguchi edited comment on PIG-5410 at 3/29/24 9:10 PM:


Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
It seems the patch was missing a '+'. Uploading a new patch.


was (Author: knoguchi):
Testing the patch, it was failing with
{noformat}
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File 
"/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py",
 line 365
WRAPPED_MAP_END)
^
SyntaxError: invalid syntax
{noformat}
it seems like the patch was missing a '+'.   Uploading a new patch with '+'.  



> Support Python 3 for streaming_python
> -
>
>     Key: PIG-5410
> URL: https://issues.apache.org/jira/browse/PIG-5410
> Project: Pig
>  Issue Type: New Feature
>Reporter: Rohini Palaniswamy
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Major
> Fix For: 0.18.0
>
> Attachments: PIG-5410.patch, pig-5410-v02.patch
>
>
> Python 3 is incompatible with Python 2. We need to make it work with both. 





[jira] [Updated] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-03-22 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5449:
--
Attachment: pig-5449-v01.patch

Before (with Spark 2), this used to work by checking for the empty list returned 
by getJobIDs:
https://github.com/apache/pig/blob/branch-0.17/src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java#L210-L219

But with Spark 3, this returns an actual job ID with no metrics stored behind it.

Instead of adding more Spark 3-specific logic, I think we can treat metrics 
retrieval as optional, as we do in MapReduce and Tez. Attaching a patch 
(pig-5449-v01.patch).
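The best-effort approach can be sketched as follows (hypothetical names in Python; the actual change is in Pig's Java Spark stats code): missing metrics degrade to empty stats with a warning instead of failing the job.

```python
import logging

log = logging.getLogger("SparkJobStats")

def collect_stats(task_metrics):
    """Sketch of treating metrics retrieval as optional (illustrative,
    not the actual patch): when no metrics are available, report empty
    stats and warn rather than raising and failing the whole job."""
    if task_metrics is None:
        log.warning("No task metrics available; reporting empty stats")
        return {"records_written": 0, "bytes_written": 0}
    return {
        "records_written": sum(m["records"] for m in task_metrics),
        "bytes_written": sum(m["bytes"] for m in task_metrics),
    }

print(collect_stats(None))                        # empty stats, no exception
print(collect_stats([{"records": 5, "bytes": 80}]))
```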

> TestEmptyInputDir failing on pig-on-spark3
> --
>
> Key: PIG-5449
> URL: https://issues.apache.org/jira/browse/PIG-5449
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5449-v01.patch
>
>
> TestEmptyInputDir failing on pig-on-spark3 with 
> {noformat:title=TestEmptyInputDir.testMergeJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
> {noformat}
> {noformat:title=TestEmptyInputDir.testGroupByFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
> {noformat}
> {noformat:title=TestEmptyInputDir.testFRJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
> {noformat}
> {noformat:title=TestEmptyInputDir.testBloomJoinFailure}
> junit.framework.AssertionFailedError
> at 
> org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267)
>  {noformat}
>  





[jira] [Created] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3

2024-03-22 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5449:
-

 Summary: TestEmptyInputDir failing on pig-on-spark3
 Key: PIG-5449
 URL: https://issues.apache.org/jira/browse/PIG-5449
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


TestEmptyInputDir failing on pig-on-spark3 with 
{noformat:title=TestEmptyInputDir.testMergeJoinFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141)
{noformat}
{noformat:title=TestEmptyInputDir.testGroupByFailure}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80)
{noformat}
{noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297)
{noformat}
{noformat:title=TestEmptyInputDir.testFRJoinFailure}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171)
{noformat}
{noformat:title=TestEmptyInputDir.testBloomJoinFailure}
junit.framework.AssertionFailedError
at 
org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) 
{noformat}
 





[jira] [Updated] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-03-19 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5448:
--
Attachment: pig-5448-v01.patch

{quote}No task metrics available for jobId 0
{quote}
This is actually failing because Pig is succeeding without running anything. 
Looking further, I found that Spark was filtering out all input splits and 
reporting a successful, empty job result with no metrics.

Setting a flag so that Spark does not ignore a PigSplit that looks empty but 
still has (non-HDFS) inputs. (pig-5448-v01.patch)
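The idea can be modeled like this (a Python sketch with invented names, not the actual PigSplit change): a split whose reported length is 0 is still scheduled when it carries non-HDFS inputs such as an HBase scan.

```python
class PigSplitModel:
    """Toy model of a split that reports length 0 (e.g. an HBase scan)
    but must not be filtered out by Spark (names are illustrative)."""

    def __init__(self, length, has_non_hdfs_input):
        self.length = length
        self.has_non_hdfs_input = has_non_hdfs_input

    def get_length(self):
        # Report a non-zero length for non-HDFS inputs so empty-split
        # filtering does not silently drop the work.
        if self.length == 0 and self.has_non_hdfs_input:
            return 1
        return self.length

hbase_split = PigSplitModel(0, has_non_hdfs_input=True)
empty_hdfs_split = PigSplitModel(0, has_non_hdfs_input=False)
print(hbase_split.get_length())       # 1 -> kept and scheduled
print(empty_hdfs_split.get_length())  # 0 -> may be skipped
```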

> All TestHBaseStorage tests failing on pig-on-spark3
> ---
>
> Key: PIG-5448
> URL: https://issues.apache.org/jira/browse/PIG-5448
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-5448-v01.patch
>
>
> For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
> failing with 
> {noformat}
> org.apache.pig.PigException: ERROR 1002: Unable to store alias b
> at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
> at org.apache.pig.PigServer.store(PigServer.java:1086)
> at 
> org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
> Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
> at 
> org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}





[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim

2024-03-19 Thread Koji Noguchi (Jira)


 [ 
https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5439:
--
Attachment: pig-5439-v02.patch

Adding missing spark-scala.version. (pig-5439-v02.patch)

> Support Spark 3 and drop SparkShim
> --
>
> Key: PIG-5439
> URL: https://issues.apache.org/jira/browse/PIG-5439
> Project: Pig
>  Issue Type: Improvement
>  Components: spark
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Fix For: 0.19.0
>
> Attachments: pig-5439-v01.patch, pig-5439-v02.patch
>
>
> Support Pig-on-Spark to run on spark3. 
> Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. 
> This is due to log4j mismatch. 
> After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher.
> So far, not all unit/e2e tests pass with the proposed patch but at least 
> compilation goes through.





[jira] [Created] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3

2024-03-19 Thread Koji Noguchi (Jira)
Koji Noguchi created PIG-5448:
-

 Summary: All TestHBaseStorage tests failing on pig-on-spark3
 Key: PIG-5448
 URL: https://issues.apache.org/jira/browse/PIG-5448
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Koji Noguchi
Assignee: Koji Noguchi


For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are 
failing with 
{noformat}
org.apache.pig.PigException: ERROR 1002: Unable to store alias b
at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
at org.apache.pig.PigServer.store(PigServer.java:1086)
at 
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the 
rdds of this spark operator:
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
at 
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
Caused by: java.lang.RuntimeException: No task metrics available for jobId 0
at 
org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109)
at 
org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77)
at 
org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
{noformat}






[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826791#comment-17826791
 ] 

Rohini Palaniswamy commented on PIG-5446:
-

+1

> Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
> ---
>
> Key: PIG-5446
> URL: https://issues.apache.org/jira/browse/PIG-5446
> Project: Pig
>  Issue Type: Bug
>  Components: tez
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5446-v01.patch
>
>
> {noformat}
> Unable to open iterator for alias B. Backend error : Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, 
> vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, 
> taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Attempt failed because it appears to make no progress for 
> 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms]], Vertex did not succeed due to 
> OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at org.apache.pig.PigServer.openIterator(PigServer.java:1014)
> at 
> org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58)
> Caused by: org.apache.tez.dag.api.TezException: Vertex failed, 
> vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, 
> diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to 
> make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed 
> because it appears to make no progress for 1ms]], Vertex did not succeed 
> due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex 
> vertex_1707216362777_0001_1_00 [scope-4] killed/failed due 
> to:OWN_TASK_FAILURE]
> DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> at 
> org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243)
> at 
> org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 45.647 {noformat}
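The trace above shows Tez killing both task attempts because they "appear to make no progress" within the (deliberately tiny, 1ms) configured interval. The general pattern Tez expects — long-running work publishing progress periodically from a background thread — can be sketched as follows. This is an illustrative stand-alone sketch, not Pig's or Tez's actual API; the class and field names are invented for the example.

```java
// Illustrative sketch (not Pig/Tez code): a framework that kills attempts
// reporting no progress needs the task to publish progress periodically,
// e.g. from a daemon reporter thread, while the main thread does the work.
public class ProgressHeartbeat {
    static volatile float progress = 0f;

    public static void main(String[] args) throws InterruptedException {
        Thread reporter = new Thread(() -> {
            while (progress < 1f) {
                // Stand-in for the framework's progress callback.
                System.out.println("progress=" + progress);
                try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            }
        });
        reporter.setDaemon(true);
        reporter.start();
        for (int i = 1; i <= 10; i++) {   // simulated work in 10 steps
            Thread.sleep(20);
            progress = i / 10f;           // float division: reaches exactly 1.0f
        }
        reporter.join(500);               // reporter exits once progress hits 1.0f
        System.out.println("done, progress=" + progress);
    }
}
```

A test that sets the timeout to 1ms, as this one apparently does, guarantees the kill path is exercised regardless of how fast the heartbeat runs.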



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826790#comment-17826790
 ] 

Rohini Palaniswamy commented on PIG-5416:
-----------------------------------------

+1

> Spark unit tests failing randomly with "java.lang.RuntimeException: 
> Unexpected job execution status RUNNING"
> ------------------------------------------------------------------------
>
> Key: PIG-5416
> URL: https://issues.apache.org/jira/browse/PIG-5416
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-5416-v01.patch
>
>
> Spark unit tests fail randomly with the same errors. 
>  Sample stack trace showing "Caused by: java.lang.RuntimeException: 
> Unexpected job execution status RUNNING".
> {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF}
> Unable to store alias B
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to 
> store alias B
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:721)
> at 
> org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429)
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get 
> the rdds of this spark operator:
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1479)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464)
> at org.apache.pig.PigServer.execute(PigServer.java:1453)
> at org.apache.pig.PigServer.access$500(PigServer.java:119)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
> Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138)
> at 
> org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75)
> at 
> org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225)
> at 
> org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
> {noformat}
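The "Unexpected job execution status RUNNING" thrown from SparkStatsUtil.isJobSuccess suggests a race: stats are read while the Spark job has not yet transitioned out of RUNNING. One common remedy is to poll until the job leaves RUNNING (with a deadline) before deciding success. The sketch below is illustrative only — the names and the FakeJob are invented, not Pig's actual classes.

```java
// Illustrative sketch (not Pig's actual code): poll a job's status until it
// leaves RUNNING, instead of failing immediately on a still-running job.
public class JobStatusPoll {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    // Simulated job that reports RUNNING twice before SUCCEEDED.
    static class FakeJob {
        private int polls = 0;
        Status status() { return ++polls < 3 ? Status.RUNNING : Status.SUCCEEDED; }
    }

    static boolean isJobSuccess(FakeJob job, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        Status s = job.status();
        while (s == Status.RUNNING && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);   // back off and re-check rather than throwing
            s = job.status();
        }
        if (s == Status.RUNNING) {
            throw new RuntimeException("Unexpected job execution status RUNNING");
        }
        return s == Status.SUCCEEDED;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(isJobSuccess(new FakeJob(), 1000)); // prints true
    }
}
```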





[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException

2024-03-13 Thread Rohini Palaniswamy (Jira)


[ 
https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826789#comment-17826789
 ] 

Rohini Palaniswamy commented on PIG-5447:
-----------------------------------------

+1

> Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with 
> NoSuchElementException
> ------------------------------------------------------------------------
>
> Key: PIG-5447
> URL: https://issues.apache.org/jira/browse/PIG-5447
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Major
> Attachments: pig-5447-v01.patch
>
>
> TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer 
> and full-outer joins.
> "Caused by: java.util.NoSuchElementException: next on empty iterator"





  1   2   3   4   5   6   7   8   9   10   >