[jira] [Assigned] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl
[ https://issues.apache.org/jira/browse/PIG-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5464: - Attachment: pig-5464-jackson_avro.patch Assignee: Koji Noguchi This patch is not to be committed. Only works for hadoop3 version. If we were to commit, we probably need a shim approach. > Move off from jackson-mapper-asl and jackson-core-asl > - > > Key: PIG-5464 > URL: https://issues.apache.org/jira/browse/PIG-5464 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5464-jackson_avro.patch > > > Similar to HADOOP-15983 and SPARK-30466, we need to move off from > jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. > However, this is only possible for Hadoop3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
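At the import level, the move this issue tracks is the rename from the codehaus Jackson 1.x packages (jackson-mapper-asl / jackson-core-asl) to the com.fasterxml Jackson 2.x packages (jackson-databind / jackson-core). The helper below is a hypothetical illustration of that mapping only, not the patch itself, and it does not cover behavioral API differences between Jackson 1.x and 2.x.

```java
// Hypothetical sketch: the package rename behind moving off jackson-*-asl.
// org.codehaus.jackson.map.*  -> com.fasterxml.jackson.databind.*  (mapper)
// org.codehaus.jackson.*      -> com.fasterxml.jackson.core.*      (core)
public class JacksonImportMigration {

    // Rewrite a Jackson 1.x import line to its closest 2.x equivalent.
    // Order matters: the mapper package must be rewritten before the core
    // prefix, since "org.codehaus.jackson." is a prefix of both.
    static String migrateImport(String line) {
        return line
            .replace("org.codehaus.jackson.map.", "com.fasterxml.jackson.databind.")
            .replace("org.codehaus.jackson.", "com.fasterxml.jackson.core.");
    }

    public static void main(String[] args) {
        System.out.println(migrateImport("import org.codehaus.jackson.map.ObjectMapper;"));
        System.out.println(migrateImport("import org.codehaus.jackson.JsonParser;"));
    }
}
```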
[jira] [Comment Edited] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl
[ https://issues.apache.org/jira/browse/PIG-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883093#comment-17883093 ] Koji Noguchi edited comment on PIG-5464 at 9/19/24 6:45 PM: This patch is not to be committed. Only works for hadoop3 version. If we were to commit, we probably need a shim approach. was (Author: knoguchi): This patch is not to me committed. Only works for hadoop3 version. If we were to commit, we probably need a shim approach. > Move off from jackson-mapper-asl and jackson-core-asl > - > > Key: PIG-5464 > URL: https://issues.apache.org/jira/browse/PIG-5464 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5464-jackson_avro.patch > > > Similar to HADOOP-15983 and SPARK-30466, we need to move off from > jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. > However, this is only possible for Hadoop3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5464) Move off from jackson-mapper-asl and jackson-core-asl
Koji Noguchi created PIG-5464: - Summary: Move off from jackson-mapper-asl and jackson-core-asl Key: PIG-5464 URL: https://issues.apache.org/jira/browse/PIG-5464 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi Similar to HADOOP-15983 and SPARK-30466, we need to move off from jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13. However, this is only possible for Hadoop3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
[ https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882811#comment-17882811 ] Rohini Palaniswamy commented on PIG-5459: - +1 > Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) > > > Key: PIG-5459 > URL: https://issues.apache.org/jira/browse/PIG-5459 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5459-v01.patch > > > {noformat} > turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ > from org.apache.hadoop.conf import * > java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; > at java.lang.Class.getDeclaredFields0(Native Method) > at java.lang.Class.privateGetDeclaredFields(Class.java:2583) > at java.lang.Class.privateGetPublicFields(Class.java:2614) > at java.lang.Class.getFields(Class.java:1557) > at org.python.core.PyJavaType.init(PyJavaType.java:419) > at org.python.core.PyType.createType(PyType.java:1523) > at org.python.core.PyType.addFromClass(PyType.java:1462) > at org.python.core.PyType.fromClass(PyType.java:1551) > at > org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) > at > org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) > at > org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) > at org.python.core.Py.java2py(Py.java:2017) > at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) > at > org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) > at > org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) > at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) > at org.python.core.imp.importAll(imp.java:1189) > at org.python.core.imp.importAll(imp.java:1177) > at > org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) > 
at > org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) > at org.python.core.PyTableCode.call(PyTableCode.java:171) > at org.python.core.PyCode.call(PyCode.java:18) > at org.python.core.Py.runCode(Py.java:1614) > at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) > at > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) > at > org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) > at > org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) > at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) > at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) > at org.apache.pig.Main.run(Main.java:584) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:328) > at org.apache.hadoop.util.RunJar.main(RunJar.java:241) > Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 37 more > java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: > Lorg/junit/rules/ExpectedException; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882810#comment-17882810 ] Rohini Palaniswamy commented on PIG-5451: - +1 > Pig-on-Spark3 E2E Orc_Pushdown_5 failing > - > > Key: PIG-5451 > URL: https://issues.apache.org/jira/browse/PIG-5451 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-9-5451-v01.patch > > > Test failing with > "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate > cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5420) Update accumulo dependency to 1.10.1
[ https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882809#comment-17882809 ] Rohini Palaniswamy commented on PIG-5420: - +1 > Update accumulo dependency to 1.10.1 > > > Key: PIG-5420 > URL: https://issues.apache.org/jira/browse/PIG-5420 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.18.1 > > Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch > > > Following owasp/cve report. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5420) Update accumulo dependency to 1.10.1
[ https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5420: -- Attachment: pig-9-5420-v02.patch > Update accumulo dependency to 1.10.1 > > > Key: PIG-5420 > URL: https://issues.apache.org/jira/browse/PIG-5420 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.18.1 > > Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch > > > Following owasp/cve report. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5420) Update accumulo dependency to 1.10.1
[ https://issues.apache.org/jira/browse/PIG-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882808#comment-17882808 ] Koji Noguchi commented on PIG-5420: --- Uploaded pig-9-5420-v02.patch > Update accumulo dependency to 1.10.1 > > > Key: PIG-5420 > URL: https://issues.apache.org/jira/browse/PIG-5420 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.18.1 > > Attachments: pig-5420-v01.patch, pig-9-5420-v02.patch > > > Following owasp/cve report. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5460) Allow Tez to be launched from mapreduce job
[ https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882806#comment-17882806 ] Rohini Palaniswamy commented on PIG-5460: - Change should just be {code:java} String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION"); if(tokenFile != null && globalConf.get(MRConfiguration.JOB_CREDENTIALS_BINARY) == null) { globalConf.set(MRConfiguration.JOB_CREDENTIALS_BINARY, tokenFile); globalConf.set("tez.credentials.path", tokenFile); } {code} SecurityHelper.populateTokenCache will take care of reading from that. It would be even better if you can put the above into a configureCredentialFile(Configuration conf) method in SecurityHelper instead of TezDAGBuilder and just call it from there, so that all related code is in one place. > Allow Tez to be launched from mapreduce job > --- > > Key: PIG-5460 > URL: https://issues.apache.org/jira/browse/PIG-5460 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5460-v01.patch > > > It's like Oozie but not using Oozie launcher. > I would like to be able to submit Pig on Tez job from the mapper task. -- This message was sent by Atlassian Jira (v8.20.10#820010)
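The suggested change can be sketched as a small standalone helper. This is a sketch only: a plain Map stands in for Hadoop's Configuration so the example is self-contained, and the constant value assumes MRConfiguration.JOB_CREDENTIALS_BINARY resolves to Hadoop's "mapreduce.job.credentials.binary" key.

```java
import java.util.Map;

// Sketch of the configureCredentialFile helper suggested in the comment.
// A Map<String,String> stands in for org.apache.hadoop.conf.Configuration
// (an assumption made to keep the example self-contained).
public class CredentialFileHelper {

    // Assumed to match MRConfiguration.JOB_CREDENTIALS_BINARY.
    static final String JOB_CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";
    static final String TEZ_CREDENTIALS_PATH = "tez.credentials.path";

    // Propagate the launcher-provided token file (HADOOP_TOKEN_FILE_LOCATION)
    // into the job configuration, without clobbering an explicitly
    // configured credentials path.
    static void configureCredentialFile(Map<String, String> conf, String tokenFile) {
        if (tokenFile != null && conf.get(JOB_CREDENTIALS_BINARY) == null) {
            conf.put(JOB_CREDENTIALS_BINARY, tokenFile);
            conf.put(TEZ_CREDENTIALS_PATH, tokenFile);
        }
    }
}
```

Keeping the null check on JOB_CREDENTIALS_BINARY means a user-supplied credentials file always wins over the environment-provided one, which is the behavior the comment asks for.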
[jira] [Commented] (PIG-5458) Update metrics-core.version
[ https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882807#comment-17882807 ] Rohini Palaniswamy commented on PIG-5458: - +1 > Update metrics-core.version > > > Key: PIG-5458 > URL: https://issues.apache.org/jira/browse/PIG-5458 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5458-v01.patch > > > Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics > and > Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics. > I believe one from com.yammer.metrics (2.1.2) can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5461) E2E environment variables ignored
[ https://issues.apache.org/jira/browse/PIG-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882803#comment-17882803 ] Rohini Palaniswamy commented on PIG-5461: - +1 > E2E environment variables ignored > - > > Key: PIG-5461 > URL: https://issues.apache.org/jira/browse/PIG-5461 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5461-v01.patch > > > When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I > was confused why environment variables like OLD_HADOOP_HOME were ignored. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5462) Always update Owasp version to latest
[ https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882802#comment-17882802 ] Rohini Palaniswamy commented on PIG-5462: - +1 > Always update Owasp version to latest > -- > > Key: PIG-5462 > URL: https://issues.apache.org/jira/browse/PIG-5462 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5462-v01.patch, pig-5462-v02.patch > > > While looking at owasp report, a lot of them were completely off. > (Like hadoop-shims-0.10.3 being reported as vulnerable.) > Using latest org.owasp/dependency-check-ant > (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant) > seems to help cut down the false positives. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882801#comment-17882801 ] Rohini Palaniswamy commented on PIG-5457: - +1 > Upgrade Zookeeper to 3.7.2 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5457-v01.patch, pig-5457-v02.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
[ https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882798#comment-17882798 ] Rohini Palaniswamy commented on PIG-5463: - Can you just rename TestLocalDateTime.java to TestDateTimeLocal.java so that both files appear next to each other ? > Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10 > -- > > Key: PIG-5463 > URL: https://issues.apache.org/jira/browse/PIG-5463 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.19.0 > > Attachments: pig-5463-v01.patch > > > Somehow TestDateTime testLocalExecution started failing on Pig on Tez with > hadoop3. > {noformat} > 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor - Invalid > resource ask by application appattempt_1726051802536_0001_01 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is less > than 0! 
Requested resource type=[memory-mb], Requested resource= vCores:1> > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) > at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) > {noformat} > Weird part is, it passes when tested alone or tested twice (with copy&paste). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
[ https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5463: - Assignee: Koji Noguchi > Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10 > -- > > Key: PIG-5463 > URL: https://issues.apache.org/jira/browse/PIG-5463 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.19.0 > > Attachments: pig-5463-v01.patch > > > Somehow TestDateTime testLocalExecution started failing on Pig on Tez with > hadoop3. > {noformat} > 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor - Invalid > resource ask by application appattempt_1726051802536_0001_01 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is less > than 0! Requested resource type=[memory-mb], Requested resource= vCores:1> > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) > {noformat} > Weird part is, it passes when tested alone or tested twice (with copy&paste). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
[ https://issues.apache.org/jira/browse/PIG-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5463: -- Attachment: pig-5463-v01.patch Fix Version/s: 0.19.0 I believe this has something to do with having both {code} pigServer = new PigServer(cluster.getExecType(), cluster.getProperties()); pigServerLocal = new PigServer(Util.getLocalTestMode(), new Properties()); {code} Initialization of pigServer adds hdfs config etc. For now, splitting the test file into two to stabilize the test. Uploaded pig-5463-v01.patch. > Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10 > -- > > Key: PIG-5463 > URL: https://issues.apache.org/jira/browse/PIG-5463 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Priority: Minor > Fix For: 0.19.0 > > Attachments: pig-5463-v01.patch > > > Somehow TestDateTime testLocalExecution started failing on Pig on Tez with > hadoop3. > {noformat} > 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor - Invalid > resource ask by application appattempt_1726051802536_0001_01 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is less > than 0! 
Requested resource type=[memory-mb], Requested resource= vCores:1> > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) > at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) > {noformat} > Weird part is, it passes when tested alone or tested twice (with copy&paste). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5463) Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10
Koji Noguchi created PIG-5463: - Summary: Pig on Tez TestDateTime.testLocalExecution failing on hadoop3/tez-0.10 Key: PIG-5463 URL: https://issues.apache.org/jira/browse/PIG-5463 Project: Pig Issue Type: Test Reporter: Koji Noguchi Somehow TestDateTime.testLocalExecution started failing on Pig on Tez with hadoop3. {noformat} 2024-09-11 10:50:29,815 [IPC Server handler 30 on default port 34089] WARN org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor - Invalid resource ask by application appattempt_1726051802536_0001_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is less than 0! Requested resource type=[memory-mb], Requested resource= at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:525) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:415) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:349) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:304) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:312) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:268) at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:254) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93) at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:434) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:105) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048) {noformat} Weird part is, it passes when tested alone or tested twice (with copy&paste). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection
[ https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5454: -- Attachment: pig-5454-v03.patch v02 still didn't work for Spark. It turns out spark also needed pigcontext properties to be updated. v03 uploaded. > Make ParallelGC the default Garbage Collection > -- > > Key: PIG-5454 > URL: https://issues.apache.org/jira/browse/PIG-5454 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5454-v01.patch, pig-5454-v02.patch, > pig-5454-v03.patch > > > From JDK9 and beyond, G1GC became the default GC. > I've seen our users hitting OOM after migrating to recent jdk and the issue > going away after reverting back to ParallelGC. > Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection
[ https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5454: -- Attachment: pig-5454-v02.patch Initial patch didn't work for Tez. Properties inside PigContext also needed to be updated. Uploading v02 patch. > Make ParallelGC the default Garbage Collection > -- > > Key: PIG-5454 > URL: https://issues.apache.org/jira/browse/PIG-5454 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5454-v01.patch, pig-5454-v02.patch > > > From JDK9 and beyond, G1GC became the default GC. > I've seen our users hitting OOM after migrating to recent jdk and the issue > going away after reverting back to ParallelGC. > Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5457: -- Attachment: pig-5457-v02.patch > Upgrade Zookeeper to 3.7.2 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5457-v01.patch, pig-5457-v02.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5457: -- Attachment: (was: pig-5457-zookeeper.patch) > Upgrade Zookeeper to 3.7.2 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5457-v01.patch, pig-5457-v02.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.7.2 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5457: -- Attachment: pig-5457-zookeeper.patch Summary: Upgrade Zookeeper to 3.7.2 (from 3.5.7) (was: Upgrade Zookeeper to 3.6.4 (from 3.5.7)) Instead of 3.6, upgrading to 3.7. I tried 3.8 as well, but it made the tests unstable; will revisit in the future. Also, Spark pulls in zookeeper 3.6 jars; skipping them. > Upgrade Zookeeper to 3.7.2 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5457-v01.patch, pig-5457-zookeeper.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5454) Make ParallelGC the default Garbage Collection
[ https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5454: -- Attachment: pig-5454-v01.patch This was not as simple as I had hoped. I was incorrectly assuming that when multiple GCs are specified, the JVM picks the last one. Instead, the JVM fails to start with bq. Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed Attaching a patch that inspects the specified options and adds "-XX:+UseParallelGC" only when no other GC is specified. > Make ParallelGC the default Garbage Collection > -- > > Key: PIG-5454 > URL: https://issues.apache.org/jira/browse/PIG-5454 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5454-v01.patch > > > From JDK9 and beyond, G1GC became the default GC. > I've seen our users hitting OOM after migrating to recent jdk and the issue > going away after reverting back to ParallelGC. > Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
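The guard described in the comment above can be sketched roughly as follows. This is an illustrative sketch only, not the actual patch: the class and method names are hypothetical, and the list of collector flags is an assumption about which flags the real patch checks.

```java
import java.util.Arrays;
import java.util.List;

public class GcOptionGuard {
    // Collector-selection flags; specifying more than one of these makes the
    // JVM fail to start with "Conflicting collector combinations in option list".
    private static final List<String> GC_FLAGS = Arrays.asList(
            "UseParallelGC", "UseG1GC", "UseSerialGC",
            "UseConcMarkSweepGC", "UseZGC", "UseShenandoahGC");

    // Append -XX:+UseParallelGC only when no collector is already specified.
    public static String addDefaultGc(String javaOpts) {
        for (String flag : GC_FLAGS) {
            if (javaOpts.contains(flag)) {
                return javaOpts; // a GC is already chosen; leave the options alone
            }
        }
        return javaOpts.trim().isEmpty()
                ? "-XX:+UseParallelGC"
                : javaOpts + " -XX:+UseParallelGC";
    }

    public static void main(String[] args) {
        System.out.println(addDefaultGc("-Xmx1g"));
        System.out.println(addDefaultGc("-Xmx1g -XX:+UseG1GC"));
    }
}
```

Simply concatenating "-XX:+UseParallelGC" unconditionally would reproduce the startup failure above whenever a user already sets, say, -XX:+UseG1GC; checking first keeps user-specified collectors intact.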
[jira] [Assigned] (PIG-5454) Make ParallelGC the default Garbage Collection
[ https://issues.apache.org/jira/browse/PIG-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5454: - Assignee: Koji Noguchi > Make ParallelGC the default Garbage Collection > -- > > Key: PIG-5454 > URL: https://issues.apache.org/jira/browse/PIG-5454 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > > From JDK9 and beyond, G1GC became the default GC. > I've seen our users hitting OOM after migrating to recent jdk and the issue > going away after reverting back to ParallelGC. > Maybe the GC behavior assumed by SelfSpillBag does not work with G1GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5462) Always update Owasp version to latest
[ https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5462: -- Attachment: pig-5462-v02.patch Summary: Always update Owasp version to latest (was: Update Owasp version to latest (10.0.3) ) Instead of hard coding the latest version, this will always pull the latest available. Uploaded the v02 patch. bq. Like hadoop-shims-0.10.3 being reported as vulnerable. Unfortunately, this false positive remained. Reading https://nvd.nist.gov/vuln/search/results?form_type=Advanced&results_type=overview&search_type=all&cpe_vendor=cpe%3A%2F%3Aapache&cpe_product=cpe%3A%2F%3Aapache%3Ahadoop&cpe_version=cpe%3A%2F%3Aapache%3Ahadoop%3A0.10.3 it seems like it's showing the vulnerability of hadoop 0.10 version which is completely unrelated here. I'll write a separate patch for ignoring those false positives. > Always update Owasp version to latest > -- > > Key: PIG-5462 > URL: https://issues.apache.org/jira/browse/PIG-5462 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5462-v01.patch, pig-5462-v02.patch > > > While looking at owasp report, a lot of them were completely off. > (Like hadoop-shims-0.10.3 being reported as vulnerable.) > Using latest org.owasp/dependency-check-ant > (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant) > seems to help cut down the false positives. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5462) Update Owasp version to latest (10.0.3)
[ https://issues.apache.org/jira/browse/PIG-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5462: -- Attachment: pig-5462-v01.patch > Update Owasp version to latest (10.0.3) > > > Key: PIG-5462 > URL: https://issues.apache.org/jira/browse/PIG-5462 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5462-v01.patch > > > While looking at owasp report, a lot of them were completely off. > (Like hadoop-shims-0.10.3 being reported as vulnerable.) > Using latest org.owasp/dependency-check-ant > (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant) > seems to help cut down the false positives. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5462) Update Owasp version to latest (10.0.3)
Koji Noguchi created PIG-5462: - Summary: Update Owasp version to latest (10.0.3) Key: PIG-5462 URL: https://issues.apache.org/jira/browse/PIG-5462 Project: Pig Issue Type: Test Reporter: Koji Noguchi Assignee: Koji Noguchi While looking at owasp report, a lot of them were completely off. (Like hadoop-shims-0.10.3 being reported as vulnerable.) Using latest org.owasp/dependency-check-ant (https://mvnrepository.com/artifact/org.owasp/dependency-check-ant) seems to help cut down the false positives. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5461) E2E environment variables ignored
[ https://issues.apache.org/jira/browse/PIG-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5461: -- Attachment: pig-5461-v01.patch > E2E environment variables ignored > - > > Key: PIG-5461 > URL: https://issues.apache.org/jira/browse/PIG-5461 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5461-v01.patch > > > When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I > was confused why environment variables like OLD_HADOOP_HOME were ignored. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5461) E2E environment variables ignored
Koji Noguchi created PIG-5461: - Summary: E2E environment variables ignored Key: PIG-5461 URL: https://issues.apache.org/jira/browse/PIG-5461 Project: Pig Issue Type: Test Reporter: Koji Noguchi Assignee: Koji Noguchi When running e2e against Hadoop3 and using hadoop2+oldpig for verification, I was confused why environment variables like OLD_HADOOP_HOME were ignored. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
[ https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5459: -- Attachment: pig-5459-v01.patch > Second option is to give it up and add the required junit jars to lib dir. > Attaching a patch which does this. > Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) > > > Key: PIG-5459 > URL: https://issues.apache.org/jira/browse/PIG-5459 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5459-v01.patch > > > {noformat} > turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ > from org.apache.hadoop.conf import * > java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; > at java.lang.Class.getDeclaredFields0(Native Method) > at java.lang.Class.privateGetDeclaredFields(Class.java:2583) > at java.lang.Class.privateGetPublicFields(Class.java:2614) > at java.lang.Class.getFields(Class.java:1557) > at org.python.core.PyJavaType.init(PyJavaType.java:419) > at org.python.core.PyType.createType(PyType.java:1523) > at org.python.core.PyType.addFromClass(PyType.java:1462) > at org.python.core.PyType.fromClass(PyType.java:1551) > at > org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) > at > org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) > at > org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) > at org.python.core.Py.java2py(Py.java:2017) > at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) > at > org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) > at > org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) > at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) > at org.python.core.imp.importAll(imp.java:1189) > at org.python.core.imp.importAll(imp.java:1177) > at > 
org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) > at > org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) > at org.python.core.PyTableCode.call(PyTableCode.java:171) > at org.python.core.PyCode.call(PyCode.java:18) > at org.python.core.Py.runCode(Py.java:1614) > at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) > at > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) > at > org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) > at > org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) > at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) > at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) > at org.apache.pig.Main.run(Main.java:584) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:328) > at org.apache.hadoop.util.RunJar.main(RunJar.java:241) > Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 37 more > java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: > Lorg/junit/rules/ExpectedException; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5460) Allow Tez to be launched from mapreduce job
[ https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5460: -- Attachment: pig-5460-v01.patch > Allow Tez to be launched from mapreduce job > --- > > Key: PIG-5460 > URL: https://issues.apache.org/jira/browse/PIG-5460 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Priority: Minor > Attachments: pig-5460-v01.patch > > > It's like Oozie but not using Oozie launcher. > I would like to be able to submit Pig on Tez job from the mapper task. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5460) Allow Tez to be launched from mapreduce job
[ https://issues.apache.org/jira/browse/PIG-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5460: - Assignee: Koji Noguchi > Allow Tez to be launched from mapreduce job > --- > > Key: PIG-5460 > URL: https://issues.apache.org/jira/browse/PIG-5460 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5460-v01.patch > > > It's like Oozie but not using Oozie launcher. > I would like to be able to submit Pig on Tez job from the mapper task. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5460) Allow Tez to be launched from mapreduce job
Koji Noguchi created PIG-5460: - Summary: Allow Tez to be launched from mapreduce job Key: PIG-5460 URL: https://issues.apache.org/jira/browse/PIG-5460 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi It's like Oozie but not using Oozie launcher. I would like to be able to submit Pig on Tez job from the mapper task. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
[ https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869647#comment-17869647 ] Koji Noguchi commented on PIG-5459: --- I was confused about why a regular (e2e) run requires the junit jar. It turns out the "from org.apache.hadoop.conf import *" line matches classes from the test jars that Hadoop3 added as part of its regular lib. For example {noformat} /tmp/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6-tests.jar === 0 Sun Jun 18 08:22:40 UTC 2023 org/apache/hadoop/conf/ 2151 Sun Jun 18 08:22:38 UTC 2023 org/apache/hadoop/conf/TestConfigurationDeprecation$1.class 522 Sun Jun 18 08:22:38 UTC 2023 org/apache/hadoop/conf/TestGetInstances$SampleClass.class 2291 Sun Jun 18 08:22:38 UTC 2023 org/apache/hadoop/conf/TestConfigurationDeprecation$2.class 333 Sun Jun 18 08:22:38 UTC 2023 org/apache/hadoop/conf/TestGetInstances$ChildInterface.class 2203 Sun Jun 18 08:22:38 UTC 2023 org/apache/hadoop/conf/TestGetInstances.class 2358 Sun Jun 18 08:22:36 UTC 2023 org/apache/hadoop/conf/TestConfigurationSubclass.class 3335 Sun Jun 18 08:22:36 UTC 2023 org/apache/hadoop/conf/TestDeprecatedKeys.class 71538 Sun Jun 18 08:22:36 UTC 2023 org/apache/hadoop/conf/TestConfiguration.class ... /tmp/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.6-tests.jar === 0 Sun Jun 18 08:42:34 UTC 2023 org/apache/hadoop/conf/ 4469 Sun Jun 18 08:42:34 UTC 2023 org/apache/hadoop/conf/TestNoDefaultsJobConf.class ... {noformat} Now, these classes require junit. One option is to skip these test jars, but that requires changes on the Hadoop side (since Pig calls the hadoop command line to start up Pig). A second option is to give up and add the required junit jars to the lib dir. A third option is to skip this test and let users add the junit jars themselves if they really need to call "from org.apache.hadoop.conf import *", but it is pretty tough for users to understand what is happening when they hit this. 
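The first option above (skipping Hadoop's test jars) would amount to filtering `*-tests.jar` entries out of the classpath before the JVM starts. A minimal sketch of that filtering, assuming a colon-separated classpath string; the class and method names are hypothetical, and the real change would have to live in the hadoop launcher scripts, not in Pig:

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathFilter {
    // Drop Hadoop's *-tests.jar entries. Those jars contain classes (e.g.
    // org.apache.hadoop.conf.TestConfiguration) that reference JUnit, and
    // Jython's wildcard import scans every class in the package, triggering
    // NoClassDefFoundError when JUnit is absent.
    public static String skipTestJars(String classpath) {
        List<String> kept = new ArrayList<>();
        for (String entry : classpath.split(":")) {
            if (!entry.endsWith("-tests.jar")) {
                kept.add(entry);
            }
        }
        return String.join(":", kept);
    }

    public static void main(String[] args) {
        String cp = "hadoop-common-3.3.6.jar:hadoop-common-3.3.6-tests.jar:pig.jar";
        System.out.println(skipTestJars(cp));
    }
}
```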
> Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) > > > Key: PIG-5459 > URL: https://issues.apache.org/jira/browse/PIG-5459 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > > {noformat} > turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ > from org.apache.hadoop.conf import * > java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; > at java.lang.Class.getDeclaredFields0(Native Method) > at java.lang.Class.privateGetDeclaredFields(Class.java:2583) > at java.lang.Class.privateGetPublicFields(Class.java:2614) > at java.lang.Class.getFields(Class.java:1557) > at org.python.core.PyJavaType.init(PyJavaType.java:419) > at org.python.core.PyType.createType(PyType.java:1523) > at org.python.core.PyType.addFromClass(PyType.java:1462) > at org.python.core.PyType.fromClass(PyType.java:1551) > at > org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) > at > org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) > at > org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) > at org.python.core.Py.java2py(Py.java:2017) > at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) > at > org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) > at > org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) > at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) > at org.python.core.imp.importAll(imp.java:1189) > at org.python.core.imp.importAll(imp.java:1177) > at > org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) > at > org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) > at org.python.core.PyTableCode.call(PyTableCode.java:171) > at org.python.core.PyCode.call(PyCode.java:18) > at 
org.python.core.Py.runCode(Py.java:1614) > at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) > at > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) > at > org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) > at > org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) > at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) > at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) >
[jira] [Assigned] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
[ https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5459: - Assignee: Koji Noguchi > Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) > > > Key: PIG-5459 > URL: https://issues.apache.org/jira/browse/PIG-5459 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > > {noformat} > turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ > from org.apache.hadoop.conf import * > java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; > at java.lang.Class.getDeclaredFields0(Native Method) > at java.lang.Class.privateGetDeclaredFields(Class.java:2583) > at java.lang.Class.privateGetPublicFields(Class.java:2614) > at java.lang.Class.getFields(Class.java:1557) > at org.python.core.PyJavaType.init(PyJavaType.java:419) > at org.python.core.PyType.createType(PyType.java:1523) > at org.python.core.PyType.addFromClass(PyType.java:1462) > at org.python.core.PyType.fromClass(PyType.java:1551) > at > org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) > at > org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) > at > org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) > at org.python.core.Py.java2py(Py.java:2017) > at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) > at > org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) > at > org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) > at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) > at org.python.core.imp.importAll(imp.java:1189) > at org.python.core.imp.importAll(imp.java:1177) > at > org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) > at > 
org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) > at org.python.core.PyTableCode.call(PyTableCode.java:171) > at org.python.core.PyCode.call(PyCode.java:18) > at org.python.core.Py.runCode(Py.java:1614) > at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) > at > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) > at > org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) > at > org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) > at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) > at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) > at org.apache.pig.Main.run(Main.java:584) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:328) > at org.apache.hadoop.util.RunJar.main(RunJar.java:241) > Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 37 more > java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: > Lorg/junit/rules/ExpectedException; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
Koji Noguchi created PIG-5459: - Summary: Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) Key: PIG-5459 URL: https://issues.apache.org/jira/browse/PIG-5459 Project: Pig Issue Type: Bug Reporter: Koji Noguchi {noformat} turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ from org.apache.hadoop.conf import * java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2583) at java.lang.Class.privateGetPublicFields(Class.java:2614) at java.lang.Class.getFields(Class.java:1557) at org.python.core.PyJavaType.init(PyJavaType.java:419) at org.python.core.PyType.createType(PyType.java:1523) at org.python.core.PyType.addFromClass(PyType.java:1462) at org.python.core.PyType.fromClass(PyType.java:1551) at org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) at org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) at org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) at org.python.core.Py.java2py(Py.java:2017) at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) at org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) at org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) at org.python.core.imp.importAll(imp.java:1189) at org.python.core.imp.importAll(imp.java:1177) at org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) at org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) at org.python.core.PyTableCode.call(PyTableCode.java:171) at org.python.core.PyCode.call(PyCode.java:18) at org.python.core.Py.runCode(Py.java:1614) at 
org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) at org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) at org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) at org.apache.pig.Main.run(Main.java:584) at org.apache.pig.Main.main(Main.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:328) at org.apache.hadoop.util.RunJar.main(RunJar.java:241) Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 37 more java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5459) Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3)
[ https://issues.apache.org/jira/browse/PIG-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5459: -- Priority: Minor (was: Major) > Jython_Checkin_3 e2e failing with NoClassDefFoundError (hadoop3) > > > Key: PIG-5459 > URL: https://issues.apache.org/jira/browse/PIG-5459 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Priority: Minor > > {noformat} > turing_jython.conf/Jython_Checkin_3.pig", line 4, in _module_ > from org.apache.hadoop.conf import * > java.lang.NoClassDefFoundError: Lorg/junit/rules/ExpectedException; > at java.lang.Class.getDeclaredFields0(Native Method) > at java.lang.Class.privateGetDeclaredFields(Class.java:2583) > at java.lang.Class.privateGetPublicFields(Class.java:2614) > at java.lang.Class.getFields(Class.java:1557) > at org.python.core.PyJavaType.init(PyJavaType.java:419) > at org.python.core.PyType.createType(PyType.java:1523) > at org.python.core.PyType.addFromClass(PyType.java:1462) > at org.python.core.PyType.fromClass(PyType.java:1551) > at > org.python.core.adapter.ClassicPyObjectAdapter$6.adapt(ClassicPyObjectAdapter.java:77) > at > org.python.core.adapter.ExtensiblePyObjectAdapter.adapt(ExtensiblePyObjectAdapter.java:44) > at > org.python.core.adapter.ClassicPyObjectAdapter.adapt(ClassicPyObjectAdapter.java:131) > at org.python.core.Py.java2py(Py.java:2017) > at org.python.core.PyJavaPackage.addClass(PyJavaPackage.java:86) > at > org.python.core.packagecache.PackageManager.basicDoDir(PackageManager.java:113) > at > org.python.core.packagecache.SysPackageManager.doDir(SysPackageManager.java:148) > at org.python.core.PyJavaPackage.fillDir(PyJavaPackage.java:120) > at org.python.core.imp.importAll(imp.java:1189) > at org.python.core.imp.importAll(imp.java:1177) > at > org.python.pycode._pyx0.f$0(/tmp/yarn-local/usercache/.../gtrain-1722336537-turing_jython.conf/Jython_Checkin_3.pig:8) > at > 
org.python.pycode._pyx0.call_function(/tmp/yarn-local/usercache...gtrain-1722336537-tu/ring_jython.conf/Jython_Checkin_3.pig) > at org.python.core.PyTableCode.call(PyTableCode.java:171) > at org.python.core.PyCode.call(PyCode.java:18) > at org.python.core.Py.runCode(Py.java:1614) > at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296) > at > org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:217) > at > org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:440) > at > org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:424) > at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:310) > at org.apache.pig.Main.runEmbeddedScript(Main.java:1096) > at org.apache.pig.Main.run(Main.java:584) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:328) > at org.apache.hadoop.util.RunJar.main(RunJar.java:241) > Caused by: java.lang.ClassNotFoundException: org.junit.rules.ExpectedException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 37 more > java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: > Lorg/junit/rules/ExpectedException; > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5458) Update metrics-core.version
[ https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869453#comment-17869453 ] Koji Noguchi commented on PIG-5458: --- Forgot to mention: after the change in PIG-5456, I noticed Pig on MR/Tez jobs were relying on the metrics jar from Spark. Hence this patch. > Update metrics-core.version > > > Key: PIG-5458 > URL: https://issues.apache.org/jira/browse/PIG-5458 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5458-v01.patch > > > Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics > and > Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics. > I believe one from com.yammer.metrics (2.1.2) can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5456) Upgrade Spark to 3.4.3
[ https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869452#comment-17869452 ] Koji Noguchi commented on PIG-5456: --- In summary, the classloading changes for bin/pig and unit tests are: * MR/Tez jobs stop using jars from the spark directory. * For Spark3, it stops using reload4j (and orc-core after PIG-5457). The former led to PIG-5458, where I noticed Pig on MR/Tez was relying on the metrics jar from Spark. > Upgrade Spark to 3.4.3 > -- > > Key: PIG-5456 > URL: https://issues.apache.org/jira/browse/PIG-5456 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5456-v01.patch, pig-5456-v02.patch > > > Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. > Simple upgrade failing a lot of tests with > {noformat} > java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter > overrides final method getTimeStamp.()J {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5458) Update metrics-core.version
[ https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5458: -- Attachment: pig-5458-v01.patch > Update metrics-core.version > > > Key: PIG-5458 > URL: https://issues.apache.org/jira/browse/PIG-5458 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5458-v01.patch > > > Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics > and > Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics. > I believe one from com.yammer.metrics (2.1.2) can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5458) Update metrics-core.version
[ https://issues.apache.org/jira/browse/PIG-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5458: - Assignee: Koji Noguchi > Update metrics-core.version > > > Key: PIG-5458 > URL: https://issues.apache.org/jira/browse/PIG-5458 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5458-v01.patch > > > Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics > and > Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics. > I believe one from com.yammer.metrics (2.1.2) can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5458) Update metrics-core.version
Koji Noguchi created PIG-5458: - Summary: Update metrics-core.version Key: PIG-5458 URL: https://issues.apache.org/jira/browse/PIG-5458 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi Hadoop3 uses metrics-core.version of 3.2.4 from io.dropwizard.metrics and Hadoop2 uses metrics-core.version of 3.0.1 from com.codahale.metrics. I believe one from com.yammer.metrics (2.1.2) can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5451: -- Attachment: pig-9-5451-v01.patch {quote} This was caused by conflict of orc.version. ./build/ivy/lib/Pig/orc-core-1.5.6.jar ./lib/h3/orc-core-1.5.6.jar and spark/jars/orc-core-1.6.14.jar {quote} After upgrading Spark to 3.4.3 in PIG-5456, the conflict changes a bit. When downloading spark-core 3.4.3 through ivy, there is no orc-core dependency. But when downloading spark-3.4.3-bin-without-hadoop.tgz from Apache, it contains orc-core-1.8.7-shaded-protobuf.jar and orc-mapreduce-1.8.7-shaded-protobuf.jar. To make them consistent, I am adding extra pulls and steps to skip the orc-1.5.6 jars for Spark3 (just like we do with the reload4j jars in PIG-5456). (pig-9-5451-v01.patch) > Pig-on-Spark3 E2E Orc_Pushdown_5 failing > - > > Key: PIG-5451 > URL: https://issues.apache.org/jira/browse/PIG-5451 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-9-5451-v01.patch > > > Test failing with > "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate > cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5456) Upgrade Spark to 3.4.3
[ https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5456: -- Attachment: pig-5456-v02.patch > log4j-1.2.17.jar was coming from stale zookeeper. Will create a new Jira to > update the dependency. Created PIG-5457 > As for how to skip reload4j > One option I considered was to move reload4j to a different directory and only pick it up for non-Spark3 jobs. This would work if the only ways to start up Pig were bin/pig or the unit/e2e tests. However, since we don't know whether users have custom startup script(s), I am taking another approach: leaving the reload4j jar in the same location but explicitly skipping it from bin/pig and the build.xml (unit) tests. This way, only Pig-on-Spark jobs are affected, leaving the rest untouched. (pig-5456-v02.patch) > Upgrade Spark to 3.4.3 > -- > > Key: PIG-5456 > URL: https://issues.apache.org/jira/browse/PIG-5456 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5456-v01.patch, pig-5456-v02.patch > > > Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. > Simple upgrade failing a lot of tests with > {noformat} > java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter > overrides final method getTimeStamp.()J {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5457: -- Fix Version/s: 0.19.0 > Upgrade Zookeeper to 3.6.4 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5457-v01.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5457: -- Attachment: pig-5457-v01.patch > Upgrade Zookeeper to 3.6.4 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Priority: Trivial > Attachments: pig-5457-v01.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)
[ https://issues.apache.org/jira/browse/PIG-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5457: - Assignee: Koji Noguchi > Upgrade Zookeeper to 3.6.4 (from 3.5.7) > --- > > Key: PIG-5457 > URL: https://issues.apache.org/jira/browse/PIG-5457 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5457-v01.patch > > > As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in > log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as > the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5457) Upgrade Zookeeper to 3.6.4 (from 3.5.7)
Koji Noguchi created PIG-5457: - Summary: Upgrade Zookeeper to 3.6.4 (from 3.5.7) Key: PIG-5457 URL: https://issues.apache.org/jira/browse/PIG-5457 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi As mentioned in PIG-5456, zookeeper-3.5.7 dependency pulls in log4j-1.2.17.jar that we want to avoid. Updating to 3.6.4, making it same as the dependency from hadoop 3.3.6. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
[ https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5455. --- Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 > - > > Key: PIG-5455 > URL: https://issues.apache.org/jira/browse/PIG-5455 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5455-v01.patch > > > Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later > and simple upgrade of Hadoop failing the tests with > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5428) Update hadoop2,3 and tez to recent versions
[ https://issues.apache.org/jira/browse/PIG-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863944#comment-17863944 ] Koji Noguchi commented on PIG-5428: --- > Setting tez.runtime.transfer.data-via-events.enabled to false helped but not > sure where > the problem is on. Pig? Tez? > It was due to the way Pig uses Tez, which differs from Hive. Hopefully handled in https://issues.apache.org/jira/browse/TEZ-4570. > Update hadoop2,3 and tez to recent versions > --- > > Key: PIG-5428 > URL: https://issues.apache.org/jira/browse/PIG-5428 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.18.0 > > Attachments: pig-5428-v01.patch > > > PIG-5253 hadoop3 patch is committed. > Now, updating hadoop2&3, tez and other dependent library versions. > Only testing using two different parameters. > * -Dhbaseversion=2 -Dhadoopversion=2 -Dhiveversion=1 -Dsparkversion=2 > and > * -Dhbaseversion=2 -Dhadoopversion=3 -Dhiveversion=3 -Dsparkversion=2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
[ https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863942#comment-17863942 ] Koji Noguchi commented on PIG-5455: --- Forgot to mention: I learned that the disabling of tez.runtime.transfer.data-via-events.enabled done in PIG-5428 was necessary due to a bug reported in https://issues.apache.org/jira/browse/TEZ-4570. But somehow the e2e tests were still not setting this flag, so I moved the disabling of tez.runtime.transfer.data-via-events.enabled from TezLauncher and TezMiniCluster to TezDagBuilder to enforce this configuration. > Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 > - > > Key: PIG-5455 > URL: https://issues.apache.org/jira/browse/PIG-5455 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5455-v01.patch > > > Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later > and simple upgrade of Hadoop failing the tests with > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
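The enforcement described above amounts to unconditionally disabling the flag at the point where the DAG is built, rather than relying on each launcher or test harness to set it. A minimal sketch of that idea, using java.util.Properties as a stand-in for the actual Tez/Hadoop Configuration object (class and method names here are hypothetical, not the code in the patch):

```java
import java.util.Properties;

public class DagConfigSketch {
    static final String DATA_VIA_EVENTS =
            "tez.runtime.transfer.data-via-events.enabled";

    // Apply the TEZ-4570 workaround at DAG-build time, so it holds
    // regardless of what the caller or test harness configured.
    static Properties buildDagConfig(Properties userConf) {
        Properties conf = new Properties();
        conf.putAll(userConf);
        conf.setProperty(DATA_VIA_EVENTS, "false"); // enforced unconditionally
        return conf;
    }
}
```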
[jira] [Updated] (PIG-5456) Upgrade Spark to 3.4.3
[ https://issues.apache.org/jira/browse/PIG-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5456: -- Attachment: pig-5456-v01.patch It turns out log4j2 already provides log4j1 compatibility through the "Log4j 1.x bridge" from [https://logging.apache.org/log4j/2.x/manual/migration.html]. But this spark/log4j-1.2-api-2.19.0.jar conflicted with reload4j-1.2.24.jar, resulting in the error shown in the description. So far, the workaround for running tests was to delete reload4j-1.2.24.jar (and log4j-1.2.17.jar). log4j-1.2.17.jar was coming from a stale zookeeper; will create a new Jira to update the dependency. As for how to skip reload4j, [~rohini], do you have any suggestions? In addition, this log4j-1.2-api-2.19.0.jar implementation seems to have a bug in its use of SimpleLayout. Reported at [https://github.com/apache/logging-log4j2/issues/2722]. For now, replacing them with PatternLayout. Also, log4j-api-2.19.0.jar from Spark somehow has a bug; updating to log4j-1.2-api-2.23.1.jar worked. Patch uploaded. > Upgrade Spark to 3.4.3 > -- > > Key: PIG-5456 > URL: https://issues.apache.org/jira/browse/PIG-5456 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5456-v01.patch > > > Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. > Simple upgrade failing a lot of tests with > {noformat} > java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter > overrides final method getTimeStamp.()J {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
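For illustration, the SimpleLayout-to-PatternLayout swap mentioned above might look like this in a log4j 1.x-style properties file (the appender name is hypothetical and the exact files changed in the patch may differ; SimpleLayout's output format is the level, a dash, and the message):

```properties
# Before: SimpleLayout, which trips the bridge bug reported in
# apache/logging-log4j2 issue 2722
# log4j.appender.A1.layout=org.apache.log4j.SimpleLayout

# After: an equivalent PatternLayout ("LEVEL - message")
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%p - %m%n
```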
[jira] [Created] (PIG-5456) Upgrade Spark to 3.4.3
Koji Noguchi created PIG-5456: - Summary: Upgrade Spark to 3.4.3 Key: PIG-5456 URL: https://issues.apache.org/jira/browse/PIG-5456 Project: Pig Issue Type: Improvement Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi Fix For: 0.19.0 Major blocker for upgrading to Spark 3.4.3 was Spark started using log4j2. Simple upgrade failing a lot of tests with {noformat} java.lang.VerifyError: class org.apache.log4j.bridge.LogEventAdapter overrides final method getTimeStamp.()J {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
[ https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863929#comment-17863929 ] Rohini Palaniswamy commented on PIG-5455: - +1 > Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 > - > > Key: PIG-5455 > URL: https://issues.apache.org/jira/browse/PIG-5455 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5455-v01.patch > > > Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later > and simple upgrade of Hadoop failing the tests with > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
[ https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863927#comment-17863927 ] Koji Noguchi commented on PIG-5455: --- > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > This error was coming from incompatible mockito versions: Pig uses mockito 1.8.4, while Hadoop-3.3.6 uses mockito 2.28.2. Uploading a patch which upgrades hadoop and tez, pulls in the new dependencies, and updates the test that used the alternative Whitebox implementation provided in Hadoop, which went away as part of the mockito upgrade. > Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 > - > > Key: PIG-5455 > URL: https://issues.apache.org/jira/browse/PIG-5455 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5455-v01.patch > > > Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later > and simple upgrade of Hadoop failing the tests with > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
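For context, the Whitebox helper that went away was a thin reflection wrapper for reading and writing private fields in tests. An equivalent stand-in can be written with plain java.lang.reflect (helper and method names here are hypothetical, not necessarily what the patch does):

```java
import java.lang.reflect.Field;

public class ReflectionTestUtil {
    // Equivalent of Whitebox.setInternalState: write a (possibly
    // private) field on the target object via reflection.
    static void setInternalState(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Equivalent of Whitebox.getInternalState: read the field back.
    static Object getInternalState(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```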
[jira] [Updated] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
[ https://issues.apache.org/jira/browse/PIG-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5455: -- Attachment: pig-5455-v01.patch > Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 > - > > Key: PIG-5455 > URL: https://issues.apache.org/jira/browse/PIG-5455 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5455-v01.patch > > > Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later > and simple upgrade of Hadoop failing the tests with > "Implementing class java.lang.IncompatibleClassChangeError: Implementing > class" > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5455) Upgrade Hadoop to 3.3.6 and Tez to 0.10.3
Koji Noguchi created PIG-5455: - Summary: Upgrade Hadoop to 3.3.6 and Tez to 0.10.3 Key: PIG-5455 URL: https://issues.apache.org/jira/browse/PIG-5455 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi Fix For: 0.19.0 Latest Tez (0.10.3 and later) requires Hadoop 3.3 or later and simple upgrade of Hadoop failing the tests with "Implementing class java.lang.IncompatibleClassChangeError: Implementing class" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reopened PIG-5453: --- While tracking multiple jiras, I missed that this patch was not put through the full unit/e2e tests. (Thus the previous syntax error.) After fixing the simple syntax error, I saw a couple of regression test failures. For now, I am reverting the patch while I debug and come up with a new patch. So sorry. > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5453-v01.patch, pig-5453-v02.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847097#comment-17847097 ] Daniel Dai commented on PIG-5453: - +1 > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5453-v01.patch, pig-5453-v02.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847095#comment-17847095 ] Koji Noguchi commented on PIG-5453: --- Sorry, the original patch had an extra comma, causing a compile error in TestFlatten.java. Uploaded pig-5453-v02.patch. To fix the broken trunk, I pushed the change. > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5453-v01.patch, pig-5453-v02.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5453: -- Attachment: pig-5453-v02.patch > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5453-v01.patch, pig-5453-v02.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5453. --- Fix Version/s: 0.19.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Daniel! Committed to trunk. > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5453-v01.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5452. --- Fix Version/s: 0.19.0 Resolution: Fixed Thanks for the review Daniel! Committed to trunk. > Null handling of FLATTEN with user defined schema (as clause) > - > > Key: PIG-5452 > URL: https://issues.apache.org/jira/browse/PIG-5452 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5452-v01.patch > > > Follow up from PIG-5201, > {code:java} > A = load 'input' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 > as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2), a3; > dump C;{code} > This produces right number of nulls. > {code:java} > (a,,,a) > (b,,,b) > (c,,,c) > (d,,,d) > (f,,,f) {code} > > However, > {code:java} > A = load 'input.txt' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; > dump C;{code} > This produces wrong number of null and the output is shifted incorrectly. > {code:java} > (a,,a,) > (b,,b,) > (c,,c,) > (d,,d,) > (f,,f,) {code} > Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of > tuple() with empty inner fields but with user defined schema of "as > (A1:chararray, A2:chararray)". > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5450. --- Fix Version/s: 0.19.0 Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type > -- > > Key: PIG-5450 > URL: https://issues.apache.org/jira/browse/PIG-5450 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5450-v01.patch > > > {noformat} > Caused by: java.lang.VerifyError: Bad return type > Exception Details: > Location: > org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; > @117: areturn > Reason: > Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, > stack[0]) is not assignable to > 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5446. --- Fix Version/s: 0.19.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5446-v01.patch > > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for > 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE.
failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5448. --- Fix Version/s: 0.19.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > All TestHBaseStorage tests failing on pig-on-spark3 > --- > > Key: PIG-5448 > URL: https://issues.apache.org/jira/browse/PIG-5448 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.19.0 > > Attachments: pig-5448-v01.patch > > > For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are > failing with > {noformat} > org.apache.pig.PigException: ERROR 1002: Unable to store alias b > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at > org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by:
java.lang.RuntimeException: No task metrics available for jobId 0 > at > org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5447. --- Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5447-v01.patch > > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5447: -- Fix Version/s: 0.19.0 > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5447-v01.patch > > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5439. --- Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5439-v01.patch, pig-5439-v02.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2
[ https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5438: -- Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review Rohini! Committed to trunk. > Update SparkCounter.Accumulator to AccumulatorV2 > > > Key: PIG-5438 > URL: https://issues.apache.org/jira/browse/PIG-5438 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5438-v01.patch > > > Original Accumulator is deprecated in Spark2 and gone in Spark3. > AccumulatorV2 is usable on both Spark2 and Spark3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-5416. --- Fix Version/s: 0.19.0 Hadoop Flags: Reviewed Resolution: Fixed Thanks for the review Rohini! Committed to trunk. > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.19.0 > > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > 
org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844427#comment-17844427 ] Rohini Palaniswamy commented on PIG-5439: - +1 > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5439-v01.patch, pig-5439-v02.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838865#comment-17838865 ] Daniel Dai commented on PIG-5453: - +1 > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5453-v01.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5454) Make ParallelGC the default Garbage Collection
Koji Noguchi created PIG-5454: - Summary: Make ParallelGC the default Garbage Collection Key: PIG-5454 URL: https://issues.apache.org/jira/browse/PIG-5454 Project: Pig Issue Type: Bug Components: impl Reporter: Koji Noguchi From JDK 9 onward, G1GC is the default GC. I've seen our users hit OOMs after migrating to a recent JDK, with the issue going away after reverting to ParallelGC. Perhaps the GC behavior assumed by SelfSpillBag does not work with G1GC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
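Until a default change lands, the collector can be forced back to ParallelGC explicitly. A hedged sketch follows; `PIG_OPTS` is the environment variable Pig's launch script passes to the local JVM, while the `mapreduce.*.java.opts` property names and heap sizes shown are illustrative and depend on the cluster's Hadoop configuration:

```shell
# Restore the pre-JDK9 collector for Pig's local JVM; with G1GC (the
# JDK 9+ default) OOMs have been observed that go away under ParallelGC.
export PIG_OPTS="$PIG_OPTS -XX:+UseParallelGC"

# For the task JVMs launched by Pig jobs, the same flag can be passed
# through the map/reduce JVM options (heap sizes here are examples only):
pig -Dmapreduce.map.java.opts="-Xmx1g -XX:+UseParallelGC" \
    -Dmapreduce.reduce.java.opts="-Xmx1g -XX:+UseParallelGC" script.pig
```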
[jira] [Updated] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5453: -- Attachment: pig-5453-v01.patch Uploading the patch that uses a new field introduced as part of PIG-5201, PIG-5452. If the number of fields is less than the expected number of fields, it now fills the rest with nulls. If the number of fields is more, it now fills only up to the expected number of fields. (pig-5453-v01.patch) > FLATTEN shifting fields incorrectly > --- > > Key: PIG-5453 > URL: https://issues.apache.org/jira/browse/PIG-5453 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5453-v01.patch > > > Follow up from PIG-5201, PIG-5452. > When flatten-ed tuple has less or more fields than specified, entire fields > shift incorrectly. > Input > {noformat} > A (a,b,c) > B (a,b,c) > C (a,b,c) > Y (a,b) > Z (a,b,c,d,e,f) > E{noformat} > Script > {code:java} > A = load 'input.txt' as (a1:chararray, a2:tuple()); > B = FOREACH A GENERATE a1, FLATTEN(a2) as > (b1:chararray,b2:chararray,b3:chararray), a1 as a4; > dump B; {code} > Incorrect results > {noformat} > (A,a,b,c,A) > (B,a,b,c,B) > (C,a,b,c,C) > (Y,a,b,Y,) > (Z,a,b,c,d) > (EE){noformat} > E is correct. It's fixed as part of PIG-5201, PIG-5452. > Y has shifted a4(Y) to the left incorrectly. > Should have been (Y,a,b,,Y) > Z has dropped a4(Z) and overwrote the result with content of FLATTEN(a2). > Should have been (Z,a,b,c,Z). > -- This message was sent by Atlassian Jira (v8.20.10#820010)
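The pad-or-truncate behavior the patch describes can be sketched in Python (the actual Pig implementation differs; the function name and tuple representation here are illustrative only):

```python
def flatten_to_arity(flattened, expected_arity):
    """Pad a flattened tuple with nulls (None) when it has fewer fields
    than the declared schema, and truncate it when it has more, so that
    fields following the FLATTEN do not shift position."""
    fields = list(flattened)
    if len(fields) < expected_arity:
        # Fewer fields than declared: fill the remainder with nulls.
        fields += [None] * (expected_arity - len(fields))
    else:
        # More fields than declared: keep only the expected number.
        fields = fields[:expected_arity]
    return tuple(fields)

# Row Y has only (a,b) but the schema declares three fields (b1,b2,b3),
# giving (Y,a,b,,Y) instead of the shifted (Y,a,b,Y,):
row_y = ("Y",) + flatten_to_arity(("a", "b"), 3) + ("Y",)
# Row Z has six fields; only the first three are kept, preserving a4:
row_z = ("Z",) + flatten_to_arity(("a", "b", "c", "d", "e", "f"), 3) + ("Z",)
```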
[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837863#comment-17837863 ] Rohini Palaniswamy commented on PIG-5450: - +1 > Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type > -- > > Key: PIG-5450 > URL: https://issues.apache.org/jira/browse/PIG-5450 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5450-v01.patch > > > {noformat} > Caused by: java.lang.VerifyError: Bad return type > Exception Details: > Location: > org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; > @117: areturn > Reason: > Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, > stack[0]) is not assignable to > 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837862#comment-17837862 ] Rohini Palaniswamy commented on PIG-5449: - +1 > TestEmptyInputDir failing on pig-on-spark3 > -- > > Key: PIG-5449 > URL: https://issues.apache.org/jira/browse/PIG-5449 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5449-v01.patch > > > TestEmptyInputDir failing on pig-on-spark3 with > {noformat:title=TestEmptyInputDir.testMergeJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141) > {noformat} > {noformat:title=TestEmptyInputDir.testGroupByFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297) > {noformat} > {noformat:title=TestEmptyInputDir.testFRJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) > {noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837861#comment-17837861 ] Rohini Palaniswamy commented on PIG-5448: - +1 > All TestHBaseStorage tests failing on pig-on-spark3 > --- > > Key: PIG-5448 > URL: https://issues.apache.org/jira/browse/PIG-5448 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5448-v01.patch > > > For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are > failing with > {noformat} > org.apache.pig.PigException: ERROR 1002: Unable to store alias b > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at > org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.RuntimeException: No task metrics available for jobId 0 > at > 
org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5438) Update SparkCounter.Accumulator to AccumulatorV2
[ https://issues.apache.org/jira/browse/PIG-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837860#comment-17837860 ] Rohini Palaniswamy commented on PIG-5438: - +1 > Update SparkCounter.Accumulator to AccumulatorV2 > > > Key: PIG-5438 > URL: https://issues.apache.org/jira/browse/PIG-5438 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Fix For: 0.19.0 > > Attachments: pig-5438-v01.patch > > > Original Accumulator is deprecated in Spark2 and gone in Spark3. > AccumulatorV2 is usable on both Spark2 and Spark3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5453) FLATTEN shifting fields incorrectly
Koji Noguchi created PIG-5453: - Summary: FLATTEN shifting fields incorrectly Key: PIG-5453 URL: https://issues.apache.org/jira/browse/PIG-5453 Project: Pig Issue Type: Bug Components: impl Reporter: Koji Noguchi Assignee: Koji Noguchi Follow up from PIG-5201, PIG-5452. When a flattened tuple has fewer or more fields than specified, the entire set of fields shifts incorrectly. Input {noformat} A (a,b,c) B (a,b,c) C (a,b,c) Y (a,b) Z (a,b,c,d,e,f) E{noformat} Script {code:java} A = load 'input.txt' as (a1:chararray, a2:tuple()); B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4; dump B; {code} Incorrect results {noformat} (A,a,b,c,A) (B,a,b,c,B) (C,a,b,c,C) (Y,a,b,Y,) (Z,a,b,c,d) (EE){noformat} E is correct. It's fixed as part of PIG-5201, PIG-5452. Y has shifted a4(Y) to the left incorrectly. It should have been (Y,a,b,,Y). Z has dropped a4(Z) and overwritten the result with the content of FLATTEN(a2). It should have been (Z,a,b,c,Z). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5452: -- Description: Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces wrong number of null and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of tuple() with empty inner fields but with user defined schema of "as (A1:chararray, A2:chararray)". was: Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces wrong number of null and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of tuple() with empty inner fields. 
> Null handling of FLATTEN with user defined schema (as clause) > - > > Key: PIG-5452 > URL: https://issues.apache.org/jira/browse/PIG-5452 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5452-v01.patch > > > Follow up from PIG-5201, > {code:java} > A = load 'input' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 > as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2), a3; > dump C;{code} > This produces right number of nulls. > {code:java} > (a,,,a) > (b,,,b) > (c,,,c) > (d,,,d) > (f,,,f) {code} > > However, > {code:java} > A = load 'input.txt' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; > dump C;{code} > This produces wrong number of null and the output is shifted incorrectly. > {code:java} > (a,,a,) > (b,,b,) > (c,,c,) > (d,,d,) > (f,,f,) {code} > Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of > tuple() with empty inner fields but with user defined schema of "as > (A1:chararray, A2:chararray)". > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
[ https://issues.apache.org/jira/browse/PIG-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5452: -- Attachment: pig-5452-v01.patch Instead of relying on the inner-field schema, this uses the output schema, which combines the schema of the data with the user-defined schema. > Null handling of FLATTEN with user defined schema (as clause) > - > > Key: PIG-5452 > URL: https://issues.apache.org/jira/browse/PIG-5452 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5452-v01.patch > > > Follow up from PIG-5201, > {code:java} > A = load 'input' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 > as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2), a3; > dump C;{code} > This produces right number of nulls. > {code:java} > (a,,,a) > (b,,,b) > (c,,,c) > (d,,,d) > (f,,,f) {code} > > However, > {code:java} > A = load 'input.txt' as (a1:chararray); > B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; > C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; > dump C;{code} > This produces wrong number of null and the output is shifted incorrectly. > {code:java} > (a,,a,) > (b,,b,) > (c,,c,) > (d,,d,) > (f,,f,) {code} > Difference here is, for the latter, a2 in "FLATTEN(a2)" only has schema of > tuple() with empty inner fields. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
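The idea behind the patch, sizing the null expansion by the combined output schema rather than the tuple's possibly empty inner schema, can be sketched as follows (a hedged Python illustration; Pig's real schema objects and the function name are not what is shown here):

```python
def flatten_null(inner_schema, user_schema):
    """When FLATTEN is applied to a null tuple, emit one null per field of
    the output schema: the user-defined 'as' schema when given, else the
    tuple's inner schema. Relying on the inner schema alone under-counts
    when it is empty, e.g. a2:tuple() flattened 'as (A1, A2)'."""
    output_schema = user_schema if user_schema else inner_schema
    return tuple([None] * len(output_schema))

# a2 declared as tuple() but flattened "as (A1:chararray, A2:chararray)":
# the output schema has two fields, so a null a2 expands to two nulls,
# keeping a3 in its correct position.
```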
[jira] [Created] (PIG-5452) Null handling of FLATTEN with user defined schema (as clause)
Koji Noguchi created PIG-5452: - Summary: Null handling of FLATTEN with user defined schema (as clause) Key: PIG-5452 URL: https://issues.apache.org/jira/browse/PIG-5452 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi Follow up from PIG-5201, {code:java} A = load 'input' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(A1:chararray, A2:chararray), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2), a3; dump C;{code} This produces the right number of nulls. {code:java} (a,,,a) (b,,,b) (c,,,c) (d,,,d) (f,,,f) {code} However, {code:java} A = load 'input.txt' as (a1:chararray); B = FOREACH A GENERATE a1, null as a2:tuple(), a1 as a3; C = FOREACH B GENERATE a1, FLATTEN(a2) as (A1:chararray, A2:chararray), a3; dump C;{code} This produces the wrong number of nulls, and the output is shifted incorrectly. {code:java} (a,,a,) (b,,b,) (c,,c,) (d,,d,) (f,,f,) {code} The difference is that, for the latter, a2 in "FLATTEN(a2)" only has the schema tuple() with empty inner fields. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi reassigned PIG-5416: - Assignee: Koji Noguchi > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832323#comment-17832323 ] Koji Noguchi commented on PIG-5451: --- This was caused by a conflict in orc.version: ./build/ivy/lib/Pig/orc-core-1.5.6.jar and ./lib/h3/orc-core-1.5.6.jar versus spark/jars/orc-core-1.6.14.jar > Pig-on-Spark3 E2E Orc_Pushdown_5 failing > - > > Key: PIG-5451 > URL: https://issues.apache.org/jira/browse/PIG-5451 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > > Test failing with > "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate > cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
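Classpath conflicts like this one can be spotted by grouping jar file names by artifact and flagging artifacts that appear with more than one version. A hedged sketch, with the jar paths from the comment above used as sample input (the file-name pattern is a common Maven/Ivy convention, not something Pig guarantees):

```python
import re
from collections import defaultdict

def find_version_conflicts(jar_paths):
    """Group jar names like 'orc-core-1.5.6.jar' by artifact and report
    artifacts present with more than one version on the classpath."""
    versions = defaultdict(set)
    for path in jar_paths:
        name = path.rsplit("/", 1)[-1]
        # artifact-id, then a version starting with a digit, then .jar
        m = re.match(r"(.+?)-(\d[\w.]*)\.jar$", name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}

# The jars quoted in the comment conflict on orc-core:
conflicts = find_version_conflicts([
    "./build/ivy/lib/Pig/orc-core-1.5.6.jar",
    "./lib/h3/orc-core-1.5.6.jar",
    "spark/jars/orc-core-1.6.14.jar",
])
```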
[jira] [Created] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
Koji Noguchi created PIG-5451: - Summary: Pig-on-Spark3 E2E Orc_Pushdown_5 failing Key: PIG-5451 URL: https://issues.apache.org/jira/browse/PIG-5451 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Assignee: Koji Noguchi Test failing with "java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot access its superclass org.threeten.extra.chrono.AbstractDate" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5451) Pig-on-Spark3 E2E Orc_Pushdown_5 failing
[ https://issues.apache.org/jira/browse/PIG-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832320#comment-17832320 ] Koji Noguchi commented on PIG-5451: --- Full stack trace. {noformat} 2024-03-29 10:57:31,787 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 3 (runJob at SparkHadoopWriter.scala:83) failed in 36.126 s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8) (gsrd479n10.red.ygrid.yahoo.com executor 4): java.lang.IllegalAccessError: class org.threeten.extra.chrono.HybridDate cannot access its superclass org.threeten.extra.chrono.AbstractDate at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:756) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:46) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:235) at org.threeten.extra.chrono.HybridChronology.date(HybridChronology.java:88) at java.time.chrono.AbstractChronology.resolveYMD(AbstractChronology.java:563) at java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472) at org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:452) at org.threeten.extra.chrono.HybridChronology.resolveDate(HybridChronology.java:88) at 
java.time.format.Parsed.resolveDateFields(Parsed.java:351) at java.time.format.Parsed.resolveFields(Parsed.java:257) at java.time.format.Parsed.resolve(Parsed.java:244) at java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331) at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955) at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) at org.apache.orc.impl.DateUtils._clinit_(DateUtils.java:74) at org.apache.orc.impl.ColumnStatisticsImpl$TimestampStatisticsImpl._init_(ColumnStatisticsImpl.java:1683) at org.apache.orc.impl.ColumnStatisticsImpl.deserialize(ColumnStatisticsImpl.java:2131) at org.apache.orc.impl.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:522) at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:1045) at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1117) at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1137) at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1187) at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1222) at org.apache.orc.impl.RecordReaderImpl._init_(RecordReaderImpl.java:254) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl._init_(RecordReaderImpl.java:67) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:83) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:337) at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat$OrcRecordReader._init_(OrcNewInputFormat.java:72) at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.createRecordReader(OrcNewInputFormat.java:57) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:255) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader._init_(PigRecordReader.java:126) at 
org.apache.pig.backend.hadoop.executionengine.spark.SparkPigRecordReader._init_(SparkPigRecordReader.java:44) at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.createRecordReader(PigInputFormatSpark.java:131) at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:71) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:215) at org.apache.spark.rdd.NewHadoopRDD$$anon$1._init_(NewHadoopRDD.scala:213) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iter
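[Editor's note] The IllegalAccessError above is the classic symptom of a child-first classloader resolving a class (HybridDate) and its superclass (AbstractDate) through different loaders, so the JVM refuses the subclass link. A hedged debugging sketch, not part of any Pig patch, for checking where each class was loaded from:

```java
// Hypothetical debugging aid (not from the Pig codebase): report the loader
// and code source of a class. If HybridDate and AbstractDate come back with
// different loaders, the JVM rejects the subclass link with IllegalAccessError.
public class WhichJar {
    static String describe(Class<?> c) {
        ClassLoader cl = c.getClassLoader(); // null means the bootstrap loader
        java.security.CodeSource cs = c.getProtectionDomain().getCodeSource();
        String location = (cs == null) ? "(bootstrap)" : cs.getLocation().toString();
        return (cl == null ? "bootstrap" : cl.getClass().getName()) + " <- " + location;
    }
}
```

Logging `describe()` for both org.threeten.extra.chrono.HybridDate and org.threeten.extra.chrono.AbstractDate inside the executor would show whether the ChildFirstURLClassLoader and its parent each supplied one of them.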
[jira] [Updated] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5450: -- Attachment: pig-5450-v01.patch It turns out the weird error was coming from conflicting jar. {{./build/ivy/lib/Pig/hive-storage-api-2.7.0.jar}} and {{spark/spark/jars/hive-storage-api-2.7.2.jar}} Uploading a patch updating hive-storage-api version. > Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type > -- > > Key: PIG-5450 > URL: https://issues.apache.org/jira/browse/PIG-5450 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5450-v01.patch > > > {noformat} > Caused by: java.lang.VerifyError: Bad return type > Exception Details: > Location: > org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; > @117: areturn > Reason: > Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, > stack[0]) is not assignable to > 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
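[Editor's note] The VerifyError came from two versions of hive-storage-api on the classpath. As a generic sketch (not the attached patch, which simply aligns the ivy version), one way to detect this kind of conflict is to ask the classloader for every location that provides a given class file:

```java
// Hypothetical helper: list every classpath location providing a class file.
// More than one hit (e.g. hive-storage-api-2.7.0.jar and
// hive-storage-api-2.7.2.jar both shipping ColumnVector) signals a conflict.
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class DuplicateClassCheck {
    static List<URL> locationsOf(String classResource) throws IOException {
        Enumeration<URL> urls = DuplicateClassCheck.class
                .getClassLoader().getResources(classResource);
        List<URL> out = new ArrayList<>();
        while (urls.hasMoreElements()) out.add(urls.nextElement());
        return out; // size() > 1 means the class is packaged more than once
    }
}
```

For example, `locationsOf("org/apache/hadoop/hive/ql/exec/vector/ColumnVector.class")` run with both jars on the classpath would return two URLs.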
[jira] [Commented] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
[ https://issues.apache.org/jira/browse/PIG-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832318#comment-17832318 ] Koji Noguchi commented on PIG-5450: --- Weird full trace. {noformat} 024-03-27 10:50:40,088 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) (gsrd238n05.red.ygrid.yahoo.com executor 1): org.apache.spark.SparkException: Task failed while writing rows at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163) at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:131) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.VerifyError: Bad return type Exception Details: Location: org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; @117: areturn Reason: Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) Current Frame: bci: @117 flags: { } locals: { 'org/apache/orc/TypeDescription', 'org/apache/orc/TypeDescription$RowBatchVersion', integer } stack: { 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' } Bytecode: 0x000: b200 022a b600 03b6 0004 2eaa 0181 0x010: 0001 0013 0059 0059 0x020: 0059 0059 0059 0062 
0x030: 006b 006b 0074 0074 0x040: 007d 00ad 00ad 00ad 0x050: 00ad 00b6 00f7 0138 0x060: 0155 bb00 0559 1cb7 0006 b0bb 0007 0x070: 591c b700 08b0 bb00 0959 1cb7 000a b0bb 0x080: 000b 591c b700 0cb0 2ab6 000d 3e2a b600 0x090: 0e36 042b b200 0fa5 0009 1d10 12a4 000f 0x0a0: bb00 1159 1c1d 1504 b700 12b0 bb00 1359 0x0b0: 1c1d 1504 b700 14b0 bb00 1559 1cb7 0016 0x0c0: b02a b600 174e 2db9 0018 0100 bd00 193a 0x0d0: 0403 3605 1505 1904 bea2 001e 1904 1505 0x0e0: 2d15 05b9 001a 0200 c000 102b 1cb8 001b 0x0f0: 5384 0501 a7ff e0bb 001c 591c 1904 b700 0x100: 1db0 2ab6 0017 4e2d b900 1801 00bd 0019 0x110: 3a04 0336 0515 0519 04be a200 1e19 0415 0x120: 052d 1505 b900 1a02 00c0 0010 2b1c b800 0x130: 1b53 8405 01a7 ffe0 bb00 1e59 1c19 04b7 0x140: 001f b02a b600 174e bb00 2059 1c2d 03b9 0x150: 001a 0200 c000 102b 1cb8 001b b700 21b0 0x160: 2ab6 0017 4ebb 0022 591c 2d03 b900 1a02 0x170: 00c0 0010 2b1c b800 1b2d 04b9 001a 0200 0x180: c000 102b 1cb8 001b b700 23b0 bb00 2459 0x190: bb00 2559 b700 2612 27b6 0028 2ab6 0003 0x1a0: b600 29b6 002a b700 2bbf Stackmap Table: same_frame_extended(@100) same_frame(@109) same_frame(@118) same_frame(@127) same_frame(@136) append_frame(@160,Integer,Integer) same_frame(@172) chop_frame(@184,2) same_frame(@193) append_frame(@212,Object[_75],Object[_76],Integer) chop_frame(@247,1) chop_frame(@258,2) append_frame(@277,Object[_75],Object[_76],Integer) chop_frame(@312,1) chop_frame(@323,2) same_frame(@352) same_frame(@396) at org.apache.orc.TypeDescription.createRowBatch(TypeDescription.java:483) at org.apache.hadoop.hive.ql.io.orc.WriterImpl._init_(WriterImpl.java:100) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334) at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:51) at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:37) at org.apache.pig.builtin.OrcStorage.putNext(OrcStorage.java:249) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:146) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:368) at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525) at org.a
[jira] [Created] (PIG-5450) Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type
Koji Noguchi created PIG-5450: - Summary: Pig-on-Spark3 E2E ORC test failing with java.lang.VerifyError: Bad return type Key: PIG-5450 URL: https://issues.apache.org/jira/browse/PIG-5450 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi {noformat} Caused by: java.lang.VerifyError: Bad return type Exception Details: Location: org/apache/orc/impl/TypeUtils.createColumn(Lorg/apache/orc/TypeDescription;Lorg/apache/orc/TypeDescription$RowBatchVersion;I)Lorg/apache/hadoop/hive/ql/exec/vector/ColumnVector; @117: areturn Reason: Type 'org/apache/hadoop/hive/ql/exec/vector/DateColumnVector' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/hive/ql/exec/vector/ColumnVector' (from method signature) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5410) Support Python 3 for streaming_python
[ https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5410: -- Attachment: pig-5410-v02.patch Testing the patch, it was failing with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} It seems the patch was missing a '+'. Uploading a new patch with the '+' added. > Support Python 3 for streaming_python > - > > Key: PIG-5410 > URL: https://issues.apache.org/jira/browse/PIG-5410 > Project: Pig > Issue Type: New Feature >Reporter: Rohini Palaniswamy >Assignee: Venkatasubrahmanian Narayanan >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5410.patch, pig-5410-v02.patch > > > Python 3 is incompatible with Python 2. We need to make it work with both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (PIG-5410) Support Python 3 for streaming_python
[ https://issues.apache.org/jira/browse/PIG-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832317#comment-17832317 ] Koji Noguchi edited comment on PIG-5410 at 3/29/24 9:10 PM: Testing the patch, it was failing with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} it seems like the patch was missing a '+'. Uploading a new patch. was (Author: knoguchi): Testing the patch, it was failing with {noformat} Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : File "/grid/0/tmp/yarn-local/usercache/gtrain/appcache/application_1694019138198_2621253/container_e13_1694019138198_2621253_01_04/tmp/controller1951726576599472905.py", line 365 WRAPPED_MAP_END) ^ SyntaxError: invalid syntax {noformat} it seems like the patch was missing a '+'. Uploading a new patch with '+'. > Support Python 3 for streaming_python > - > > Key: PIG-5410 > URL: https://issues.apache.org/jira/browse/PIG-5410 > Project: Pig > Issue Type: New Feature >Reporter: Rohini Palaniswamy >Assignee: Venkatasubrahmanian Narayanan >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-5410.patch, pig-5410-v02.patch > > > Python 3 is incompatible with Python 2. We need to make it work with both. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5449: -- Attachment: pig-5449-v01.patch Before (in spark2 land), this used to work by checking the empty list returned by getJobIDs. https://github.com/apache/pig/blob/branch-0.17/src/org/apache/pig/backend/hadoop/executionengine/spark/JobGraphBuilder.java#L210-L219 But with spark3, this returns an actual jobid with no metrics stored behind it. Instead of adding separate logic for spark3, I think we can treat metrics retrieval as optional, like we do in mapreduce & tez. Attaching a patch (pig-5449-v01.patch). > TestEmptyInputDir failing on pig-on-spark3 > -- > > Key: PIG-5449 > URL: https://issues.apache.org/jira/browse/PIG-5449 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5449-v01.patch > > > TestEmptyInputDir failing on pig-on-spark3 with > {noformat:title=TestEmptyInputDir.testMergeJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141) > {noformat} > {noformat:title=TestEmptyInputDir.testGroupByFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297) > {noformat} > {noformat:title=TestEmptyInputDir.testFRJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171) > {noformat} > {noformat:title=TestEmptyInputDir.testBloomJoinFailure} > junit.framework.AssertionFailedError > at > org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) > {noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
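[Editor's note] The "metrics retrieval as optional" idea described above can be sketched roughly as follows. All names here are illustrative, not Pig's real API or the attached pig-5449-v01.patch:

```java
// Hedged sketch: treat missing task metrics as an empty result rather than an
// error, mirroring how mapreduce/tez stats tolerate absent counters.
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class OptionalMetrics {
    static List<Map<String, Long>> metricsOrEmpty(
            Map<Integer, List<Map<String, Long>>> metricsByJob, int jobId) {
        List<Map<String, Long>> m = metricsByJob.get(jobId);
        // Spark 3 can report a real jobId for an empty-input job with nothing
        // stored behind it; previously this case surfaced as a RuntimeException.
        return (m == null) ? Collections.emptyList() : m;
    }
}
```

The design choice is that an absent metrics entry is treated as "no stats", not as a failed job, so empty-input jobs can still complete.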
[jira] [Created] (PIG-5449) TestEmptyInputDir failing on pig-on-spark3
Koji Noguchi created PIG-5449: - Summary: TestEmptyInputDir failing on pig-on-spark3 Key: PIG-5449 URL: https://issues.apache.org/jira/browse/PIG-5449 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi TestEmptyInputDir failing on pig-on-spark3 with {noformat:title=TestEmptyInputDir.testMergeJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testMergeJoin(TestEmptyInputDir.java:141) {noformat} {noformat:title=TestEmptyInputDir.testGroupByFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testGroupBy(TestEmptyInputDir.java:80) {noformat} {noformat:title=TestEmptyInputDir.testBloomJoinOuterFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testBloomJoinOuter(TestEmptyInputDir.java:297) {noformat} {noformat:title=TestEmptyInputDir.testFRJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testFRJoin(TestEmptyInputDir.java:171) {noformat} {noformat:title=TestEmptyInputDir.testBloomJoinFailure} junit.framework.AssertionFailedError at org.apache.pig.test.TestEmptyInputDir.testBloomJoin(TestEmptyInputDir.java:267) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
[ https://issues.apache.org/jira/browse/PIG-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5448: -- Attachment: pig-5448-v01.patch {quote}No task metrics available for jobId 0 {quote} This is actually failing because Pig is succeeding without running anything. Looking further, I found that Spark is filtering out all input splits and reporting a successful empty job result with no metrics. Setting a flag so that Spark would not ignore a PigSplit which looks empty but still has (non-hdfs) inputs. (pig-5448-v01.patch) > All TestHBaseStorage tests failing on pig-on-spark3 > --- > > Key: PIG-5448 > URL: https://issues.apache.org/jira/browse/PIG-5448 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5448-v01.patch > > > For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are > failing with > {noformat} > org.apache.pig.PigException: ERROR 1002: Unable to store alias b > at org.apache.pig.PigServer.storeEx(PigServer.java:1127) > at org.apache.pig.PigServer.store(PigServer.java:1086) > at > org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.storeEx(PigServer.java:1123) > Caused by: java.lang.RuntimeException: No task metrics available for jobId 0 > at > org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
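[Editor's note] The workaround described above — keeping Spark from dropping a split that reports zero length but wraps non-HDFS input such as an HBase scan — can be sketched as below. This is an illustrative sketch with hypothetical names, not the attached pig-5448-v01.patch; the Spark-side behavior that skips empty splits is likely the `spark.hadoopRDD.ignoreEmptySplits` setting, whose default changed in newer Spark releases:

```java
// Hedged sketch: a wrapping split that never reports zero length when it
// carries real non-HDFS input, so schedulers that filter zero-length splits
// still run it. Names are illustrative, not Pig's PigSplit API.
public class LengthAwareSplit {
    private final long hdfsLength;
    private final boolean hasNonHdfsInput; // e.g. an HBase table scan

    LengthAwareSplit(long hdfsLength, boolean hasNonHdfsInput) {
        this.hdfsLength = hdfsLength;
        this.hasNonHdfsInput = hasNonHdfsInput;
    }

    /** Length as seen by the scheduler; never 0 if real non-HDFS input exists. */
    long getLength() {
        return (hdfsLength == 0 && hasNonHdfsInput) ? 1L : hdfsLength;
    }
}
```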
[jira] [Updated] (PIG-5439) Support Spark 3 and drop SparkShim
[ https://issues.apache.org/jira/browse/PIG-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-5439: -- Attachment: pig-5439-v02.patch Adding missing spark-scala.version. (pig-5439-v02.patch) > Support Spark 3 and drop SparkShim > -- > > Key: PIG-5439 > URL: https://issues.apache.org/jira/browse/PIG-5439 > Project: Pig > Issue Type: Improvement > Components: spark >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Fix For: 0.19.0 > > Attachments: pig-5439-v01.patch, pig-5439-v02.patch > > > Support Pig-on-Spark to run on spark3. > Initial version would only run up to Spark 3.2.4 and not on 3.3 or 3.4. > This is due to log4j mismatch. > After moving to log4j2 (PIG-5426), we can move Spark to 3.3 or higher. > So far, not all unit/e2e tests pass with the proposed patch but at least > compilation goes through. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (PIG-5448) All TestHBaseStorage tests failing on pig-on-spark3
Koji Noguchi created PIG-5448: - Summary: All TestHBaseStorage tests failing on pig-on-spark3 Key: PIG-5448 URL: https://issues.apache.org/jira/browse/PIG-5448 Project: Pig Issue Type: Bug Components: spark Reporter: Koji Noguchi Assignee: Koji Noguchi For Pig on Spark3 (with PIG-5439), all of the TestHBaseStorage unit tests are failing with {noformat} org.apache.pig.PigException: ERROR 1002: Unable to store alias b at org.apache.pig.PigServer.storeEx(PigServer.java:1127) at org.apache.pig.PigServer.store(PigServer.java:1086) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete(TestHBaseStorage.java:1251) Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get the rdds of this spark operator: at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:241) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) at org.apache.pig.PigServer.storeEx(PigServer.java:1123) Caused by: java.lang.RuntimeException: No task metrics available for jobId 0 at org.apache.pig.tools.pigstats.spark.SparkJobStats.collectStats(SparkJobStats.java:109) at org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:77) at org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:73) at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5446) Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing
[ https://issues.apache.org/jira/browse/PIG-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826791#comment-17826791 ] Rohini Palaniswamy commented on PIG-5446: - +1 > Tez TestPigProgressReporting.testProgressReportingWithStatusMessage failing > --- > > Key: PIG-5446 > URL: https://issues.apache.org/jira/browse/PIG-5446 > Project: Pig > Issue Type: Bug > Components: tez >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5446-v01.patch > > > {noformat} > Unable to open iterator for alias B. Backend error : Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] DAG did not succeed due to VERTEX_FAILURE. > failedVertices:1 killedVertices:0 > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias B. Backend error : Vertex failed, vertexName=scope-4, > vertexId=vertex_1707216362777_0001_1_00, diagnostics=[Task failed, > taskId=task_1707216362777_0001_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Attempt failed because it appears to make no progress for > 1ms], TaskAttempt 1 failed, info=[Attempt failed because it appears to > make no progress for 1ms]], Vertex did not succeed due to > OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:0 > at org.apache.pig.PigServer.openIterator(PigServer.java:1014) > at > org.apache.pig.test.TestPigProgressReporting.testProgressReportingWithStatusMessage(TestPigProgressReporting.java:58) > Caused by: org.apache.tez.dag.api.TezException: Vertex failed, > vertexName=scope-4, vertexId=vertex_1707216362777_0001_1_00, > diagnostics=[Task failed, taskId=task_1707216362777_0001_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Attempt failed because it appears to > make no progress for 1ms], TaskAttempt 1 failed, info=[Attempt failed > because it appears to make no progress for 1ms]], Vertex did not succeed > due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex > vertex_1707216362777_0001_1_00 [scope-4] killed/failed due > to:OWN_TASK_FAILURE] > DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 > at > org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:204) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:243) > at > org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:212) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 45.647 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PIG-5416) Spark unit tests failing randomly with "java.lang.RuntimeException: Unexpected job execution status RUNNING"
[ https://issues.apache.org/jira/browse/PIG-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826790#comment-17826790 ] Rohini Palaniswamy commented on PIG-5416: - +1 > Spark unit tests failing randomly with "java.lang.RuntimeException: > Unexpected job execution status RUNNING" > > > Key: PIG-5416 > URL: https://issues.apache.org/jira/browse/PIG-5416 > Project: Pig > Issue Type: Bug > Components: spark >Reporter: Koji Noguchi >Priority: Minor > Attachments: pig-5416-v01.patch > > > Spark unit tests fail randomly with same errors. > Sample stack trace showing "Caused by: java.lang.RuntimeException: > Unexpected job execution status RUNNING". > {noformat:title=TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF} > Unable to store alias B > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to > store alias B > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1783) > at org.apache.pig.PigServer.registerQuery(PigServer.java:708) > at org.apache.pig.PigServer.registerQuery(PigServer.java:721) > at > org.apache.pig.test.TestBuiltInBagToTupleOrString.testPigScriptForBagToTupleUDF(TestBuiltInBagToTupleOrString.java:429) > Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0: fail to get > the rdds of this spark operator: > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:115) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140) > at > org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46) > at > org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:240) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1479) > at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1464) > at org.apache.pig.PigServer.execute(PigServer.java:1453) > at org.apache.pig.PigServer.access$500(PigServer.java:119) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778) > Caused by: java.lang.RuntimeException: Unexpected job execution status RUNNING > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.isJobSuccess(SparkStatsUtil.java:138) > at > org.apache.pig.tools.pigstats.spark.SparkPigStats.addJobStats(SparkPigStats.java:75) > at > org.apache.pig.tools.pigstats.spark.SparkStatsUtil.waitForJobAddStats(SparkStatsUtil.java:59) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:225) > at > org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
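[Editor's note] The random failure above comes from stats being collected while the Spark job is still in RUNNING state. A hedged sketch of one way to avoid that race, polling until the job reaches a terminal state; this is illustrative, not the attached pig-5416-v01.patch:

```java
// Hedged sketch: wait for a terminal job status instead of failing the moment
// a poll observes RUNNING. The Supplier stands in for whatever status call
// the stats layer makes.
import java.util.function.Supplier;

public class StatusPoller {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    static Status awaitTerminal(Supplier<Status> poll, int maxAttempts, long sleepMs) {
        for (int i = 0; i < maxAttempts; i++) {
            Status s = poll.get();
            if (s != Status.RUNNING) return s; // terminal: SUCCEEDED or FAILED
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IllegalStateException("job did not reach a terminal state");
    }
}
```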
[jira] [Commented] (PIG-5447) Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with NoSuchElementException
[ https://issues.apache.org/jira/browse/PIG-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826789#comment-17826789 ] Rohini Palaniswamy commented on PIG-5447: - +1 > Pig-on-Spark TestSkewedJoin.testSkewedJoinOuter failing with > NoSuchElementException > --- > > Key: PIG-5447 > URL: https://issues.apache.org/jira/browse/PIG-5447 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Major > Attachments: pig-5447-v01.patch > > > TestSkewedJoin.testSkewedJoinOuter is consistently failing for right-outer > and full-outer joins. > "Caused by: java.util.NoSuchElementException: next on empty iterator" -- This message was sent by Atlassian Jira (v8.20.10#820010)
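[Editor's note] The "java.util.NoSuchElementException: next on empty iterator" above is the signature of an outer-join merge step calling next() on an exhausted side. A minimal hedged sketch of the guard such code needs; illustrative only, not the attached pig-5447-v01.patch:

```java
// Hedged sketch: in a right- or full-outer join, one side may legitimately be
// empty; next() must only be called after hasNext(), with null padding
// standing in for the missing side.
import java.util.Iterator;

public class OuterJoinGuard {
    /** Returns the next value, or null padding when the iterator is exhausted. */
    static <T> T nextOrNull(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```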