[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122108#comment-16122108 ]

Wendy Haley commented on OOZIE-2787:

Is this released to AWS spark yet? I did not quite understand the workaround. How do I exclude the jar file from the --files spark-opts? I already have a --files spark-opts that does not include the jar location.

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch,
> OOZIE-2787-3.patch, OOZIE-2787-4.patch, OOZIE-2787-5.patch,
> OOZIE-2787-amend-1.patch, OOZIE-2787-amend-2.patch, OOZIE-2787-amend-3.patch,
> OOZIE-2787-amend-4.patch, OOZIE-2787-amend-5.patch
>
>
> Oozie adds the application jar to the list of files to be uploaded to the
> distributed cache. Since the jar then gets added twice, the job fails. This is
> observed from Spark 2.1.0 onward, which introduces a check for the same file
> being added more than once and fails the job.
> {code}
> --master
> yarn
> --deploy-mode
> cluster
> --name
> oozieSparkStarter
> --class
> ScalaWordCount
> --queue
> default
> --conf
> spark.executor.extraClassPath=$PWD/*
> --conf
> spark.driver.extraClassPath=$PWD/*
> --conf
> spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
> --conf
> spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
> --conf
> spark.yarn.security.tokens.hive.enabled=false
> --conf
> spark.yarn.security.tokens.hbase.enabled=false
> --files
> hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar
> --properties-file
> spark-defaults.conf
> --verbose
> spark-example.jar
> samplefile.txt
> output
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
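To make the failure mode in the issue description concrete, here is a minimal Java sketch of one way a launcher could keep the application jar out of the list passed to --files. It is not the code from the attached patches: the class name `DistCacheDedupSketch`, the method `filesToDistribute`, the `helper.jar` entry, and the choice to match on file name are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and matching rule are assumptions, not Oozie's code.
public class DistCacheDedupSketch {

    // Returns the last path segment of a URI string, e.g. "spark-example.jar".
    public static String fileName(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            path = uri; // opaque URI: fall back to the raw string
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Drops every candidate whose file name equals the application jar's
    // file name, and removes repeated URIs while keeping first-seen order.
    public static List<String> filesToDistribute(List<String> candidates,
                                                 String applicationJar) {
        String appJarName = fileName(applicationJar);
        Set<String> kept = new LinkedHashSet<>();
        for (String candidate : candidates) {
            if (!fileName(candidate).equals(appJarName)) {
                kept.add(candidate);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        // The first entry is the path from the issue; helper.jar is made up.
        List<String> candidates = Arrays.asList(
            "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar",
            "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/helper.jar",
            "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/helper.jar");
        System.out.println(filesToDistribute(candidates, "spark-example.jar"));
    }
}
```

Matching on the file name mirrors the example arguments in the issue, where the application jar appears as `spark-example.jar` while the --files entry is the full HDFS path. A real implementation may well compare resolved paths instead.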
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860858#comment-15860858 ]

Satish Subhashrao Saley commented on OOZIE-2787:

Thank you Puru, Abhishek and Xiaobin for review. Committed to master.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860777#comment-15860777 ]

Hadoop QA commented on OOZIE-2787:

Testing JIRA OOZIE-2787

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [core].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1873
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:green}*+1 Overall result, good!, no -1s*{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/3639/
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860506#comment-15860506 ]

Purshotam Shah commented on OOZIE-2787:

+1 for OOZIE-2787-amend-5.patch.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860414#comment-15860414 ]

Hadoop QA commented on OOZIE-2787:

Testing JIRA OOZIE-2787

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1873
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:green}*+1 Overall result, good!, no -1s*{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/3638/
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860204#comment-15860204 ]

Satish Subhashrao Saley commented on OOZIE-2787:

Thank you for the review, [~abhishekbafna] and [~zhengxb2005]. Users don't need to do any extra work; they can continue using their current configuration. I added an example to the documentation and some test cases for JarFilter.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859120#comment-15859120 ]

Hadoop QA commented on OOZIE-2787:

Testing JIRA OOZIE-2787

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1872
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/3634/
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859081#comment-15859081 ]

Xiaobin Zheng commented on OOZIE-2787:

[~satishsaley] Thanks for the patch. Two minor suggestions:
1. The Javadoc for 'isApplicationJar' seems outdated.
2. It would be great if we could add some simple unit tests for either 'filter()' or 'isApplicationJar()' to ensure the behavior we want.
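The kind of simple check suggested above might look roughly like the following plain-Java sketch. The real signature and matching rule of 'isApplicationJar()' are not shown in this thread, so the two-argument method here is a hypothetical stand-in that compares file names only; `spark-example-tests.jar` is likewise an invented negative case.

```java
import java.net.URI;

// Hypothetical stand-in for the method under review; not Oozie's actual code.
public class IsApplicationJarCheck {

    // Last path segment of a URI string, e.g. "spark-example.jar".
    public static String baseName(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            path = uri; // opaque URI: fall back to the raw string
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Assumed rule: a candidate is the application jar when file names match.
    public static boolean isApplicationJar(String candidateUri, String applicationJar) {
        return baseName(candidateUri).equals(baseName(applicationJar));
    }

    // Tiny assertion helper so the sketch runs without a test framework.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        String appJar = "spark-example.jar";
        check(isApplicationJar(
                "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar",
                appJar),
            "full HDFS path of the application jar should match");
        check(!isApplicationJar(
                "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example-tests.jar",
                appJar),
            "a different jar with a similar name should not match");
        check(isApplicationJar(appJar, appJar),
            "the bare jar name should match itself");
        System.out.println("all checks passed");
    }
}
```

In the project itself these checks would presumably live in a unit test class alongside the existing test suite rather than in a `main` method.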
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858946#comment-15858946 ]

Abhishek Bafna commented on OOZIE-2787:

[~satishsaley] It would be nice to document this for the Oozie Spark action, something like "How to specify application path in Oozie-Spark action". Thanks.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855025#comment-15855025 ]

Satish Subhashrao Saley commented on OOZIE-2787:

Thank you Rohini and Andras for review. Committed to master.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854415#comment-15854415 ]

Satish Subhashrao Saley commented on OOZIE-2787:

Tested locally, failures are flaky.
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852492#comment-15852492 ]

Hadoop QA commented on OOZIE-2787:

Testing JIRA OOZIE-2787

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
.{color:red}WARNING: the current HEAD has 1 RAT warning(s), they should be addressed ASAP{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1868
.Tests failed: 4
.Tests errors: 0

.The patch failed the following testcases:

. testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerEhcache)
. testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService)
. testCoordMaterializeTriggerService3(org.apache.oozie.service.TestCoordMaterializeTriggerService)
. testTimeOutWithUnresolvedMissingDependencies(org.apache.oozie.command.coord.TestCoordPushDependencyCheckXCommand)

.Tests failing with errors:
.

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch

{color:red}*-1 Overall result, please check the reported -1(s)*{color}
{color:red}. There is at least one warning, please check{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/3612/
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852419#comment-15852419 ]

Andras Piros commented on OOZIE-2787:
-------------------------------------
+1 (non-binding)

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch, OOZIE-2787-3.patch, OOZIE-2787-4.patch, OOZIE-2787-5.patch
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852332#comment-15852332 ]

Rohini Palaniswamy commented on OOZIE-2787:
-------------------------------------------
+1

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch, OOZIE-2787-3.patch, OOZIE-2787-4.patch, OOZIE-2787-5.patch
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852248#comment-15852248 ]

Rohini Palaniswamy commented on OOZIE-2787:
-------------------------------------------
Moving it to a JarFilter class is nice and the code is much cleaner. Just two more minor changes are needed:
- private class JarFilter -> private static class JarFilter
- LinkedList listUris = null; -> private LinkedList listUris = null;

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch
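The two modifier changes requested in the comment above can be illustrated with a minimal sketch. It is not the code from the attached patches: the enclosing class name, the constructor, the `size()` helper, and the `URI` element type are all assumptions made only so the snippet compiles on its own.

```java
import java.net.URI;
import java.util.LinkedList;

// Hypothetical enclosing class; stands in for whatever class hosts the filter.
class SparkMainModifiersSketch {

    // "static": the nested class keeps no hidden reference to an enclosing
    // instance, so it can be created without one.
    private static class JarFilter {
        // "private": the field is not visible to other classes in the package.
        // The element type URI is an assumption; the comment shows a raw LinkedList.
        private LinkedList<URI> listUris = null;

        JarFilter(LinkedList<URI> listUris) {
            this.listUris = listUris;
        }

        int size() {
            return listUris == null ? 0 : listUris.size();
        }
    }

    // Static helper showing that JarFilter needs no enclosing instance.
    static int countUris(LinkedList<URI> uris) {
        return new JarFilter(uris).size();
    }
}
```

Because `JarFilter` is static, `countUris` can construct it from a static context; a non-static inner class would require an instance of the enclosing class first.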
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852051#comment-15852051 ]

Rohini Palaniswamy commented on OOZIE-2787:
-------------------------------------------
Comments:
- Instead of adding a new profile spark-2.1, please just upgrade the version of Spark in the spark-2 profile.
- fixFsDefaultUris also filters out the application jar path. Do that in one place only, in filterJars.

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851984#comment-15851984 ]

Andras Piros commented on OOZIE-2787:
-------------------------------------
[~satishsaley] thanks for the patch! Some observations:
* please add a test case to {{TestSparkMain}} or elsewhere
* please rename the new Maven profile to {{spark-2.1-kafka-1.6.2}} to give a better idea of what's in there
* I'd extract {{filterJars()}} to a nested class, like {{JarURIFilter}}, for better testability and SRP. In that case you could pass all the necessary parameters via the constructor, and have a {{toString()}} method that calls {{StringUtils.join()}}
* it's OK with me if all the JAR files of the current directory are filtered, supposing all of them are application JARs. What about other packages like {{.py}} and {{.zip}} files? It may be worth having unit tests for those as well

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch
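The nested-class suggestion in the review above might look roughly like the following sketch. Only the name {{JarURIFilter}} comes from the comment; the enclosing class, the constructor parameters, the `filter()` method, and the choice to match on file name are hypothetical and are not taken from the attached patches. The JDK's `String.join` is used in place of `StringUtils.join()` so the snippet has no external dependency.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical enclosing class; not the actual Oozie source.
class SparkMainSketch {

    /** Drops the application artifact and exact duplicates from a list of URIs. */
    static class JarURIFilter {
        private final List<URI> uris;
        private final String applicationPath;

        // All inputs arrive through the constructor, which keeps the class easy to unit test.
        JarURIFilter(List<URI> uris, String applicationPath) {
            this.uris = uris;
            this.applicationPath = applicationPath;
        }

        List<URI> filter() {
            String appName = fileName(applicationPath);
            // LinkedHashSet removes exact duplicates while preserving order.
            Set<URI> kept = new LinkedHashSet<>();
            for (URI uri : uris) {
                // Assumption: an entry counts as "the application" when its file
                // name matches, regardless of extension (.jar, .py, .zip).
                if (!fileName(uri.getPath()).equals(appName)) {
                    kept.add(uri);
                }
            }
            return new ArrayList<>(kept);
        }

        /** Comma-separated form, suitable as the value of a --files style option. */
        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (URI uri : filter()) {
                parts.add(uri.toString());
            }
            return String.join(",", parts);
        }

        private static String fileName(String path) {
            if (path == null) {
                return "";
            }
            return path.substring(path.lastIndexOf('/') + 1);
        }
    }
}
```

Given the `--files` entry from the issue description and an application path of `spark-example.jar`, `filter()` would drop that entry, so the application jar is no longer listed a second time. Because the comparison is on the file name only, the same logic would apply to `.py` and `.zip` application files, which is one way to cover the cases raised in the last bullet.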
[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail
[ https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851859#comment-15851859 ]

Hadoop QA commented on OOZIE-2787:
----------------------------------
Testing JIRA OOZIE-2787

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
.{color:red}WARNING: the current HEAD has 1 RAT warning(s), they should be addressed ASAP{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1868
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:red}*-1 Overall result, please check the reported -1(s)*{color}
{color:red}. There is at least one warning, please check{color}
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/3610/

> Oozie distributes application jar twice making the spark job fail
> -----------------------------------------------------------------
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch