[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-08-10 Thread Wendy Haley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122108#comment-16122108
 ] 

Wendy Haley commented on OOZIE-2787:


Is this released to AWS Spark yet?
I did not quite understand the workaround. How do I exclude the jar file from 
the --files spark-opts? I already have a --files spark-opts that does not 
include the jar location.


> Oozie distributes application jar twice making the spark job fail
> -
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch, 
> OOZIE-2787-3.patch, OOZIE-2787-4.patch, OOZIE-2787-5.patch, 
> OOZIE-2787-amend-1.patch, OOZIE-2787-amend-2.patch, OOZIE-2787-amend-3.patch, 
> OOZIE-2787-amend-4.patch, OOZIE-2787-amend-5.patch
>
>
> Oozie adds the application jar to the list of files to be uploaded to the 
> distributed cache. Since the jar then gets added twice, the job fails. This 
> is observed from Spark 2.1.0 onward, which introduces a check for the same 
> file being added more than once and fails the job.
> {code}
> --master
> yarn
> --deploy-mode
> cluster
> --name
> oozieSparkStarter
> --class
> ScalaWordCount
> --queue 
> default
> --conf
> spark.executor.extraClassPath=$PWD/*
> --conf
> spark.driver.extraClassPath=$PWD/*
> --conf
> spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
> --conf
> spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
> --conf
> spark.yarn.security.tokens.hive.enabled=false
> --conf
> spark.yarn.security.tokens.hbase.enabled=false
> --files
> hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar
> --properties-file
> spark-defaults.conf
> --verbose
> spark-example.jar
> samplefile.txt
> output
> {code}
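
To make the failure mode in the description concrete, here is a minimal Java sketch of a duplicate check over the resources in the argument list above. It is only an illustration under stated assumptions: the class and method names (`DuplicateResourceCheck`, `findDuplicateNames`) are hypothetical, and this is not Spark's or Oozie's actual code. It simply shows that the `--files` entry and the application jar argument resolve to the same file name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateResourceCheck {

    // Returns the last path segment of a URI-like string, e.g. "spark-example.jar".
    static String fileName(String uri) {
        int slash = uri.lastIndexOf('/');
        return slash >= 0 ? uri.substring(slash + 1) : uri;
    }

    // Hypothetical stand-in for a launcher-side check (an assumption, not Spark's API):
    // returns every file name that is requested more than once.
    static List<String> findDuplicateNames(List<String> resources) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String resource : resources) {
            String name = fileName(resource);
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList(
            // value passed with --files in the issue description
            "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar",
            // the application jar argument
            "spark-example.jar");
        System.out.println(findDuplicateNames(resources)); // prints [spark-example.jar]
    }
}
```

A launcher that rejects such duplicates would fail the job for this argument list, which matches the behavior reported here.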



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860858#comment-15860858
 ] 

Satish Subhashrao Saley commented on OOZIE-2787:


Thank you, Puru, Abhishek and Xiaobin, for the review. Committed to master.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860777#comment-15860777
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [core].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1873
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3639/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Purshotam Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860506#comment-15860506
 ] 

Purshotam Shah commented on OOZIE-2787:
---

+1 for OOZIE-2787-amend-5.patch.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860414#comment-15860414
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1873
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:green}*+1 Overall result, good!, no -1s*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3638/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860385#comment-15860385
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch does adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [tools].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1873
.Tests failed: 1
.Tests errors: 1

.The patch failed the following testcases:

.  testNone(org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand)

.Tests failing with errors:
.  testJMXInstrumentation(org.apache.oozie.util.TestMetricsInstrumentation)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3637/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-09 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860204#comment-15860204
 ] 

Satish Subhashrao Saley commented on OOZIE-2787:


Thank you for the review, [~abhishekbafna] and [~zhengxb2005].
Users don't need to do any extra work; they can continue using their current 
configuration. I added an example to the documentation and some test cases 
for JarFilter.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859120#comment-15859120
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1872
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3634/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-08 Thread Xiaobin Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859081#comment-15859081
 ] 

Xiaobin Zheng commented on OOZIE-2787:
--

[~satishsaley] Thanks for the patch. Two minor suggestions:
1. The Javadoc for 'isApplicationJar' seems outdated.
2. It would be great if we could add a simple unit test for either 'filter()' 
or 'isApplicationJar()' to ensure the behavior we want.
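
As a rough illustration of the kind of behavior such a test could pin down, here is a self-contained Java sketch. It is an assumption-laden example, not the patch's code: the class `AppJarFilterSketch`, the signatures of `filter` and `isApplicationJar`, and the `extra.properties` entry are all hypothetical and only mirror the method names mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class AppJarFilterSketch {

    // Returns the last path segment, e.g. "spark-example.jar".
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    // Hypothetical signature: true when the candidate names the same file
    // as the application jar.
    static boolean isApplicationJar(String candidate, String applicationJar) {
        return fileName(candidate).equals(fileName(applicationJar));
    }

    // Hypothetical signature: drops the application jar from a comma-separated
    // --files value and keeps every other entry in its original order.
    static String filter(String files, String applicationJar) {
        List<String> kept = new ArrayList<>();
        for (String entry : files.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty() && !isApplicationJar(trimmed, applicationJar)) {
                kept.add(trimmed);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String files =
            "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar,"
            + "extra.properties"; // hypothetical second entry
        System.out.println(filter(files, "spark-example.jar")); // prints extra.properties
    }
}
```

A unit test along these lines would assert that the application jar is dropped while unrelated entries survive unchanged.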



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-08 Thread Abhishek Bafna (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858946#comment-15858946
 ] 

Abhishek Bafna commented on OOZIE-2787:
---

[~satishsaley] It would be nice to document this for the Oozie Spark action, 
something like "How to specify the application path in the Oozie Spark action". 
Thanks.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-06 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855025#comment-15855025
 ] 

Satish Subhashrao Saley commented on OOZIE-2787:


Thank you, Rohini and Andras, for the review. Committed to master.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-06 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854415#comment-15854415
 ] 

Satish Subhashrao Saley commented on OOZIE-2787:


Tested locally; the reported test failures are flaky.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852492#comment-15852492
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
.{color:red}WARNING: the current HEAD has 1 RAT warning(s), they should be 
addressed ASAP{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1868
.Tests failed: 4
.Tests errors: 0

.The patch failed the following testcases:

.  
testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerEhcache)
.  
testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService)
.  
testCoordMaterializeTriggerService3(org.apache.oozie.service.TestCoordMaterializeTriggerService)
.  
testTimeOutWithUnresolvedMissingDependencies(org.apache.oozie.command.coord.TestCoordPushDependencyCheckXCommand)

.Tests failing with errors:
.  

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}

{color:red}.   There is at least one warning, please check{color}

The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3612/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Andras Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852419#comment-15852419
 ] 

Andras Piros commented on OOZIE-2787:
-

+1 (non-binding)



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852332#comment-15852332
 ] 

Rohini Palaniswamy commented on OOZIE-2787:
---

+1



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852248#comment-15852248
 ] 

Rohini Palaniswamy commented on OOZIE-2787:
---

Moving it to a JarFilter class is nice, and the code is cleaner. Just two more 
minor changes are needed:

private class JarFilter -> private static class JarFilter
LinkedList listUris = null; -> private LinkedList listUris = null;



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852051#comment-15852051
 ] 

Rohini Palaniswamy commented on OOZIE-2787:
---

Comments:
   - Instead of adding a new spark-2.1 profile, please just upgrade the Spark 
version in the spark-2 profile. 
   - fixFsDefaultUris also filters the application jar path. Just do that in 
one place, in filterJars.



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Andras Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851984#comment-15851984
 ] 

Andras Piros commented on OOZIE-2787:
-

[~satishsaley] thanks for the patch!

Some observations:
* please add test case to {{TestSparkMain}} or elsewhere
* please rename new Maven profile to {{spark-2.1-kafka-1.6.2}} to get a better 
idea what's in there
* I'd extract the {{filterJars()}} to a nested class for better testability and 
SRP, like {{JarURIFilter}}. In that case you could pass all the necessary 
parameters via the constructor, and have a {{toString()}} method that calls 
{{StringUtils.join()}}
* it's OK w/ me if all the JAR files of the current directory are filtered, 
supposing all those ones are application JARs. What about other packages like 
{{.py}} and {{.zip}} files? Maybe worth having unit tests for those as well
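
A minimal sketch of the nested-class idea suggested above, not the code from the attached patches. The names {{JarUriFilterSketch}}, {{JarUriFilter}} and {{getApplicationJar()}} are hypothetical, the {{helper.py}} path is made up for illustration, and matching on the file name only is an assumption. It uses {{String.join()}} from the JDK in place of {{StringUtils.join()}} so that it compiles without extra dependencies.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JarUriFilterSketch {

    /** Hypothetical filter: removes the application jar from the --files URIs. */
    public static class JarUriFilter {
        private final List<URI> filtered = new ArrayList<>();
        private String applicationJar;

        /** All inputs arrive via the constructor, which keeps the class easy to unit test. */
        public JarUriFilter(List<URI> uris, String applicationJar) {
            this.applicationJar = applicationJar;
            String jarName = nameOf(applicationJar);
            for (URI uri : uris) {
                String path = uri.getPath();
                if (path != null && nameOf(path).equals(jarName)) {
                    // Assumption: same file name means same jar; use its full URI instead.
                    this.applicationJar = uri.toString();
                } else {
                    // Anything else (.py, .zip, other jars) is left untouched.
                    filtered.add(uri);
                }
            }
        }

        /** Full URI of the application jar if it was found in the list, else the original value. */
        public String getApplicationJar() {
            return applicationJar;
        }

        /** Comma-separated value suitable for a --files argument. */
        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (URI uri : filtered) {
                parts.add(uri.toString());
            }
            return String.join(",", parts);
        }

        private static String nameOf(String path) {
            return path.substring(path.lastIndexOf('/') + 1);
        }
    }

    public static void main(String[] args) {
        List<URI> files = Arrays.asList(
            URI.create("hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar"),
            // Hypothetical extra file, only here to show that non-matching entries survive.
            URI.create("hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/helper.py"));
        JarUriFilter filter = new JarUriFilter(files, "spark-example.jar");
        System.out.println("application jar: " + filter.getApplicationJar());
        System.out.println("--files: " + filter);
    }
}
```

Because the filter has no dependency on the enclosing class, unit tests for {{.jar}}, {{.py}} and {{.zip}} inputs can construct it directly.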



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851859#comment-15851859
 ] 

Hadoop QA commented on OOZIE-2787:
--

Testing JIRA OOZIE-2787

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:red}-1{color} the patch does not add/modify any testcase
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
.{color:red}WARNING: the current HEAD has 1 RAT warning(s), they should be 
addressed ASAP{color}
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1{color} There are no new bugs found in total.
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [core].
.{color:green}+1{color} There are no new bugs found in 
[hadooplibs/hadoop-utils-2].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 1868
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}

{color:red}.   There is at least one warning, please check{color}

The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/3610/



[jira] [Commented] (OOZIE-2787) Oozie distributes application jar twice making the spark job fail

2017-02-03 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851652#comment-15851652
 ] 

Satish Subhashrao Saley commented on OOZIE-2787:


[SPARK-18099|https://issues.apache.org/jira/browse/SPARK-18099] added an 
exception that is thrown if the same file is added multiple times to the 
distributed cache. 
If a user has the application jar in the workflow/lib directory and specifies a 
relative path to the jar in the <jar> tag, the application jar ends up being 
distributed multiple times. Before Spark 2.1 this wasn't an issue, because 
Spark only logged a WARN message.

The solution is to use the complete HDFS path when specifying the application 
jar and to exclude it from the --files option.
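
To illustrate the failure mode described above, here is a simplified model of a duplicate check on distributed-cache entries. It is not Spark's actual code: the class {{DistributedCacheModel}}, the keying on the file name, and the exception message are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

/** Simplified, hypothetical model of a duplicate check on distributed-cache entries. */
public class DistributedCacheModel {

    // File names already registered for distribution.
    private final Set<String> distributedNames = new HashSet<>();

    /** Registers a resource and rejects a second resource with the same file name. */
    public void add(URI resource) {
        String path = resource.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (!distributedNames.add(name)) {
            // Illustrative message only; not copied from Spark.
            throw new IllegalArgumentException("Resource added multiple times: " + resource);
        }
    }

    public static void main(String[] args) {
        DistributedCacheModel cache = new DistributedCacheModel();
        // The application jar, given as a relative path ...
        cache.add(URI.create("spark-example.jar"));
        try {
            // ... and the same jar again through --files.
            cache.add(URI.create(
                "hdfs://mycluster.com/user/saley/oozie/apps/sparkapp/lib/spark-example.jar"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, passing the jar once with its full HDFS path and leaving it out of --files means {{add()}} is called only once for that file name, so nothing is rejected.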
