[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2017-02-17 Thread Joshua Caplan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872720#comment-15872720 ]

Joshua Caplan commented on SPARK-3877:
--

Done, as SPARK-19649.

> The exit code of spark-submit is still 0 when an yarn application fails
> ---
>
> Key: SPARK-3877
> URL: https://issues.apache.org/jira/browse/SPARK-3877
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>  Labels: yarn
> Fix For: 1.1.1, 1.2.0
>
>
> When a YARN application fails (yarn-cluster mode), the exit code of 
> spark-submit is still 0. This makes it hard to write scripts that run Spark 
> jobs on YARN automatically, because the scripts cannot detect the failure.
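The scripts this description has in mind usually decide success or failure from the exit status of spark-submit. A minimal Scala sketch of such a wrapper, using the standard scala.sys.process API; it assumes spark-submit is on the PATH, and the object name, job class (com.example.MyJob), and jar path are hypothetical placeholders:

```scala
import scala.sys.process._

// Sketch of an automation wrapper: run a command (such as spark-submit) and
// branch on its exit status. The job class and jar below are placeholders.
object SubmitAndCheck {

  // Runs the command, sending its output to the console, blocks until it
  // exits, and returns its exit status (0 means success by convention).
  def run(cmd: Seq[String]): Int = Process(cmd).!

  // Turns an exit status into a short message a script could log.
  def verdict(exitCode: Int): String =
    if (exitCode == 0) "succeeded" else s"failed (exit code $exitCode)"

  def main(args: Array[String]): Unit = {
    // Hypothetical invocation: com.example.MyJob and my-job.jar are placeholders.
    val cmd = Seq("spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
      "--class", "com.example.MyJob", "my-job.jar")
    val code = run(cmd)
    println(s"spark-submit ${verdict(code)}")
    // Propagate the status so a calling shell script or scheduler sees it too.
    sys.exit(code)
  }
}
```

With the fix recorded above (Fix For: 1.1.1, 1.2.0), a failed yarn-cluster application should make run return a non-zero value; without it, this check sees 0 even when the application failed, which is the problem reported here.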



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2017-01-17 Thread Marcelo Vanzin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826517#comment-15826517 ]

Marcelo Vanzin commented on SPARK-3877:
---

[~j_caplan], can you open a new bug for that issue?




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2017-01-17 Thread Joshua Caplan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826444#comment-15826444 ]

Joshua Caplan commented on SPARK-3877:
--

See also https://issues.apache.org/jira/browse/MAPREDUCE-6091




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2017-01-09 Thread Joshua Caplan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812752#comment-15812752 ]

Joshua Caplan commented on SPARK-3877:
--

I think this fix introduced a race condition, which I hit about 50% of the 
time with Spark 1.6.3. I have configured YARN not to keep *any* completed 
applications in memory, because some of my jobs get pretty large:

yarn-site: yarn.resourcemanager.max-completed-applications = 0

As a result, the client's once-per-second call to getApplicationReport can see 
the application as RUNNING on one poll and as not found on the next, and then 
wrongly report a successful application as killed.

Typical ApplicationMaster (driver) log:
17/01/09 19:31:23 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
17/01/09 19:31:23 INFO SparkContext: Invoking stop() from shutdown hook
17/01/09 19:31:24 INFO SparkUI: Stopped Spark web UI at http://10.0.0.168:37046
17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Shutting down all executors
17/01/09 19:31:24 INFO YarnClusterSchedulerBackend: Asking each executor to shut down
17/01/09 19:31:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/09 19:31:24 INFO MemoryStore: MemoryStore cleared
17/01/09 19:31:24 INFO BlockManager: BlockManager stopped
17/01/09 19:31:24 INFO BlockManagerMaster: BlockManagerMaster stopped
17/01/09 19:31:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/09 19:31:24 INFO SparkContext: Successfully stopped SparkContext
17/01/09 19:31:24 INFO ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/01/09 19:31:24 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
17/01/09 19:31:24 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

Client log:
17/01/09 19:31:23 INFO Client: Application report for application_1483983939941_0056 (state: RUNNING)
17/01/09 19:31:24 ERROR Client: Application application_1483983939941_0056 not found.
Exception in thread "main" org.apache.spark.SparkException: Application application_1483983939941_0056 is killed
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1038)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
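In the client log, "not found" is immediately followed by "is killed", i.e. a missing application is treated as a killed one. A minimal sketch of a polling classifier that keeps those two cases apart; it assumes simplified Report types rather than YARN's real YarnApplicationState, the names ReportPolling, Report, and Verdict are hypothetical, and this is not how Spark's Client is actually implemented:

```scala
// Illustrative model of the client's report polling. Report is a simplified
// stand-in for what each poll returns; it is not YARN's YarnApplicationState.
object ReportPolling {
  sealed trait Report
  case object Running   extends Report
  case object Succeeded extends Report
  case object Failed    extends Report
  case object Killed    extends Report

  sealed trait Verdict
  // A terminal state was observed on some poll.
  final case class Final(report: Report) extends Verdict
  // No terminal state was observed, e.g. the app disappeared while RUNNING.
  case object Unknown extends Verdict

  // polls: results in order, one per poll; None means "application not found".
  // Polling stops at the first "not found", and that case maps to Unknown
  // instead of Killed, so the caller can consult other evidence.
  def classify(polls: Seq[Option[Report]]): Verdict =
    polls.iterator
      .takeWhile(_.isDefined)        // stop at the first "not found"
      .collect { case Some(r) => r } // unwrap the reports seen before that
      .find(_ != Running)            // first terminal report, if any
      .map(Final(_))
      .getOrElse(Unknown)
}
```

Fed the sequence from the log above (RUNNING, then not found), classify returns Unknown rather than a kill, leaving the caller free to fall back on another source such as the ApplicationMaster's own final status.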





[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-19 Thread sam (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014683#comment-15014683 ]

sam commented on SPARK-3877:


Actually, please ignore that: as noted in the duplicate issue, I can't seem to reproduce it.




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2015-11-18 Thread sam (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011690#comment-15011690 ]

sam commented on SPARK-3877:


Is this really fixed? I'm seeing this on Spark 1.5.0 on EMR.

[~tgraves]

[~vanzin]

[~zsxwing]




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2014-10-22 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180628#comment-14180628 ]

Apache Spark commented on SPARK-3877:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/2748




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2014-10-17 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174999#comment-14174999 ]

Thomas Graves commented on SPARK-3877:
--

[~vanzin], I agree. The user code should exit with a non-zero status or throw 
on failure. If it doesn't, there is nothing we can do about it other than tell 
users to change their code to exit properly if they want to see a failure 
status. Perhaps we should also better document what users should do on 
failure. It's basically the same approach I took for the exit codes in 
ApplicationMaster: it relies on user code exiting non-zero or throwing.

The only other option would be for us to look at the details in the scheduler 
ourselves to try to determine what happened, i.e., notice that stage X failed 
or Y tasks failed, etc. I would say we do that later if it's needed.
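The scheduler-based alternative mentioned here could, in principle, be modeled as below. This is a minimal sketch under assumptions: StageAttempt is a hypothetical stand-in for fields that Spark's StageInfo exposes (stage ID, attempt number, failureReason), which a SparkListener could gather in onStageCompleted; that wiring is omitted so the snippet runs without Spark. Because a failed stage can be retried and then succeed, only the latest attempt of each stage counts:

```scala
// Illustrative only: derives an exit status from stage outcomes instead of
// relying on user code. StageAttempt mirrors a few StageInfo fields.
object StageBasedExitCode {
  final case class StageAttempt(stageId: Int, attempt: Int, failureReason: Option[String])

  // Returns (exit status, failure messages): 0 when the latest attempt of
  // every stage succeeded, 1 when any stage's latest attempt failed.
  def summarize(attempts: Seq[StageAttempt]): (Int, Seq[String]) = {
    // Keep only the most recent attempt of each stage; earlier failed attempts
    // that were retried successfully should not fail the application.
    val latest = attempts.groupBy(_.stageId).values.map(_.maxBy(_.attempt)).toList
    val failures = latest.sortBy(_.stageId).collect {
      case StageAttempt(id, _, Some(reason)) => s"Stage $id failed: $reason"
    }
    (if (failures.isEmpty) 0 else 1, failures)
  }
}
```

The design choice of looking only at each stage's latest attempt matters: counting every failed attempt would turn a recovered fetch failure into a false failure, the same kind of misreport discussed elsewhere in this thread.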



 The exit code of spark-submit is still 0 when an yarn application fails
 ---

 Key: SPARK-3877
 URL: https://issues.apache.org/jira/browse/SPARK-3877
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Shixiong Zhu
Priority: Minor
  Labels: yarn

 When an yarn application fails (yarn-cluster mode), the exit code of 
 spark-submit is still 0. It's hard for people to write some automatic scripts 
 to run spark jobs in yarn because the failure can not be detected in these 
 scripts.






[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2014-10-16 Thread Marcelo Vanzin (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174507#comment-14174507 ]

Marcelo Vanzin commented on SPARK-3877:
---

[~tgraves], this can be seen as a subset of SPARK-2167, but as I mentioned on 
that bug, I don't think it's fixable in all cases. SparkSubmit executes user 
code, so it can only report errors when the user code does.

For example, a job like this would report an error today:

{code}
  val sc = ...
  try {
    // do stuff
    if (somethingBad) throw new MyJobFailedException()
  } finally {
    sc.stop()
  }
{code}

But this one wouldn't:

{code}
  val sc = ...
  try {
    // do stuff
    if (somethingBad) throw new MyJobFailedException()
  } catch {
    // the exception is swallowed here, so main() returns normally
    case e: Exception => logError("Oops, something bad happened.", e)
  } finally {
    sc.stop()
  }
{code}

In yarn-client mode, the SparkContext is stopped abruptly when the YARN app 
fails. But depending on how the user's {{main()}} deals with errors, that still 
may not result in a non-zero exit status.
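The second example can keep its logging and still surface the failure by rethrowing after logging, in line with the point that user code must throw or exit non-zero. A minimal sketch with no Spark dependency, assuming a plain function as the job body; runJob, MyJobFailedException, and the --fail flag are hypothetical, and cleanup stands in for sc.stop():

```scala
// Sketch of a driver that logs a failure but still lets it escape main().
// runJob, MyJobFailedException and the --fail flag are hypothetical; `cleanup`
// stands in for sc.stop() so the snippet runs without Spark.
object RethrowingJob {
  final class MyJobFailedException(msg: String) extends RuntimeException(msg)

  def runJob(body: () => Unit, cleanup: () => Unit): Unit =
    try body()
    catch {
      case e: Exception =>
        System.err.println(s"Oops, something bad happened: ${e.getMessage}")
        throw e // rethrow instead of swallowing, so the failure stays visible
    } finally cleanup() // runs on both success and failure, like sc.stop()

  def main(args: Array[String]): Unit = {
    val somethingBad = args.contains("--fail") // hypothetical failure trigger
    runJob(
      body = () => if (somethingBad) throw new MyJobFailedException("job failed"),
      cleanup = () => println("stopping (placeholder for sc.stop())"))
  }
}
```

Because the exception still escapes main(), spark-submit has a failure it can report; whether that becomes a non-zero exit status still depends on the Spark version and deploy mode, as the rest of this thread shows.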




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2014-10-09 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164993#comment-14164993 ]

Apache Spark commented on SPARK-3877:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/2732




[jira] [Commented] (SPARK-3877) The exit code of spark-submit is still 0 when an yarn application fails

2014-10-09 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165124#comment-14165124 ]

Thomas Graves commented on SPARK-3877:
--

This looks like a dup of SPARK-2167, or perhaps a subset of it, since I think 
you only handle YARN mode. Does this cover both client and cluster mode?


