[jira] [Updated] (SPARK-10568) Error thrown in stopping one component in SparkContext.stop() doesn't allow other components to be stopped

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-10568:
-
Labels: bulk-closed  (was: )

> Error thrown in stopping one component in SparkContext.stop() doesn't allow 
> other components to be stopped
> --
>
> Key: SPARK-10568
> URL: https://issues.apache.org/jira/browse/SPARK-10568
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
>Reporter: Matt Cheah
>Priority: Minor
>  Labels: bulk-closed
>
> When I shut down a Java process that is running a SparkContext, it invokes a 
> shutdown hook that eventually calls SparkContext.stop(), and inside 
> SparkContext.stop() each individual component (DiskBlockManager, scheduler 
> backend, etc.) is stopped in turn. If an exception is thrown while stopping 
> one of these components, none of the remaining components get stopped cleanly 
> either. This caused problems when I stopped a Java process running a 
> SparkContext in yarn-client mode, because failing to stop YarnSchedulerBackend 
> properly leaves the application stuck on the YARN side.
> The steps I ran are as follows:
> 1. Create one job which fills the cluster
> 2. Kick off another job which creates a Spark Context
> 3. Kill the Java process with the Spark Context in #2
> 4. The job remains in the YARN UI as ACCEPTED
> Looking in the logs we see the following:
> {code}
> 2015-09-07 10:32:43,446 ERROR [Thread-3] o.a.s.u.Utils - Uncaught exception 
> in thread Thread-3
> java.lang.NullPointerException: null
> at 
> org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:162)
>  ~[spark-core_2.10-1.4.1.jar:1.4.1]
> at 
> org.apache.spark.storage.DiskBlockManager$$anonfun$addShutdownHook$1.apply$mcV$sp(DiskBlockManager.scala:144)
>  ~[spark-core_2.10-1.4.1.jar:1.4.1]
> at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2308) 
> ~[spark-core_2.10-1.4.1.jar:1.4.1]
> {code}
> I think what's going on is that when we kill the application in the queued 
> state, it tries to run the SparkContext.stop() method on the driver and stop 
> each component. It dies trying to stop the DiskBlockManager because it hasn't 
> been initialized yet - the application is still waiting to be scheduled by 
> the Yarn RM - but YarnClient.stop() is not invoked as a result, leaving the 
> application sticking around in the accepted state.
> Because of what appear to be bugs in the YARN scheduler, entering this state 
> prevents the YARN scheduler from scheduling any more jobs until we manually 
> remove the application via the YARN CLI. We can tackle the YARN side 
> separately, but ensuring that every component gets at least some chance to 
> stop when a SparkContext stops seems like a good idea. We can still rethrow 
> and/or log the exceptions for everything that went wrong once the context has 
> finished stopping.






[jira] [Updated] (SPARK-10568) Error thrown in stopping one component in SparkContext.stop() doesn't allow other components to be stopped

2015-09-12 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10568:
--
Priority: Minor  (was: Major)

> Error thrown in stopping one component in SparkContext.stop() doesn't allow 
> other components to be stopped
> --
>
> Key: SPARK-10568
> URL: https://issues.apache.org/jira/browse/SPARK-10568
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1
>Reporter: Matt Cheah
>Priority: Minor
>
> When I shut down a Java process that is running a SparkContext, it invokes a 
> shutdown hook that eventually calls SparkContext.stop(), and inside 
> SparkContext.stop() each individual component (DiskBlockManager, scheduler 
> backend, etc.) is stopped in turn. If an exception is thrown while stopping 
> one of these components, none of the remaining components get stopped cleanly 
> either. This caused problems when I stopped a Java process running a 
> SparkContext in yarn-client mode, because failing to stop YarnSchedulerBackend 
> properly leaves the application stuck on the YARN side.
> The steps I ran are as follows:
> 1. Create one job which fills the cluster
> 2. Kick off another job which creates a Spark Context
> 3. Kill the Java process with the Spark Context in #2
> 4. The job remains in the YARN UI as ACCEPTED
> Looking in the logs we see the following:
> {code}
> 2015-09-07 10:32:43,446 ERROR [Thread-3] o.a.s.u.Utils - Uncaught exception 
> in thread Thread-3
> java.lang.NullPointerException: null
> at 
> org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:162)
>  ~[spark-core_2.10-1.4.1.jar:1.4.1]
> at 
> org.apache.spark.storage.DiskBlockManager$$anonfun$addShutdownHook$1.apply$mcV$sp(DiskBlockManager.scala:144)
>  ~[spark-core_2.10-1.4.1.jar:1.4.1]
> at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2308) 
> ~[spark-core_2.10-1.4.1.jar:1.4.1]
> {code}
> I think what's going on is that when we kill the application in the queued 
> state, it tries to run the SparkContext.stop() method on the driver and stop 
> each component. It dies trying to stop the DiskBlockManager because it hasn't 
> been initialized yet - the application is still waiting to be scheduled by 
> the Yarn RM - but YarnClient.stop() is not invoked as a result, leaving the 
> application sticking around in the accepted state.
> Because of what appear to be bugs in the YARN scheduler, entering this state 
> prevents the YARN scheduler from scheduling any more jobs until we manually 
> remove the application via the YARN CLI. We can tackle the YARN side 
> separately, but ensuring that every component gets at least some chance to 
> stop when a SparkContext stops seems like a good idea. We can still rethrow 
> and/or log the exceptions for everything that went wrong once the context has 
> finished stopping.






[jira] [Updated] (SPARK-10568) Error thrown in stopping one component in SparkContext.stop() doesn't allow other components to be stopped

2015-09-11 Thread Matt Cheah (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-10568:
---
Description: 
When I shut down a Java process that is running a SparkContext, it invokes a 
shutdown hook that eventually calls SparkContext.stop(), and inside 
SparkContext.stop() each individual component (DiskBlockManager, scheduler 
backend, etc.) is stopped in turn. If an exception is thrown while stopping one 
of these components, none of the remaining components get stopped cleanly 
either. This caused problems when I stopped a Java process running a 
SparkContext in yarn-client mode, because failing to stop YarnSchedulerBackend 
properly leaves the application stuck on the YARN side.

The steps I ran are as follows:
1. Create one job which fills the cluster
2. Kick off another job which creates a Spark Context
3. Kill the Java process with the Spark Context in #2
4. The job remains in the YARN UI as ACCEPTED

Looking in the logs we see the following:

{code}
2015-09-07 10:32:43,446 ERROR [Thread-3] o.a.s.u.Utils - Uncaught exception in 
thread Thread-3
java.lang.NullPointerException: null
at 
org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:162)
 ~[spark-core_2.10-1.4.1.jar:1.4.1]
at 
org.apache.spark.storage.DiskBlockManager$$anonfun$addShutdownHook$1.apply$mcV$sp(DiskBlockManager.scala:144)
 ~[spark-core_2.10-1.4.1.jar:1.4.1]
at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2308) 
~[spark-core_2.10-1.4.1.jar:1.4.1]
{code}

I think what's going on is that when we kill the application in the queued 
state, it tries to run the SparkContext.stop() method on the driver and stop 
each component. It dies trying to stop the DiskBlockManager because it hasn't 
been initialized yet - the application is still waiting to be scheduled by the 
Yarn RM - but YarnClient.stop() is not invoked as a result, leaving the 
application sticking around in the accepted state.
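
If the NPE really is DiskBlockManager being stopped before it ever finished 
initializing, part of the answer could also be making components' stop() 
tolerate partial initialization. A minimal, illustrative sketch (the class and 
field names here are made up, not Spark's actual DiskBlockManager code):

{code}
// Illustrative sketch only (made-up names, not DiskBlockManager's real code):
// a component whose stop() tolerates never having been fully initialized.
class LazyTempDirComponent {
  // Stays null if the process is killed before initialization completes.
  @volatile private var localDirs: Array[java.io.File] = _

  def init(): Unit = {
    localDirs = Array(java.nio.file.Files.createTempDirectory("blockmgr").toFile)
  }

  def stop(): Unit = {
    // Guard against partial initialization instead of dereferencing null.
    val dirs = localDirs
    if (dirs != null) {
      dirs.foreach { dir =>
        try dir.delete()
        catch { case e: Exception => System.err.println(s"Failed to delete $dir: $e") }
      }
    }
  }
}
{code}

Calling stop() on an instance that was never init()-ed is then a no-op instead 
of an NPE.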

Because of what appear to be bugs in the YARN scheduler, entering this state 
prevents the YARN scheduler from scheduling any more jobs until we manually 
remove the application via the YARN CLI. We can tackle the YARN side separately, 
but ensuring that every component gets at least some chance to stop when a 
SparkContext stops seems like a good idea. We can still rethrow and/or log the 
exceptions for everything that went wrong once the context has finished 
stopping.
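
For the "give every component a chance to stop" part, here is a minimal sketch 
of the pattern, assuming nothing about Spark's internals (the helper and 
component names below are placeholders, not the actual SparkContext.stop() 
code): each stop() is attempted individually, non-fatal failures are logged, 
and the remaining stops still run.

{code}
// Sketch only: a generic best-effort shutdown helper, not Spark's actual code.
import scala.util.control.NonFatal

object BestEffortStop {
  // Attempt every stop() in order; log non-fatal failures and keep going.
  def stopAll(components: Seq[(String, () => Unit)]): Unit = {
    components.foreach { case (name, stop) =>
      try {
        stop()
      } catch {
        case NonFatal(e) =>
          System.err.println(s"Error while stopping $name (continuing): $e")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    stopAll(Seq(
      // Simulates the DiskBlockManager NPE from the log above.
      "DiskBlockManager"     -> (() => throw new NullPointerException()),
      // Still gets stopped even though the previous component threw.
      "YarnSchedulerBackend" -> (() => println("YarnSchedulerBackend stopped"))
    ))
  }
}
{code}

Running it logs the NPE for DiskBlockManager and then prints 
"YarnSchedulerBackend stopped", i.e. the later component is no longer skipped.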

  was:
When I shut down a Java process that is running a SparkContext, it invokes a 
shutdown hook that eventually calls SparkContext.stop(), and inside 
SparkContext.stop() each individual component (DiskBlockManager, Scheduler 
Backend) is stopped. If an exception is thrown in stopping one of these 
components, none of the other components will be stopped cleanly either. This 
caused problems when I stopped a Java process running a Spark context in 
yarn-client mode, because not properly stopping YarnSchedulerBackend leads to 
problems.

The steps I ran are as follows:
1. Create one job which fills the cluster
2. Kick off another job which creates a Spark Context
3. Kill the Java process with the Spark Context in #2
4. The job remains in the YARN UI as ACCEPTED

Looking in the logs we see the following:

{code}
2015-09-07 10:32:43,446 ERROR [Thread-3] o.a.s.u.Utils - Uncaught exception in 
thread Thread-3
java.lang.NullPointerException: null
at 
org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:162)
 ~[spark-core_2.10-1.4.1-palantir2.jar:1.4.1-palantir2]
at 
org.apache.spark.storage.DiskBlockManager$$anonfun$addShutdownHook$1.apply$mcV$sp(DiskBlockManager.scala:144)
 ~[spark-core_2.10-1.4.1-palantir2.jar:1.4.1-palantir2]
at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2308) 
~[spark-core_2.10-1.4.1-palantir2.jar:1.4.1-palantir2]
{code}

I think what's going on is that when we kill the application in the queued 
state, it tries to run the SparkContext.stop() method on the driver and stop 
each component. It dies trying to stop the DiskBlockManager because it hasn't 
been initialized yet - the application is still waiting to be scheduled by the 
Yarn RM - but YarnClient.stop() is not invoked as a result, leaving the 
application sticking around in the accepted state.

Because of what appears to be bugs in the YARN scheduler, entering this state 
makes it so that the YARN scheduler is unable to schedule any more jobs unless 
we manually remove this application via the YARN CLI. We can tackle the YARN 
stuck state separately, but ensuring that all components get at least some 
chance to stop when a SparkContext stops seems like a good idea. Of course we 
can still throw some exception and/or log exceptions for everything that goes 
wrong at the end of stopping the context.


> Error thrown in stopping one component in SparkContext.stop() doesn't allow 
> other components to be stopped