GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/8856
[SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job
description in streaming jobs
** Note that this PR is only for branch 1.5. See #8781 for the solution for
Spark master **.
The job group and job description are passed through thread-local
properties, which are inherited by child threads. In Spark Streaming, the
streaming jobs inherit these properties from the thread that called
streamingContext.start(). This is undesirable for two reasons.
1. Job group: This is mainly used for cancelling a group of jobs together.
It does not make sense to cancel streaming jobs this way, as the effect will
be unpredictable. It is not a valid use case anyway: to cancel a streaming
context, call streamingContext.stop().
2. Job description: This is used to pass readable text descriptions for jobs
to show in the UI. The job description of the thread that calls
streamingContext.start() is not useful for the streaming jobs: it does not
make sense for every streaming job to have the same description, and that
description may not be related to streaming at all.
The solution in this PR is meant for Spark branch 1.5, where child threads
receive a cloned copy of the local properties only when the Spark config
`spark.localProperties.clone` is set to `true` (see #8781 for the PR for the
Spark master branch). Similar to the approach taken by #8721, StreamingContext
sets that configuration to true, which ensures that all subsequent child
threads get a cloned copy of the thread-local properties. This allows the job
group and job description to be explicitly removed in the thread that starts
the streaming scheduler, so that subsequent child threads do not inherit
them. The scheduler is also started in a new child thread, so that setting the
job group and description for streaming does not change those properties in
the thread that called streamingContext.start().
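
The mechanism described above can be sketched in plain Java, without any
Spark dependency. This is only an illustration under stated assumptions, not
the code in this PR: the class `LocalPropsSketch`, the field `LOCAL_PROPS`,
and the method `run` are hypothetical names, and the copy in `childValue`
stands in for what `spark.localProperties.clone=true` enables. The keys
`spark.jobGroup.id` and `spark.job.description` are the property names Spark
uses for the job group and job description.

```java
import java.util.Properties;

class LocalPropsSketch {
    // Hypothetical stand-in for SparkContext's inheritable thread-local properties.
    static final InheritableThreadLocal<Properties> LOCAL_PROPS =
        new InheritableThreadLocal<Properties>() {
            @Override
            protected Properties initialValue() {
                return new Properties();
            }

            @Override
            protected Properties childValue(Properties parent) {
                // Assumption: mimics the cloning behavior, so a child thread
                // gets its own copy and cannot mutate the parent's properties.
                Properties copy = new Properties();
                copy.putAll(parent);
                return copy;
            }
        };

    // Returns { job group seen by the "streaming job" thread,
    //           job group still set in the calling thread }.
    static String[] run() {
        // The calling thread has a job group and description set.
        LOCAL_PROPS.get().setProperty("spark.jobGroup.id", "batch-group");
        LOCAL_PROPS.get().setProperty("spark.job.description", "nightly batch");

        final String[] seen = new String[2];

        // New child thread that plays the role of the scheduler starter.
        Thread starter = new Thread(() -> {
            Properties props = LOCAL_PROPS.get();
            // Clear the inherited values in this thread's private copy.
            props.remove("spark.jobGroup.id");
            props.remove("spark.job.description");

            // Threads created from here inherit the cleaned copy.
            Thread job = new Thread(() ->
                seen[0] = LOCAL_PROPS.get().getProperty("spark.jobGroup.id"));
            job.start();
            try {
                job.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        starter.start();
        try {
            starter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // The caller's own properties are untouched.
        seen[1] = LOCAL_PROPS.get().getProperty("spark.jobGroup.id");
        return seen;
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("streaming job sees group: " + result[0]);
        System.out.println("caller still has group: " + result[1]);
    }
}
```

Because the cleanup happens in the intermediate thread's own copy, the job
thread sees no job group while the calling thread keeps its original value.
If the child shared the parent's Properties object instead of a copy, the
removal would also clear the caller's settings.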
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-10649-1.5
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8856.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8856
----
commit e7b77e4895d0e089e92fdaca70bf07da214e2471
Author: Tathagata Das <[email protected]>
Date: 2015-09-21T23:47:52Z
[SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job
description in streaming jobs
The job group and job description are passed through thread-local
properties, which are inherited by child threads. In Spark Streaming, the
streaming jobs inherit these properties from the thread that called
streamingContext.start(). This is undesirable for two reasons.
1. Job group: This is mainly used for cancelling a group of jobs together.
It does not make sense to cancel streaming jobs this way, as the effect will
be unpredictable. It is not a valid use case anyway: to cancel a streaming
context, call streamingContext.stop().
2. Job description: This is used to pass readable text descriptions for jobs
to show in the UI. The job description of the thread that calls
streamingContext.start() is not useful for the streaming jobs: it does not
make sense for every streaming job to have the same description, and that
description may not be related to streaming at all.
The solution in this PR is meant for the Spark master branch, where local
properties are inherited by cloning the properties. The job group and job
description in the thread that starts the streaming scheduler are explicitly
removed, so that subsequent child threads do not inherit them. The scheduler
is also started in a new child thread, so that setting the job group and
description for streaming does not change those properties in the thread that
called streamingContext.start().
Author: Tathagata Das <[email protected]>
Closes #8781 from tdas/SPARK-10649.
----