GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/8856

    [SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job description in streaming jobs

    ** Note that this PR is only for branch 1.5. See #8781 for the solution for Spark master. **
    
    The job group and job description are passed through thread-local properties, which are inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). For these two properties, that inheritance does not make sense.
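
To make the inheritance concrete, here is a minimal Scala sketch that uses only the JDK. It assumes a simplified stand-in for SparkContext's thread-local properties: an `InheritableThreadLocal[Properties]` whose `childValue` wraps the parent as defaults with `new Properties(parent)`. The names `InheritedLocalProps`, `inChildThread` and `demo`, and the sample values, are hypothetical. Only the keys `spark.jobGroup.id` and `spark.job.description` are the property names Spark uses for the job group and job description.

```scala
import java.util.Properties

// Hypothetical stand-in for SparkContext's thread-local properties.
// Only the two property keys are Spark's; everything else is illustrative.
object InheritedLocalProps {
  val JobGroupKey = "spark.jobGroup.id"
  val JobDescKey = "spark.job.description"

  // Assumed inheritance rule: a child thread's Properties uses the parent's
  // Properties as its defaults, so every key the parent set is visible in the child.
  val localProps: InheritableThreadLocal[Properties] = new InheritableThreadLocal[Properties] {
    override protected def childValue(parent: Properties): Properties = new Properties(parent)
    override protected def initialValue(): Properties = new Properties()
  }

  // Runs `body` (a by-name argument) on a newly created thread and waits for its result.
  def inChildThread[T](body: => T): T = {
    var result: Option[T] = None
    val t = new Thread(new Runnable { def run(): Unit = { result = Some(body) } })
    t.start()
    t.join() // join() also makes the child's write to `result` visible here
    result.get
  }

  // The calling thread tags its work. A thread is then started afterwards (as happens
  // inside streamingContext.start()), and the child reports what it sees.
  def demo(): (String, String) = {
    val caller = localProps.get()
    caller.setProperty(JobGroupKey, "nightly-etl")
    caller.setProperty(JobDescKey, "Load yesterday's data")
    inChildThread {
      val p = localProps.get()
      (p.getProperty(JobGroupKey), p.getProperty(JobDescKey))
    }
  }
}
```

Calling `InheritedLocalProps.demo()` returns the caller's group and description, even though the child thread never set either property itself.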
    
    1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs this way, as the effect would be unpredictable: whichever streaming batch jobs happened to be running at that moment would be cancelled. Besides, it is not a valid use case; to stop a streaming context, call streamingContext.stop().
    
    2. Job description: This is used to attach readable text descriptions to jobs so that they show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs: it does not make sense for all of them to share the same description, and that description may not be related to streaming at all.
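
The consequences in points 1 and 2 can be illustrated with a deliberately simplified model. This is not Spark's scheduler. Every job records the job group and description found in its submitting thread's properties, and cancelling a group hits every job tagged with it. `JobTaggingModel`, `Job`, `submit`, `cancelGroup` and `descriptionOf` are hypothetical. A streaming thread's inheritance is modelled by using the caller's `Properties` as defaults.

```scala
import java.util.Properties
import scala.collection.mutable.ListBuffer

// Hypothetical model of how an inherited job group and description leak into
// streaming jobs. Only the two property keys are Spark's.
object JobTaggingModel {
  val JobGroupKey = "spark.jobGroup.id"
  val JobDescKey = "spark.job.description"

  // A job remembers the group and description of the thread that submitted it.
  final case class Job(name: String, group: Option[String], description: Option[String])

  private val submitted = ListBuffer.empty[Job]

  def submit(name: String, threadProps: Properties): Unit =
    submitted += Job(name,
      Option(threadProps.getProperty(JobGroupKey)),
      Option(threadProps.getProperty(JobDescKey)))

  // Cancelling a group hits every job tagged with it, no matter which thread submitted it.
  def cancelGroup(group: String): List[String] =
    submitted.filter(_.group.contains(group)).map(_.name).toList

  def descriptionOf(name: String): Option[String] =
    submitted.find(_.name == name).flatMap(_.description)

  def scenario(): List[String] = {
    submitted.clear()
    val caller = new Properties()
    caller.setProperty(JobGroupKey, "nightly-etl")
    caller.setProperty(JobDescKey, "Load yesterday's data")
    submit("etl-load", caller)
    // Streaming threads inherit the caller's properties; modelled here by
    // using the caller's Properties as defaults.
    val streamingThread = new Properties(caller)
    submit("streaming-batch-1", streamingThread)
    submit("streaming-batch-2", streamingThread)
    cancelGroup("nightly-etl")
  }
}
```

Here `scenario()` returns all three job names. Cancelling the user's ETL group therefore also takes down both streaming batches, and `descriptionOf("streaming-batch-1")` reports the ETL description.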
    
    The solution in this PR is meant for Spark branch 1.5, where local properties are inherited by cloning only when the Spark config `spark.localProperties.clone` is set to `true` (see #8781 for the PR for the Spark master branch). Similar to the approach taken by #8721, StreamingContext sets that configuration to `true`, which ensures that all subsequently created child threads get a cloned copy of the thread-local properties. This allows the job group and job description to be explicitly removed in the thread that starts the streaming scheduler, so that none of its subsequent child threads inherit them. Also, the start is performed in a new child thread, so that changing the job group and description for streaming does not change those properties in the thread that called streamingContext.start().
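
The branch-1.5 approach can be sketched with the JDK alone. This is a hedged illustration, not the code in this PR. `StreamingStartSketch`, `cloneOnInherit` (standing in for `spark.localProperties.clone`), `runInNewThread`, `start` and `demo` are hypothetical names. The sketch also makes two assumptions:

* Without cloning, a child thread's `Properties` simply layers over the parent's as defaults.
* With cloning, the child gets an independent copy. The copy is produced here by flattening `stringPropertyNames()`, which is our choice of mechanism.

```scala
import java.util.Properties

// Hypothetical sketch of the idea; only the two property keys are Spark's.
object StreamingStartSketch {
  val JobGroupKey = "spark.jobGroup.id"
  val JobDescKey = "spark.job.description"

  // Plays the role of spark.localProperties.clone in this sketch.
  @volatile var cloneOnInherit: Boolean = false

  val localProps: InheritableThreadLocal[Properties] = new InheritableThreadLocal[Properties] {
    override protected def childValue(parent: Properties): Properties =
      if (cloneOnInherit) {
        // Independent, flattened copy (includes keys the parent sees via its own defaults).
        val copy = new Properties()
        val names = parent.stringPropertyNames().iterator()
        while (names.hasNext) {
          val k = names.next()
          copy.setProperty(k, parent.getProperty(k))
        }
        copy
      } else {
        // Assumed non-cloning rule: the parent's Properties become the child's defaults,
        // so removing a key in the child does not hide the parent's value.
        new Properties(parent)
      }
    override protected def initialValue(): Properties = new Properties()
  }

  // Runs `body` on a new thread and waits for its result.
  def runInNewThread[T](body: => T): T = {
    var result: Option[T] = None
    val t = new Thread(new Runnable { def run(): Unit = { result = Some(body) } })
    t.start()
    t.join()
    result.get
  }

  // Clears group and description in a fresh "starter" thread, which then spawns a
  // "scheduler" thread; returns what the scheduler thread sees.
  def start(): (Option[String], Option[String]) =
    runInNewThread {
      val starter = localProps.get()
      starter.remove(JobGroupKey)
      starter.remove(JobDescKey)
      runInNewThread {
        val scheduler = localProps.get()
        (Option(scheduler.getProperty(JobGroupKey)), Option(scheduler.getProperty(JobDescKey)))
      }
    }

  // Returns (what the scheduler thread sees, the caller's job group afterwards).
  def demo(clone: Boolean): ((Option[String], Option[String]), String) = {
    cloneOnInherit = clone
    val caller = localProps.get()
    caller.setProperty(JobGroupKey, "nightly-etl")
    caller.setProperty(JobDescKey, "Load yesterday's data")
    (start(), caller.getProperty(JobGroupKey))
  }
}
```

The two modes behave differently:

* `demo(clone = false)`: the removals in the starter thread have no visible effect, because the scheduler thread still reads the caller's values through the defaults chain.
* `demo(clone = true)`: the scheduler thread sees neither property.

In both cases the caller's job group is untouched, because the removals happen on a separate thread.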

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-10649-1.5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8856
    
----
commit e7b77e4895d0e089e92fdaca70bf07da214e2471
Author: Tathagata Das <[email protected]>
Date:   2015-09-21T23:47:52Z

    [SPARK-10649] [STREAMING] Prevent inheriting job group and irrelevant job description in streaming jobs
    
    The job group and job description are passed through thread-local properties, which are inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). For these two properties, that inheritance does not make sense.
    
    1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs this way, as the effect would be unpredictable. Besides, it is not a valid use case; to stop a streaming context, call streamingContext.stop().
    
    2. Job description: This is used to attach readable text descriptions to jobs so that they show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs: it does not make sense for all of them to share the same description, and that description may not be related to streaming at all.
    
    The solution in this PR is meant for the Spark master branch, where local properties are inherited by cloning them. The job group and job description in the thread that starts the streaming scheduler are explicitly removed, so that none of its subsequent child threads inherit them. Also, the start is performed in a new child thread, so that changing the job group and description for streaming does not change those properties in the thread that called streamingContext.start().
    
    Author: Tathagata Das <[email protected]>
    
    Closes #8781 from tdas/SPARK-10649.

----

