[jira] [Commented] (SPARK-22625) Properly cleanup inheritable thread-locals

2017-11-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271041#comment-16271041
 ] 

Sean Owen commented on SPARK-22625:
---

I don't have any additional info for you. You can propose a PR. The issue is in 
part caused by a third-party library creating threads, though. If it's a clean 
improvement to Spark, OK, but it's not really something to 'work around'.

> Properly cleanup inheritable thread-locals
> --
>
> Key: SPARK-22625
> URL: https://issues.apache.org/jira/browse/SPARK-22625
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Tolstopyatov Vsevolod
>  Labels: leak
>
> A memory leak is present due to inherited thread-locals; SPARK-20558 didn't 
> fix it properly.
> Our production application has the following logic: one thread reads from 
> HDFS while another creates a Spark context, processes HDFS files and then 
> closes it on a regular schedule.
> Depending on which thread started first, the SparkContext thread-local may or 
> may not be inherited by the HDFS daemon thread (DataStreamer), causing a 
> memory leak when the streamer is created after the Spark context. Memory 
> consumption increases every time a new Spark context is created; related 
> YourKit paths: https://screencast.com/t/tgFBYMEpW
> The problem is more general and is not related to HDFS in particular.
> Proper fix: register all cloned properties (in `localProperties#childValue`) 
> in a ConcurrentHashMap and forcefully clear all of them in 
> `SparkContext#close`.
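
The proposed fix can be sketched roughly as follows. This is a minimal, 
hypothetical illustration, not Spark's actual code: the class and member names 
(`LeakFreeContext`, `clonedProperties`) are invented, and the real patch would 
hook the registration into `SparkContext`'s `localProperties`. The idea is that 
every `Properties` clone handed to a child thread via 
`InheritableThreadLocal#childValue` is also remembered in a concurrent set, so 
`close()` can reach and clear copies held by threads the context does not 
control:

```scala
import java.util.Properties
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch of the proposed fix (names invented, not Spark's
// internals): register every cloned Properties object so close() can clear
// copies that were inherited by threads outside the context's control.
class LeakFreeContext {
  // Registry of all clones handed out to child threads.
  private val clonedProperties = ConcurrentHashMap.newKeySet[Properties]()

  private val localProperties = new InheritableThreadLocal[Properties] {
    override protected def childValue(parent: Properties): Properties = {
      val child = new Properties()
      child.putAll(parent)          // detach the child's copy from the parent
      clonedProperties.add(child)   // remember the clone for later cleanup
      child
    }
    override protected def initialValue(): Properties = new Properties()
  }

  def setLocalProperty(key: String, value: String): Unit =
    localProperties.get().setProperty(key, value)

  def getLocalProperty(key: String): String =
    localProperties.get().getProperty(key)

  // On close, forcefully clear every registered clone so inherited copies
  // no longer pin the context's properties in memory.
  def close(): Unit = {
    clonedProperties.forEach(p => p.clear())
    clonedProperties.clear()
  }
}
```

A thread created after the context is constructed inherits a registered clone; 
after `close()`, that clone is empty, so it no longer retains the context's 
state.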



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22625) Properly cleanup inheritable thread-locals

2017-11-29 Thread Tolstopyatov Vsevolod (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271038#comment-16271038
 ] 

Tolstopyatov Vsevolod commented on SPARK-22625:
---

Ping [~srowen]







[jira] [Commented] (SPARK-22625) Properly cleanup inheritable thread-locals

2017-11-28 Thread Tolstopyatov Vsevolod (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268483#comment-16268483
 ] 

Tolstopyatov Vsevolod commented on SPARK-22625:
---

In Spark.
The problem is that a thread from a third-party library is created after the 
Spark context (thus inheriting its thread-local with all the properties, the 
UI object, etc.), and I have no control over this thread (it is a library 
implementation detail). But it is still a Spark issue, because Spark owns this 
thread-local: for every inheriting thread Spark clones all the properties, so 
they are effectively detached from Spark and close doesn't affect them. 
Moreover, these properties are inherited by every thread created after the 
Spark context, so I can't even prevent it.

The proposed solution (register every cloned property object and call `.clear` 
on all of them on close) will help.
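
The mechanism described above can be reproduced in isolation. The sketch below 
is illustrative only (the object name, property key, and `demo` helper are 
invented, not Spark code): an owner thread sets up an `InheritableThreadLocal` 
whose `childValue` clones the properties; a worker thread created afterwards 
inherits a clone, and clearing the owner's own copy never reaches it:

```scala
import java.util.Properties
import java.util.concurrent.CountDownLatch

// Minimal reproduction of the leak mechanism (names illustrative, not
// Spark's internals): a worker thread created after the "context" inherits
// a detached clone of its properties that the owner cannot reach.
object InheritedLocalLeak {
  private val localProperties = new InheritableThreadLocal[Properties] {
    override protected def childValue(parent: Properties): Properties = {
      val copy = new Properties()   // detached clone: nothing keeps a handle to it
      copy.putAll(parent)
      copy
    }
    override protected def initialValue(): Properties = new Properties()
  }

  // Returns what the worker thread sees before and after the owner "closes"
  // (clears) its own copy of the properties.
  def demo(): (String, String) = {
    localProperties.get().setProperty("spark.context.id", "ctx-1")

    val cleared = new CountDownLatch(1)
    var first: String = null
    var second: String = null
    // Third-party-style thread created AFTER the context: it inherits a clone.
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        first = localProperties.get().getProperty("spark.context.id")
        cleared.await()             // wait until the owner has cleared its copy
        second = localProperties.get().getProperty("spark.context.id")
      }
    })
    worker.start()

    localProperties.get().clear()   // the "close": clears only this thread's copy
    cleared.countDown()
    worker.join()
    (first, second)                 // the worker's clone is unaffected by clear()
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Both reads in the worker still see the inherited value even after the owner's 
`clear()`, which is why the clone (and everything it references) stays 
reachable for the worker's lifetime.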








[jira] [Commented] (SPARK-22625) Properly cleanup inheritable thread-locals

2017-11-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268474#comment-16268474
 ] 

Sean Owen commented on SPARK-22625:
---

Is this a ThreadLocal in your code or in a third-party library? I'm not clear 
on where the leak is in Spark.







[jira] [Commented] (SPARK-22625) Properly cleanup inheritable thread-locals

2017-11-28 Thread Tolstopyatov Vsevolod (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268461#comment-16268461
 ] 

Tolstopyatov Vsevolod commented on SPARK-22625:
---

If you agree this is the problem, I can work on a patch in a week or so.



