[jira] [Assigned] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

2015-10-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7970:
---

Assignee: (was: Apache Spark)

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> --
>
> Key: SPARK-7970
> URL: https://issues.apache.org/jira/browse/SPARK-7970
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Nitin Goyal
> Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 
> 2015-05-27 at 11.07.02 pm.png
>
>
> Closure cleaner slows down the execution of Spark SQL queries fired on union 
> of RDDs. The time increases linearly at driver side with number of RDDs 
> unioned. Refer following thread for more context :-
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As can be seen in attached screenshots of Jprofiler, lot of time is getting 
> consumed in "getClassReader" method of ClosureCleaner and rest in 
> "ensureSerializable" (atleast in my case)
> This can be fixed in two ways (as per my current understanding) :-
> 1. Fixed at Spark SQL level - As pointed out by yhuai, we can create 
> MapPartitionsRDD idirectly nstead of doing rdd.mapPartitions which calls 
> ClosureCleaner clean method (See PR - 
> https://github.com/apache/spark/pull/6256).
> 2. Fix at Spark core level -
>   (i) Make "checkSerializable" property driven in SparkContext's clean method
>   (ii) Somehow cache classreader for last 'n' classes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

2015-10-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7970:
---

Assignee: Apache Spark

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> --
>
> Key: SPARK-7970
> URL: https://issues.apache.org/jira/browse/SPARK-7970
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Nitin Goyal
>Assignee: Apache Spark
> Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 
> 2015-05-27 at 11.07.02 pm.png
>
>
> Closure cleaner slows down the execution of Spark SQL queries fired on union 
> of RDDs. The time increases linearly at driver side with number of RDDs 
> unioned. Refer following thread for more context :-
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As can be seen in attached screenshots of Jprofiler, lot of time is getting 
> consumed in "getClassReader" method of ClosureCleaner and rest in 
> "ensureSerializable" (atleast in my case)
> This can be fixed in two ways (as per my current understanding) :-
> 1. Fixed at Spark SQL level - As pointed out by yhuai, we can create 
> MapPartitionsRDD idirectly nstead of doing rdd.mapPartitions which calls 
> ClosureCleaner clean method (See PR - 
> https://github.com/apache/spark/pull/6256).
> 2. Fix at Spark core level -
>   (i) Make "checkSerializable" property driven in SparkContext's clean method
>   (ii) Somehow cache classreader for last 'n' classes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org