[jira] [Updated] (SPARK-23978) Kryo much slower when mllib jar not on classpath

2018-04-13 Thread Richard Wilkinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Wilkinson updated SPARK-23978:
--
   Priority: Minor  (was: Major)
Description: 
Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
classes to the Kryo registration, but it looks them up via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 
2.3 than in 2.2.1.

I have attached jVisualVM stats for both cases; you can see a vast amount of 
time is spent in Utils.classForName.  While debugging, I traced this to the 
Kryo initialization.

  was:
Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
classes to the Kryo registration, but it looks them up via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 
2.3 than in 2.2.1.

 


> Kryo much slower when mllib jar not on classpath
> 
>
> Key: SPARK-23978
> URL: https://issues.apache.org/jira/browse/SPARK-23978
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0
> Environment: Windows 10, Java 8
> Reporter: Richard Wilkinson
> Priority: Minor
> Attachments: kryo_stats.png
>
>
> Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
> classes to the Kryo registration, but it looks them up via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in 
> 2.3 than in 2.2.1.
> I have attached jVisualVM stats for both cases; you can see a vast amount of 
> time is spent in Utils.classForName.  While debugging, I traced this to the 
> Kryo initialization.
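
To illustrate the cost pattern described above: a Class.forName call for a class that is not on the classpath has to search the classpath and throw a ClassNotFoundException every time, so repeating it on each serializer initialization adds up. The following is a minimal sketch of one possible mitigation, caching lookup results (misses included) so each name is searched for at most once. It is not Spark's actual code or fix; OptionalClassLookup, FOR_NAME_CALLS, and the class name example.missing.NotOnClasspath are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of Spark.
class OptionalClassLookup {
    // Caches hits and misses alike, so a missing class is searched for only once.
    private static final Map<String, Optional<Class<?>>> CACHE = new ConcurrentHashMap<>();

    // Counts real Class.forName calls so the effect of the cache can be observed.
    static final AtomicInteger FOR_NAME_CALLS = new AtomicInteger();

    static Optional<Class<?>> find(String className) {
        return CACHE.computeIfAbsent(className, name -> {
            FOR_NAME_CALLS.incrementAndGet();
            try {
                return Optional.<Class<?>>of(Class.forName(name));
            } catch (ClassNotFoundException e) {
                // Absent from the classpath: remember the miss instead of retrying.
                return Optional.empty();
            }
        });
    }

    public static void main(String[] args) {
        String missing = "example.missing.NotOnClasspath"; // hypothetical class name
        for (int i = 0; i < 1000; i++) {
            find(missing); // only the first iteration reaches Class.forName
        }
        System.out.println("present: " + find(missing).isPresent());
        System.out.println("Class.forName calls: " + FOR_NAME_CALLS.get());
    }
}
```

The sketch trades a small amount of memory for skipping the repeated failed lookups. Whether this is appropriate depends on whether the classpath can change while the application is running, which the sketch assumes it cannot.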



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23978) Kryo much slower when mllib jar not on classpath

2018-04-13 Thread Richard Wilkinson (JIRA)
Richard Wilkinson created SPARK-23978:
-

Summary: Kryo much slower when mllib jar not on classpath
Key: SPARK-23978
URL: https://issues.apache.org/jira/browse/SPARK-23978
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.0
Environment: Windows 10, Java 8
Reporter: Richard Wilkinson
Attachments: kryo_stats.png

Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
classes to the Kryo registration, but it looks them up via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 
2.3 than in 2.2.1.

 






[jira] [Updated] (SPARK-23978) Kryo much slower when mllib jar not on classpath

2018-04-13 Thread Richard Wilkinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Wilkinson updated SPARK-23978:
--
Attachment: kryo_stats.png

> Kryo much slower when mllib jar not on classpath
> 
>
> Key: SPARK-23978
> URL: https://issues.apache.org/jira/browse/SPARK-23978
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0
> Environment: Windows 10, Java 8
> Reporter: Richard Wilkinson
> Priority: Major
> Attachments: kryo_stats.png
>
>
> Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
> classes to the Kryo registration, but it looks them up via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in 
> 2.3 than in 2.2.1.
>  






[jira] [Commented] (SPARK-22450) Safely register class for mllib

2018-03-06 Thread Richard Wilkinson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387630#comment-16387630
 ] 

Richard Wilkinson commented on SPARK-22450:
---

Just as an FYI, the change to 
org.apache.spark.serializer.KryoSerializer#newKryo (from this ticket, I think) 
is a performance hit compared with 2.2.1.  I am calling 
org.apache.spark.serializer.KryoSerializer#newInstance a lot, which is probably 
an issue in itself (hence I am not raising a bug report), but I'm not aware of 
how often it is called internally by Spark.  I do not have the ML jars on my 
classpath.

> Safely register class for mllib
> ---
>
> Key: SPARK-22450
> URL: https://issues.apache.org/jira/browse/SPARK-22450
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Xianyang Liu
> Assignee: Xianyang Liu
> Priority: Major
> Fix For: 2.3.0
>
>
> There are still some algorithms based on mllib, such as KMeans.  For now, 
> many common mllib classes (such as Vector, DenseVector, SparseVector, Matrix, 
> DenseMatrix, SparseMatrix) are not registered in Kryo, so there are some 
> performance issues when those objects are serialized or deserialized.
> Previously discussed: https://github.com/apache/spark/pull/19586


