[jira] [Updated] (SPARK-23978) Kryo much slower when mllib jar not on classpath
[ https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Wilkinson updated SPARK-23978:
--------------------------------------
       Priority: Minor  (was: Major)
    Description:
Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib classes to the Kryo registration, but it does this via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.

I have attached jVisualVM stats for both cases; you can see that a vast amount of time is spent in Utils.classForName. While debugging, I traced this to the Kryo initialization.

  was:
Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib classes to the Kryo registration, but it does this via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.

> Kryo much slower when mllib jar not on classpath
> ------------------------------------------------
>
>                 Key: SPARK-23978
>                 URL: https://issues.apache.org/jira/browse/SPARK-23978
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10, Java 8
>            Reporter: Richard Wilkinson
>            Priority: Minor
>         Attachments: kryo_stats.png
>
>
> Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib
> classes to the Kryo registration, but it does this via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in
> 2.3 than in 2.2.1.
> I have attached jVisualVM stats for both cases; you can see that a vast amount
> of time is spent in Utils.classForName. While debugging, I traced this to the
> Kryo initialization.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
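The report above attributes the slowdown to repeated Class.forName lookups for classes that are not on the classpath. A failed lookup is typically not remembered by the class loader, so every Kryo initialization would pay for the search again. The Java sketch below shows one possible mitigation: resolve each optional class name once and cache the outcome, misses included. It is only an illustration under stated assumptions. `OptionalClassCache`, `resolve`, `loadable` and `lookupCount` are hypothetical names, not Spark or Kryo API, and the sketch does not claim to be the change eventually made in Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: resolves optional classes by name and remembers hits and misses. */
final class OptionalClassCache {
    // Optional.empty() marks a class that could not be loaded, so the failed
    // lookup is not repeated on later calls.
    private final Map<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();

    // Counts real Class.forName calls; present only to make the caching observable.
    private final AtomicInteger lookups = new AtomicInteger();

    private final ClassLoader loader = OptionalClassCache.class.getClassLoader();

    /** Returns the class if it can be loaded, or an empty Optional if it cannot. */
    Optional<Class<?>> resolve(String className) {
        return cache.computeIfAbsent(className, name -> {
            lookups.incrementAndGet();
            try {
                // initialize=false: only locate the class, do not run static initializers.
                Class<?> cls = Class.forName(name, false, loader);
                return Optional.<Class<?>>of(cls);
            } catch (ClassNotFoundException | LinkageError e) {
                return Optional.empty();
            }
        });
    }

    /** Returns the subset of the given class names that can be loaded. */
    List<Class<?>> loadable(List<String> classNames) {
        List<Class<?>> found = new ArrayList<>();
        for (String name : classNames) {
            resolve(name).ifPresent(found::add);
        }
        return found;
    }

    int lookupCount() {
        return lookups.get();
    }
}
```

With a single shared `OptionalClassCache`, a registration routine could call `loadable(...)` on every serializer initialization and register only the classes returned, while the expensive search for each missing class happens at most once per JVM.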
[jira] [Created] (SPARK-23978) Kryo much slower when mllib jar not on classpath
Richard Wilkinson created SPARK-23978:
-----------------------------------------

             Summary: Kryo much slower when mllib jar not on classpath
                 Key: SPARK-23978
                 URL: https://issues.apache.org/jira/browse/SPARK-23978
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.0
         Environment: Windows 10, Java 8
            Reporter: Richard Wilkinson
         Attachments: kryo_stats.png

Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib classes to the Kryo registration, but it does this via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.
[jira] [Updated] (SPARK-23978) Kryo much slower when mllib jar not on classpath
[ https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Wilkinson updated SPARK-23978:
--------------------------------------
    Attachment: kryo_stats.png

> Kryo much slower when mllib jar not on classpath
> ------------------------------------------------
>
>                 Key: SPARK-23978
>                 URL: https://issues.apache.org/jira/browse/SPARK-23978
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10, Java 8
>            Reporter: Richard Wilkinson
>            Priority: Major
>         Attachments: kryo_stats.png
>
>
> Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib
> classes to the Kryo registration, but it does this via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in
> 2.3 than in 2.2.1.
[jira] [Commented] (SPARK-22450) Safely register class for mllib
[ https://issues.apache.org/jira/browse/SPARK-22450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387630#comment-16387630 ]

Richard Wilkinson commented on SPARK-22450:
-------------------------------------------

Just as an FYI, the change to org.apache.spark.serializer.KryoSerializer#newKryo (from this ticket, I think) is a performance hit compared with 2.2.1.

I am calling org.apache.spark.serializer.KryoSerializer#newInstance a lot, which is probably an issue in itself (hence I am not raising a bug report), but I am not aware of how often it is called internally by Spark.

I do not have the ML jars on my classpath.

> Safely register class for mllib
> -------------------------------
>
>                 Key: SPARK-22450
>                 URL: https://issues.apache.org/jira/browse/SPARK-22450
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Xianyang Liu
>            Assignee: Xianyang Liu
>            Priority: Major
>             Fix For: 2.3.0
>
>
> There are still some algorithms based on mllib, such as KMeans. For now,
> many common mllib classes (such as Vector, DenseVector, SparseVector, Matrix,
> DenseMatrix, SparseMatrix) are not registered in Kryo, so serializing or
> deserializing those objects incurs a performance penalty.
> Previously discussed: https://github.com/apache/spark/pull/19586
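The comment above notes that creating a new serializer instance very often is probably a problem in its own right. One common way to limit that cost is to keep one instance per thread and reuse it. The Java sketch below illustrates that pattern in isolation. `PerThreadInstance` and `createdCount` are hypothetical names, the factory is a stand-in for whatever call is expensive (for example a serializer's instance-creation method), and no Spark API is used or implied. It assumes the reused object is safe to use repeatedly from a single thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: lazily creates one instance per thread and reuses it afterwards. */
final class PerThreadInstance<T> {
    // Counts how many times the expensive factory actually ran; for illustration only.
    private final AtomicInteger created = new AtomicInteger();

    private final ThreadLocal<T> local;

    PerThreadInstance(Supplier<? extends T> factory) {
        // The factory runs on a thread's first get() and not again on that thread.
        this.local = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return factory.get();
        });
    }

    /** Returns the calling thread's instance, creating it on first use. */
    T get() {
        return local.get();
    }

    int createdCount() {
        return created.get();
    }
}
```

A caller would build the holder once, for example `new PerThreadInstance<>(() -> expensiveFactory())` where `expensiveFactory` is a placeholder, and call `get()` wherever it previously created a fresh instance.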