[jira] [Created] (SPARK-21575) Eliminate needless synchronization in java-R serialization
Iurii Antykhovych created SPARK-21575:

Summary: Eliminate needless synchronization in java-R serialization
Key: SPARK-21575
URL: https://issues.apache.org/jira/browse/SPARK-21575
Project: Spark
Issue Type: Improvement
Components: SparkR
Affects Versions: 2.2.0
Reporter: Iurii Antykhovych
Priority: Trivial

Since {{org.apache.spark.api.r.JVMObjectTracker}} is backed by a {{ConcurrentHashMap}}, the synchronized blocks in its {{get(..)}} and {{remove(..)}} methods can be safely removed. This would eliminate lock contention in {{org.apache.spark.api.r.SerDe}} and {{org.apache.spark.api.r.RBackendHandler}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
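A minimal Scala sketch of the reasoning, assuming a simplified tracker: the class name {{SimpleObjectTracker}}, the String ids and the method set are hypothetical and do not reproduce Spark's {{JVMObjectTracker}}. Each method performs a single {{ConcurrentHashMap}} call, which is already atomic, so no {{synchronized}} block is needed around it:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified tracker; NOT Spark's JVMObjectTracker.
final class SimpleObjectTracker {
  // Every single get/put/remove call on a ConcurrentHashMap is thread-safe on its own.
  private val objMap = new ConcurrentHashMap[String, AnyRef]()
  // Atomic counter so concurrent callers never receive the same id.
  private val counter = new AtomicInteger()

  // One atomic lookup, no synchronized block required; null (absent) becomes None.
  def get(id: String): Option[AnyRef] = Option(objMap.get(id))

  // One atomic removal returning the previous value, or None if the id was unknown.
  def remove(id: String): Option[AnyRef] = Option(objMap.remove(id))

  // Register an object under a fresh id and return that id.
  def addAndGetId(obj: AnyRef): String = {
    val id = counter.getAndIncrement().toString
    objMap.put(id, obj)
    id
  }

  def size: Int = objMap.size()
}

val tracker = new SimpleObjectTracker
val id = tracker.addAndGetId("some JVM object")
println(tracker.get(id))    // Some(some JVM object)
println(tracker.remove(id)) // Some(some JVM object)
println(tracker.get(id))    // None
```

The argument only holds for single-call operations: a compound check-then-act sequence (for example a {{get}} followed by a {{put}}) would still need synchronization or an atomic method such as {{putIfAbsent}} or {{computeIfAbsent}}.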
[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099390#comment-16099390 ]

Iurii Antykhovych commented on SPARK-21491:

Done; could you please re-check?

> Performance enhancement: eliminate creation of intermediate collections
>
> Key: SPARK-21491
> URL: https://issues.apache.org/jira/browse/SPARK-21491
> Project: Spark
> Issue Type: Improvement
> Components: GraphX
> Affects Versions: 2.2.0
> Reporter: Iurii Antykhovych
> Priority: Trivial
>
> A simple performance optimization in a few places in GraphX:
> {{Traversable.toMap}} can be replaced with {{collection.breakOut}}.
> This would eliminate the creation of an intermediate collection of tuples; see
> [Stack Overflow article|https://stackoverflow.com/questions/1715681/scala-2-8-breakout]
[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097903#comment-16097903 ]

Iurii Antykhovych commented on SPARK-21491:

I ran some micro-benchmarks with [jmh|http://openjdk.java.net/projects/code-tools/jmh/]; the code is in [this GitHub repository|https://github.com/SereneAnt/sbt_jmh]. The results are below; each score is the average time per operation (conversion of a collection to a map):
{noformat}
Benchmark                        Mode  Cnt    Score   Error  Units
BreakOutBenchmark.largeBreakOut  avgt   20  158.186 ± 7.472  us/op
BreakOutBenchmark.largeToMap     avgt   20  166.793 ± 1.670  us/op
BreakOutBenchmark.smallBreakOut  avgt   20    0.698 ± 0.019  us/op
BreakOutBenchmark.smallToMap     avgt   20    0.758 ± 0.020  us/op
BreakOutBenchmark.tinyBreakOut   avgt   20    0.018 ± 0.001  us/op
BreakOutBenchmark.tinyToMap      avgt   20    0.030 ± 0.001  us/op
{noformat}
Map sizes in the microbenchmark fixtures:
* large - 1000 entries
* small - 10 entries
* tiny - 1 entry

I updated the pull request: added comments and kept only the changes on hot code paths. For example, {{ShortestPaths#addMaps}} is executed more than 60 times for a simple graph test in {{ShortestPathsSuite.scala}}.
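To put the JMH numbers in relative terms, a small Scala sketch that computes how much less time per operation the {{breakOut}} variant took, using only the mean scores from the results table and ignoring the error margins (the names {{scores}} and {{savedPercent}} are illustrative):

```scala
// Mean scores in us/op, copied from the JMH results table; lower is better.
// Tuple layout: (fixture, breakOut score, toMap score).
val scores: Seq[(String, Double, Double)] = Seq(
  ("large (1000 entries)", 158.186, 166.793),
  ("small (10 entries)", 0.698, 0.758),
  ("tiny (1 entry)", 0.018, 0.030)
)

// Percentage of the toMap time saved by breakOut; error margins are ignored.
def savedPercent(withBreakOut: Double, withToMap: Double): Double =
  (withToMap - withBreakOut) / withToMap * 100.0

for ((fixture, withBreakOut, withToMap) <- scores)
  println(f"$fixture%-22s ${savedPercent(withBreakOut, withToMap)}%5.1f%% less time per op")
```

This works out to roughly 5.2% (large), 7.9% (small) and 40% (tiny). The large-map difference is of the same order as its ±7.472 us/op error margin, so it should be read with caution.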
[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097815#comment-16097815 ]

Iurii Antykhovych commented on SPARK-21491:

Sure, I'll try to write a performance test that demonstrates the effect. In cases like this, not only raw performance matters but also latency and GC statistics: the change eliminates excess allocation of short-lived objects.
[jira] [Comment Edited] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095587#comment-16095587 ]

Iurii Antykhovych edited comment on SPARK-21491 at 7/21/17 12:30 AM:

I searched for all such places in the whole GraphX module. These are the only three tuples-to-map conversions I could optimize without major refactoring. Let GraphX be the pilot module for this kind of optimization.
The changes in `LabelPropagation` and `ShortestPaths` are on a hot execution path. The fix in the `PageRank` class is executed only once per run, so it is not 'hot' enough. Shall I revert it?

was (Author: sereneant):
I searched for all such places in the whole GraphX module. These are the only three tuples-to-map conversions I could optimize without major refactoring. Let GraphX be the pilot module for this kind of optimization.
The only change on a hot execution path is the one in the `graphx.lib.ShortestPaths` class. The rest is executed once per run, so it is not 'hot' enough. Shall I revert those changes (LabelPropagation.scala, PageRank.scala)?
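For illustration, a merge of two shortest-path maps in the spirit of the hot {{ShortestPaths}} code, written so it compiles on Scala 2.12, 2.13 and 3. The names {{ShortestPathsSketch}}, {{SPMap}} and {{addMaps}} are assumptions for this sketch, not Spark's source. Mapping over an {{iterator}} feeds each (key, distance) pair straight into the Map builder, the same effect {{collection.breakOut}} has on Scala 2.12 and earlier:

```scala
object ShortestPathsSketch {
  // Hypothetical alias: landmark vertex id -> hop count to that landmark.
  type SPMap = Map[Long, Int]

  // Keep the shorter distance for every landmark known to either map.
  // The key union is still materialized as a Set, but the (key, distance) tuples are
  // streamed through an iterator into the Map builder instead of being collected first.
  def addMaps(spmap1: SPMap, spmap2: SPMap): SPMap =
    (spmap1.keySet ++ spmap2.keySet).iterator.map { k =>
      k -> math.min(spmap1.getOrElse(k, Int.MaxValue), spmap2.getOrElse(k, Int.MaxValue))
    }.toMap
}

val merged = ShortestPathsSketch.addMaps(Map(1L -> 2, 2L -> 5), Map(2L -> 3, 3L -> 1))
println(merged)
```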
[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095587#comment-16095587 ]

Iurii Antykhovych commented on SPARK-21491:

I searched for all such places in the whole GraphX module. These are the only three tuples-to-map conversions I could optimize without major refactoring. Let GraphX be the pilot module for this kind of optimization.
The only change on a hot execution path is the one in the `graphx.lib.ShortestPaths` class. The rest is executed once per run, so it is not 'hot' enough. Shall I revert those changes (LabelPropagation.scala, PageRank.scala)?
[jira] [Comment Edited] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095398#comment-16095398 ]

Iurii Antykhovych edited comment on SPARK-21491 at 7/20/17 9:22 PM:

This applies to all Scala versions starting from 2.8, where it is available as `scala.collection.breakOut`.
The problem with {{collection.map(...).toMap}} is that it creates an intermediate collection of tuples, which is then converted to a map; this degrades performance and causes excess object allocation.
The price of {{collection.breakOut}} is code readability, which I think suffers significantly compared to the {{.toMap}} method.

was (Author: sereneant):
This applies to all Scala versions starting from 2.8, where it is available as `scala.collection.breakOut`.
The problem with `collection.map(...).toMap` is that it creates an intermediate collection of tuples, which is then converted to a map; this degrades performance and causes excess object allocation.
The price of `collection.breakOut` is code readability, which I think suffers significantly compared to the `.toMap` method.
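A minimal Scala sketch contrasting the approaches, with hypothetical names ({{ToMapVariants}}, {{words}}). Since {{breakOut}} was removed in Scala 2.13, the sketch uses {{iterator.map(...).toMap}} and an explicit {{Map.newBuilder}} loop as version-portable ways to skip the intermediate collection; the {{breakOut}} form for Scala 2.8-2.12 appears only in a comment:

```scala
object ToMapVariants {
  val words: Seq[String] = Seq("graph", "vertex", "edge")

  // 1. map(...).toMap: builds an intermediate Seq[(String, Int)], then copies it into a Map.
  def viaToMap: Map[String, Int] = words.map(w => w -> w.length).toMap

  // 2. Iterator-based: tuples are produced one at a time and inserted straight into
  //    the Map, so no collection holding all the tuples is materialized.
  def viaIterator: Map[String, Int] = words.iterator.map(w => w -> w.length).toMap

  // 3. Explicit builder: the same single pass, spelled out by hand.
  def viaBuilder: Map[String, Int] = {
    val b = Map.newBuilder[String, Int]
    words.foreach(w => b += (w -> w.length))
    b.result()
  }

  // On Scala 2.8-2.12 the approach from this issue would be:
  //   val m: Map[String, Int] = words.map(w => w -> w.length)(collection.breakOut)
  // breakOut no longer exists in 2.13+, so it is kept as a comment here.
}

println(ToMapVariants.viaIterator)
```

Note that every variant still allocates one tuple per element; what the single-pass versions avoid is the intermediate collection holding all of them.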
[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095398#comment-16095398 ]

Iurii Antykhovych commented on SPARK-21491:

This applies to all Scala versions starting from 2.8, where it is available as `scala.collection.breakOut`.
The problem with `collection.map(...).toMap` is that it creates an intermediate collection of tuples, which is then converted to a map; this degrades performance and causes excess object allocation.
The price of `collection.breakOut` is code readability, which I think suffers significantly compared to the `.toMap` method.
[jira] [Updated] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
[ https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iurii Antykhovych updated SPARK-21491:
Priority: Trivial (was: Minor)
[jira] [Created] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections
Iurii Antykhovych created SPARK-21491:

Summary: Performance enhancement: eliminate creation of intermediate collections
Key: SPARK-21491
URL: https://issues.apache.org/jira/browse/SPARK-21491
Project: Spark
Issue Type: Improvement
Components: GraphX
Affects Versions: 2.2.0
Reporter: Iurii Antykhovych
Priority: Minor

A simple performance optimization in a few places in GraphX: {{Traversable.toMap}} can be replaced with {{collection.breakOut}}. This would eliminate the creation of an intermediate collection of tuples; see [Stack Overflow article|https://stackoverflow.com/questions/1715681/scala-2-8-breakout].