[jira] [Created] (SPARK-21575) Eliminate needless synchronization in java-R serialization

2017-07-29 Thread Iurii Antykhovych (JIRA)
Iurii Antykhovych created SPARK-21575:
-

         Summary: Eliminate needless synchronization in java-R serialization
             Key: SPARK-21575
             URL: https://issues.apache.org/jira/browse/SPARK-21575
         Project: Spark
      Issue Type: Improvement
      Components: SparkR
Affects Versions: 2.2.0
        Reporter: Iurii Antykhovych
        Priority: Trivial


Because {{org.apache.spark.api.r.JVMObjectTracker}} is backed by a
{{ConcurrentHashMap}}, the synchronized blocks in its {{get(..)}} and {{remove(..)}}
methods are unnecessary and can be safely removed.

This would eliminate lock contention in {{org.apache.spark.api.r.SerDe}}
and {{org.apache.spark.api.r.RBackendHandler}}.
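
To illustrate the idea, here is a minimal Scala sketch of a tracker backed by a `ConcurrentHashMap`. `SimpleObjectTracker` is a hypothetical stand-in, not Spark's actual `JVMObjectTracker`, and the `AtomicInteger` id counter is an assumption made for the example. It only shows that single `get` and `remove` calls on a `ConcurrentHashMap` are already thread-safe, so wrapping each of them in `synchronized` adds locking without adding safety.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified tracker used only for illustration;
// it is not the real org.apache.spark.api.r.JVMObjectTracker.
object SimpleObjectTracker {
  private val objMap = new ConcurrentHashMap[String, Object]()
  private val counter = new AtomicInteger(0)

  // Registers an object and returns its generated id.
  def put(obj: Object): String = {
    val id = counter.getAndIncrement().toString
    objMap.put(id, obj)
    id
  }

  // No synchronized block: ConcurrentHashMap.get is thread-safe on its own.
  def get(id: String): Option[Object] = Option(objMap.get(id))

  // No synchronized block: ConcurrentHashMap.remove is atomic on its own.
  def remove(id: String): Option[Object] = Option(objMap.remove(id))
}
```

Compound check-then-act sequences that span several calls would still need their own coordination; the sketch covers single-call access only.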



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-24 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099390#comment-16099390
 ] 

Iurii Antykhovych commented on SPARK-21491:
---

Done. Could you please re-check?

> Performance enhancement: eliminate creation of intermediate collections
> ---
>
>              Key: SPARK-21491
>              URL: https://issues.apache.org/jira/browse/SPARK-21491
>          Project: Spark
>       Issue Type: Improvement
>       Components: GraphX
> Affects Versions: 2.2.0
>         Reporter: Iurii Antykhovych
>         Priority: Trivial
>
> A simple performance optimization in a few places in GraphX:
> {{Traversable.toMap}} can be replaced with {{collection.breakOut}}.
> This would eliminate the creation of an intermediate collection of tuples; see this
> [Stack Overflow article|https://stackoverflow.com/questions/1715681/scala-2-8-breakout].






[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-23 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097903#comment-16097903
 ] 

Iurii Antykhovych commented on SPARK-21491:
---

I ran some micro-benchmarks with
[jmh|http://openjdk.java.net/projects/code-tools/jmh/]; the code is in
[this GitHub repository|https://github.com/SereneAnt/sbt_jmh].
The results are below. Times are the average per operation (conversion of a
collection to a map):
{noformat}
Benchmark                        Mode  Cnt    Score   Error  Units
BreakOutBenchmark.largeBreakOut  avgt   20  158.186 ± 7.472  us/op
BreakOutBenchmark.largeToMap     avgt   20  166.793 ± 1.670  us/op
BreakOutBenchmark.smallBreakOut  avgt   20    0.698 ± 0.019  us/op
BreakOutBenchmark.smallToMap     avgt   20    0.758 ± 0.020  us/op
BreakOutBenchmark.tinyBreakOut   avgt   20    0.018 ± 0.001  us/op
BreakOutBenchmark.tinyToMap      avgt   20    0.030 ± 0.001  us/op
{noformat}

Map sizes in microbenchmark fixtures: 
* large - 1000 entries
* small - 10 entries  
* tiny - 1 entry
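
As a rough reading of the results, the relative difference between each pair of mean scores can be computed as below. This is a sketch added for illustration: `relativeSaving` is a hypothetical helper, and the calculation uses only the mean scores and ignores the error margins (which overlap in the large case).

```scala
object BenchmarkReading {
  // Fraction of time saved by breakOut relative to toMap, from mean scores only.
  def relativeSaving(toMapScore: Double, breakOutScore: Double): Double =
    (toMapScore - breakOutScore) / toMapScore

  // Mean scores (us/op) copied from the results above.
  val large: Double = relativeSaving(166.793, 158.186) // about 5%
  val small: Double = relativeSaving(0.758, 0.698)     // about 8%
  val tiny: Double  = relativeSaving(0.030, 0.018)     // about 40%
}
```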

I updated the pull request: comments were added, and only the changes on hot code paths were retained.
For example, {{ShortestPaths#addMaps}} is executed more than 60 times for the
simple graph test in {{ShortestPathsSuite.scala}}.




[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-23 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097815#comment-16097815
 ] 

Iurii Antykhovych commented on SPARK-21491:
---

Sure, I'll try to write a performance test that demonstrates the effect.
In cases like this one, not only net performance matters but also latency and GC
statistics, since the change eliminates excess allocation of short-lived objects.




[jira] [Comment Edited] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095587#comment-16095587
 ] 

Iurii Antykhovych edited comment on SPARK-21491 at 7/21/17 12:30 AM:
-

I searched for all such places in the whole GraphX module.
These are the only three occurrences of this tuples-to-map conversion that I could speed up
without major refactoring.
Let GraphX be the pilot module for this optimization.

The changes in `LabelPropagation` and `ShortestPaths` are on a hot execution
path.
The fix in the `PageRank` class is executed only once per run, so it is not 'hot' enough.
Shall I revert it?




was (Author: sereneant):
I searched for all such places in the whole GraphX module.
These are the only three occurrences of this tuples-to-map conversion that I could speed up
without major refactoring.
Let GraphX be the pilot module for this optimization.

The only change on a hot execution path is the one in the
`graphx.lib.ShortestPaths` class.
The rest is executed once per run, so it is not 'hot' enough.
Shall I revert it (LabelPropagation.scala, PageRank.scala)?






[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095587#comment-16095587
 ] 

Iurii Antykhovych commented on SPARK-21491:
---

I searched for all such places in the whole GraphX module.
These are the only three occurrences of this tuples-to-map conversion that I could speed up
without major refactoring.
Let GraphX be the pilot module for this optimization.

The only change on a hot execution path is the one in the
`graphx.lib.ShortestPaths` class.
The rest is executed once per run, so it is not 'hot' enough.
Shall I revert it (LabelPropagation.scala, PageRank.scala)?






[jira] [Comment Edited] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095398#comment-16095398
 ] 

Iurii Antykhovych edited comment on SPARK-21491 at 7/20/17 9:22 PM:


This applies to all Scala versions starting from 2.8; the method is
{{scala.collection.breakOut}}.
The problem with {{collection.map(...).toMap}} is that it creates an intermediate
collection of tuples, which is then converted to a map;
this leads to performance degradation and excess object allocation.
The price of {{collection.breakOut}} is code readability, which I guess suffers
significantly compared to the {{.toMap}} method.
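
The difference can be sketched in Scala as follows. This is an illustration under assumptions, not code from the pull request: `lengthsViaToMap` and `lengthsViaBuilder` are hypothetical names. On Scala 2.8 to 2.12, passing `(collection.breakOut)` as the second argument list of `map` arranges for roughly the second approach; an explicit builder is used here so that the single-pass construction is visible.

```scala
object IntermediateCollectionDemo {
  // Two steps: map builds a List[(String, Int)] first, then toMap copies it into a Map.
  def lengthsViaToMap(words: List[String]): Map[String, Int] =
    words.map(w => w -> w.length).toMap

  // One step: each pair goes straight into the Map builder,
  // so no intermediate list of tuples is allocated.
  def lengthsViaBuilder(words: List[String]): Map[String, Int] = {
    val builder = Map.newBuilder[String, Int]
    words.foreach(w => builder += (w -> w.length))
    builder.result()
  }
}
```

Both functions return the same map; the second trades the compact `.toMap` call for one fewer temporary collection.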



was (Author: sereneant):
This applies to all Scala versions starting from 2.8; the method is
`scala.collection.breakOut`.
The problem with `collection.map(...).toMap` is that it creates an intermediate
collection of tuples, which is then converted to a map;
this leads to performance degradation and excess object allocation.
The price of `collection.breakOut` is code readability, which I guess suffers
significantly compared to the '.toMap' method.





[jira] [Commented] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095398#comment-16095398
 ] 

Iurii Antykhovych commented on SPARK-21491:
---

This applies to all Scala versions starting from 2.8; the method is
`scala.collection.breakOut`.
The problem with `collection.map(...).toMap` is that it creates an intermediate
collection of tuples, which is then converted to a map;
this leads to performance degradation and excess object allocation.
The price of `collection.breakOut` is code readability, which I guess suffers
significantly compared to the '.toMap' method.





[jira] [Updated] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iurii Antykhovych updated SPARK-21491:
--
Priority: Trivial  (was: Minor)




[jira] [Created] (SPARK-21491) Performance enhancement: eliminate creation of intermediate collections

2017-07-20 Thread Iurii Antykhovych (JIRA)
Iurii Antykhovych created SPARK-21491:
-

         Summary: Performance enhancement: eliminate creation of intermediate collections
             Key: SPARK-21491
             URL: https://issues.apache.org/jira/browse/SPARK-21491
         Project: Spark
      Issue Type: Improvement
      Components: GraphX
Affects Versions: 2.2.0
        Reporter: Iurii Antykhovych
        Priority: Minor


A simple performance optimization in a few places in GraphX:
{{Traversable.toMap}} can be replaced with {{collection.breakOut}}.
This would eliminate the creation of an intermediate collection of tuples; see this
[Stack Overflow article|https://stackoverflow.com/questions/1715681/scala-2-8-breakout].


