[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-13 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375697#comment-15375697
 ] 

Tyler Hobbs commented on CASSANDRA-12153:
-

+1

> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-13 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375573#comment-15375573
 ] 

Benjamin Lerer commented on CASSANDRA-12153:


CI results:
||[utest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-12153-trunk-testall/]||[dtests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-12153-trunk-dtest/]||

> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-12 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373170#comment-15373170
 ] 

Tyler Hobbs commented on CASSANDRA-12153:
-

Nice!  I'm +1 on your code changes.  Can you start a CI test run?

bq. PS: To be honest, I was not expecting that the use of new LinkHashSet, 
stream() and lambdas was so bad from a performance point of view. 

I was surprised by how much more expensive it was as well.

> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-11 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370492#comment-15370492
 ] 

Benjamin Lerer commented on CASSANDRA-12153:


I profiled trunk with Mission Control with the 
{{-XX:+UnlockDiagnosticVMOptions}} and {{-XX:+DebugNonSafepoints}} options.
To be sure that the problem was not hidden by the results processing part of 
the code, I used a program that fired queries on an empty table.
On my initial run {{RestrictionSet.hasIn()}} was shown second on the hot 
methods list with 3.14% of the time in the call tree. Looking at the other hot 
methods, I found out that {{HashMap.putVal}}, {{HashMap.resize()}}, and 
{{HashSet.init()}} were pretty high on the list.
In the allocation tab: {{HashMap$Node[]}}, {{LinkedHashMap}} and 
{{LinkedHashMap$Entry}} were representing a total pressure of 11,81%. All those 
were due to the {{LinkedHashSet}} creation in {{RestrictionSet.stream()}} and 
{{RestrictionSet.iterator()}} methods.

Removing {{RestrictionSet.stream()}} and the use of lambdas by simple loops 
using {{RestrictionSet.iterator()}} reduced {{RestrictionSet.hasIn()}} to 0.61% 
in  call tree. 
Replacing new {{LinkedHashSet()}} by a custom filtering iterator brought down 
{{RestrictionSet.hasIn()}} to 0.10% and removed all the allocations caused by 
the use of {{LinkedHashSet}}.

As we can only have duplicates when multiple column restrictions are used, I 
added a variable to keep track of it and only filter duplicates when needed. 
That has a strong effect everywhere where {{RestrictionSet.iterator()}} was 
used and changed the complete picture. {{RestrictionSet.hasIn()}} came back up 
to 0.51% just because other part of the code were faster :-).

I have pushed my experimental branch 
[here|https://github.com/apache/cassandra/compare/trunk...blerer:12153-trunk].
Waiting for your feedbacks.

PS: To be honest, I was not expecting that the use of new {{LinkHashSet}}, 
{{stream()}} and lambdas was so bad from a performance point of view. 


> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-08 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368576#comment-15368576
 ] 

Tyler Hobbs commented on CASSANDRA-12153:
-

bq. If we are really looking for speed, I think that we should have some field 
variables for hasIN, hasEq

When profiling with the patch applied, {{hasIN()}} drops to ~0.02% of the time, 
so I think that's fast enough.

bq. Any reason not to do the same with...

Seems like we might as well do that too.  Those didn't show up for the simple 
query I was profiling, but they might be in the hot path for some other types 
of queries.

> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-08 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368229#comment-15368229
 ] 

Benjamin Lerer commented on CASSANDRA-12153:


I am the one to blame for the {{stream()}} method. My main concern, when I 
created it, was just to simplify the code.
If we are really looking for speed, I think that we should have some field 
variables for {{hasIN}}, {{hasEq}} ...
It will move the computation at preparation time rather than at execution time 
and will perform it only once (if my memory is correct {{hasIN()}} is called 
multiple times).

bq. Then remove RestrictionSet stream() to discourage this from being 
reintroduced?

There is 2 problems associated to the {{stream()}} method. The creation of the 
{{LinkedHashSet}} which is used to remove the duplicates 
{{MultiColumnRestrictions}} and the Lambda expressions.
The {{{LinkedHashSet}} is unfortunatly also created in {{iterator()}} so 
removing {{stream()} will not solve that problem. 
I think, we could keep track of the fact that multicolumn restrictions are used 
or not and avoid creating the {{LinkedHashSet}} if they are not used.
I have no idea of the cost associated to the use of the lambda.

> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12153) RestrictionSet.hasIN() is slow

2016-07-08 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368069#comment-15368069
 ] 

Jeff Jirsa commented on CASSANDRA-12153:


Any reason not to do the same with:

https://github.com/thobbs/cassandra/blob/194b56b3c604ed4d6a5a56dcca8c613be13cb8ee/src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java#L155

https://github.com/thobbs/cassandra/blob/194b56b3c604ed4d6a5a56dcca8c613be13cb8ee/src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java#L166

Then remove {{RestrictionSet stream()}} to discourage this from being 
reintroduced?


> RestrictionSet.hasIN() is slow
> --
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Tyler Hobbs
>Assignee: Tyler Hobbs
>Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that 
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time.  It 
> looks like it's mostly slow because it creates a new LinkedHashSet (which is 
> expensive to init) and uses streams.  This can be replaced with a simple for 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)