[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984190#comment-13984190
 ] 

Sylvain Lebresne commented on CASSANDRA-6934:
-

I've pushed an additional commit on top of that last branch at 
https://github.com/pcmanus/cassandra/commits/6934-v2 that addresses the 
following remaining remarks/nits:
* ColumnToCollectionType should not be 'byte order comparable' since it's 
compare() != compareCollectionMembers(). It follows that the compare() override 
in CompoundSparseCellNameType.WithCollection don't have to bother checking 
isByteOrderComparable since it can't be.
* I believe the previous Composite.EOC.prefixComparisonResult was correct, but 
the new prefixCompare() is not. That is, if s1 is a strict prefix of s2, we 
should not take s2.eoc() into account no matter what (the EOC basically applies 
to the end component only). So I reverted back to the previous version (with an 
additional comment).
* DynamicCompositeType should never be 'byte order comparable'.
* We can simplify AbstractSimpleCellNameType/SimpleCType compare() further 
because we don't have EOC for simple names (or rather, they're always NONE -- 
the +/-Inf patch would have changed that, but since we don't do it). As a side 
note, I'd really rather avoid using {{|}} (instead of {{||}}) for conditions: 
the one line of code it saves is not worth the unusual syntax imo (where 
unusual means 'for the Cassandra code base').
* There was unrelated changes to CassandraDaemon for some reason, and a unused 
import in AbstractCompoundCellNameType. I removed those.

Can you share your benchmark/benchmark results for this for the records?

One last nit: mind taking a shot at replacing the TODO: Could use some comment 
on why that's a useful optimization to do comment in NodeBuilder.update() with 
an actual explanation?


 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1.0


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984212#comment-13984212
 ] 

Benedict commented on CASSANDRA-6934:
-

Thanks

bq. As a side note, I'd really rather avoid using | (instead of ||) for 
conditions: the one line of code it saves is not worth the unusual syntax imo 
(where unusual means 'for the Cassandra code base').

Whilst I don't mind too much either way on this one, it was less to save a line 
of code and more to save a branch prediction slot in the CPU. It's a pretty 
micro-optimisation so unlikely to be measurable in our current state, but the 
branch prediction is almost certainly useless here and we can predict more 
accurately (which is all \| really does instead of \|\|). FTR, there are some 
places where I will definitely want to use it in future (e.g. associative 
cache) to permit better pipelining and optimisation.

Otherwise changes LGTM. I've pushed a [commented 
branch|https://github.com/belliottsmith/cassandra/tree/6934-v2], which also has 
a warning on the optimisation in SimpleCType and AbstractSimpleCellNameType on 
the optimisation that depends on ColumnSlice start/finish semantics

 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1.0


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984213#comment-13984213
 ] 

Benedict commented on CASSANDRA-6934:
-

As to benchmarks, CASSANDRA-6553 are the main ones I was using to justify this; 
see the green line on [this 
graph|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.uber_contention_no_cache_CL_one_4_3.jsonmetric=op_rateoperation=counterreadsmoothing=1xmin=0xmax=258.61ymin=0ymax=21794.3]

The downside of course is that these numbers are for simple sparse types only - 
I need to update stress to support clustering columns (which I plan to do soon) 
to create more general graphs for all of the optimisations.

 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1.0


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981211#comment-13981211
 ] 

Benedict commented on CASSANDRA-6934:
-

I've pushed a new branch 
[here|https://github.com/belliottsmith/cassandra/commits/6934-reselected]

This branch does away with all of the changes to start/end semantics when you 
want to select everything. There were still more things to deal with, and the 
patch was getting much too unwieldy for the benefit that would have conferred. 
I'd have liked to clean up the semantics, but I think that's a problem for 
another day/ticket since we can get 99% of our benefit without them.

I think I've addressed all your concerns, with the following notes:
# Instead of ordinal() I just use eoc.compareTo() - the same concern applies, 
but given ordering of an enum is part of its definition, it shouldn't be a risk 
(I would -1 any change that ordered them differently to their natural ordering)
# Not sure which newline you're referring to, scoured my patch to find a 
constructor with a field I'd added directly above and couldn't see one
# There were a couple of bugs in your rejig of NodeBuilder.update, but fixed 
them and also simplified it a bit further by extracting the POSITIVE_INFINITY 
behaviour into a separate method that is explicitly invoked when necessary
# Further optimised away the isStatic test in applicable types, as I'm pretty 
sure the sort order will remain unchanged.
# Eliminated all of the null checks in types, and simplified any empty checks
# Modified compare() behaviours to support EOCs being set on both types, as it 
_can_ happen (and if both are END it was broken)
# Modified SimpleCType + AbstractSimpleCellNameType compares to support both 
parameters being empty


 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1.0


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978488#comment-13978488
 ] 

Sylvain Lebresne commented on CASSANDRA-6934:
-

h6. 1st patch (NEG_INF/POS_INF):

* We need to make sure that we don't use Composites.EMPTY in slices (at least 
for the finish bound), but that's still done for thrift and CQL2 due to the 
fromByteBuffer calls in CassandraServer#toInternalFilter, 
cql.QueryProcess#{getSlice,filterFromSelect} and ThriftValidation#asIFilter.
* When the SliceQueryFilter is reversed, a slice start bound should compare 
after it's finish, so we should use slice(POS_INF, NEG_INF) in that case. So I 
think all the POS_INF in SelectStatement should be NEG_INF, and more generally 
mind reversed in every case (in the thrift/CQL2 methods above, in SuperColumns, 
in SystemKeyspace, ...).  I think that also mean we need a 
ColumnSlice.ALL_COLUMNS_REVERSED, or at least check we never use 
ALL_COLUMNS/ALL_COLUMNS_ARRAY in reversed slice (haven't checked).
* I'd feel much more comfortable we're not missing some case if we added an 
assertion/check in SliceQueryFilter that we never use Composites.EMPTY as 
finish bound (or start bound if reversed). Ideally we'd even refuse it for 
start bound (finish when reversed), but I think that would require to fix a 
bunch of place where we don't use Composite#start() (which is somewhat wrong in 
the first place but well) for the start bound (there's at least 
SelectStatement#buildBound but likely others).
* In ColumnSlice#isAlwaysEmpty, I don't think we need to test 
{{!start.isEmpty()}} anymore (and since the method handle both forward and 
reversed slices, testing only one of {{start}} or {{finish}} actually feels 
wrongs).
* Nit: In AbstractCType.compare, I'd remove only one can have an EOC because 
I think the code works even if that's not the case and I'd rather not make that 
an official assumption (it's probably always true, but there is no point 
relying on it).
* Nit: many comparators test their arguments for {{null}} which I'm pretty sure 
is not necessary (it better be since a handful of comparator don't do it). 
Could be nice to remove this from all comparator while we're removing useless 
checks (or let say at least for consistency).
* Nit: A small comment on what prefixComparisonResult is would be useful, it's 
not obvious without going looking for how it's used.
* Nit: Slightly uncomfortable relying on EOC enum ordinal() (I'll grant that 
there is little reason to ever reorder than enum, but still feel like a bad 
habit). I'd prefer adding a specific field.
* Nit: Unused imports in SimpleDenseCellNameType/SimpleSparseCellnameType
* Nit: No newline between a field and a ctor is not very consistent with the 
code base :)


h6. 2nd patch (BTree change):

Looks good to me, though I've found it harder that needs to in 
NodeBuilder.update() to convince myself that POSITIVE_INFINITY wasn't ending up 
in a call to addNewKey/replaceNewKey. I'd prefer special casing it entirely: 
it's a very small code duplication for imo more clarity. And in general I've 
found the beginning of that method a tad hard to follow ('set i negative, so 
that found is false and then negate back i right away' is a tad contrived, as 
well as piecing together when owns is false exactly). Anyway, I've pushed a 
small commit with a suggestion (which I think is equivalent but feels 
clearer) at https://github.com/pcmanus/cassandra/commits/6934.

h6. 3rd patch (isByteOrderComparable):

* Int32Type, LongType and TimestampType are not byteOrderComparable if I'm not 
mistaken (because they are signed). I don't CompositeType is either unless it 
has only one element, even if all it's element are byteOrderComparable (because 
elements are not guaranteed to have the same length).
* Can't we move the isByteOrderComparable field in AbstractCType and use it in 
it's compare, which I think remove the need for adding one in 
AbstractCompoundCellNameType.
* In CompoundSparseCellNameType ctor, the collection type isn't taken into 
account, we should iterate over {{fullType}} (which includes columnNameType too 
as it happens).

* Nit: in the added compare, having {{i}} declared outside of the loops looks 
odd (style-wise) to me. Are you attached to that?


I'll note that the branch breaks a lot of the cql dtests (cql_tests.py), so 
probably a good idea to check those to make some obvious case hasn't been 
missed.


 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1.0


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 

[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960445#comment-13960445
 ] 

Benedict commented on CASSANDRA-6934:
-

Initial patch available 
[here|https://github.com/belliottsmith/cassandra/tree/6934]

Optimises to some extent the various compare() implementations for 
AbstractCType, and at the same time slightly optimises compare in BTree to 
avoid unwrapping the special +/-Inf values except when absolutely necessary, 
and to perform (potentially) one fewer comparison per update() when not 
updating identical values, and to not perform a wasteful start-of-range 
compare() when _not_ inserting.

There are further performance improvements that can be made to 
AbstractCType.compare() and its inheritors, but they're a little more invasive, 
and since CASSANDRA-6694 will entail some optimisation work to make comparisons 
less expensive, I will wait until then to do anything more.

I need to make some tweaks to stress so I can properly test the impact of this 
patch on CQL, as there's no easy way to perform inserts of random columns. As 
shown with CASSANDRA-6553, there is a marked improvement for simple composites, 
and some quick and dirty benchmarking on my local box for thrift columns with 
only the general purpose improvements showed a lesser but still marked impact.



 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons

2014-04-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960546#comment-13960546
 ] 

Benedict commented on CASSANDRA-6934:
-

I decided to put in some of the extra optimisations now after all - whenever 
the clustering/ordering components of a type all support unsigned comparison we 
now avoid almost all virtual method calls. This gives a 10-20% bump in some 
very quick and dirty tests on my box versus the prior optimisation, and 
probably has a larger effect on clustering columns (which still can't easily be 
benchmarked, but I will fix that next week)

 Optimise Byte + CellName comparisons
 

 Key: CASSANDRA-6934
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1


 AbstractCompositeType is called a lot, so deserves some heavy optimisation. 
 SimpleCellNameType can be optimised easily, but should explore other 
 potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)