[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984190#comment-13984190 ] Sylvain Lebresne commented on CASSANDRA-6934: - I've pushed an additional commit on top of that last branch at https://github.com/pcmanus/cassandra/commits/6934-v2 that addresses the following remaining remarks/nits: * ColumnToCollectionType should not be 'byte order comparable' since it's compare() != compareCollectionMembers(). It follows that the compare() override in CompoundSparseCellNameType.WithCollection don't have to bother checking isByteOrderComparable since it can't be. * I believe the previous Composite.EOC.prefixComparisonResult was correct, but the new prefixCompare() is not. That is, if s1 is a strict prefix of s2, we should not take s2.eoc() into account no matter what (the EOC basically applies to the end component only). So I reverted back to the previous version (with an additional comment). * DynamicCompositeType should never be 'byte order comparable'. * We can simplify AbstractSimpleCellNameType/SimpleCType compare() further because we don't have EOC for simple names (or rather, they're always NONE -- the +/-Inf patch would have changed that, but since we don't do it). As a side note, I'd really rather avoid using {{|}} (instead of {{||}}) for conditions: the one line of code it saves is not worth the unusual syntax imo (where unusual means 'for the Cassandra code base'). * There was unrelated changes to CassandraDaemon for some reason, and a unused import in AbstractCompoundCellNameType. I removed those. Can you share your benchmark/benchmark results for this for the records? One last nit: mind taking a shot at replacing the TODO: Could use some comment on why that's a useful optimization to do comment in NodeBuilder.update() with an actual explanation? Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1.0 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984212#comment-13984212 ] Benedict commented on CASSANDRA-6934: - Thanks bq. As a side note, I'd really rather avoid using | (instead of ||) for conditions: the one line of code it saves is not worth the unusual syntax imo (where unusual means 'for the Cassandra code base'). Whilst I don't mind too much either way on this one, it was less to save a line of code and more to save a branch prediction slot in the CPU. It's a pretty micro-optimisation so unlikely to be measurable in our current state, but the branch prediction is almost certainly useless here and we can predict more accurately (which is all \| really does instead of \|\|). FTR, there are some places where I will definitely want to use it in future (e.g. associative cache) to permit better pipelining and optimisation. Otherwise changes LGTM. I've pushed a [commented branch|https://github.com/belliottsmith/cassandra/tree/6934-v2], which also has a warning on the optimisation in SimpleCType and AbstractSimpleCellNameType on the optimisation that depends on ColumnSlice start/finish semantics Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1.0 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984213#comment-13984213 ] Benedict commented on CASSANDRA-6934: - As to benchmarks, CASSANDRA-6553 are the main ones I was using to justify this; see the green line on [this graph|http://riptano.github.io/cassandra_performance/graph/graph.html?stats=6553.uber_contention_no_cache_CL_one_4_3.jsonmetric=op_rateoperation=counterreadsmoothing=1xmin=0xmax=258.61ymin=0ymax=21794.3] The downside of course is that these numbers are for simple sparse types only - I need to update stress to support clustering columns (which I plan to do soon) to create more general graphs for all of the optimisations. Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1.0 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981211#comment-13981211 ] Benedict commented on CASSANDRA-6934: - I've pushed a new branch [here|https://github.com/belliottsmith/cassandra/commits/6934-reselected] This branch does away with all of the changes to start/end semantics when you want to select everything. There were still more things to deal with, and the patch was getting much too unwieldy for the benefit that would have conferred. I'd have liked to clean up the semantics, but I think that's a problem for another day/ticket since we can get 99% of our benefit without them. I think I've addressed all your concerns, with the following notes: # Instead of ordinal() I just use eoc.compareTo() - the same concern applies, but given ordering of an enum is part of its definition, it shouldn't be a risk (I would -1 any change that ordered them differently to their natural ordering) # Not sure which newline you're referring to, scoured my patch to find a constructor with a field I'd added directly above and couldn't see one # There were a couple of bugs in your rejig of NodeBuilder.update, but fixed them and also simplified it a bit further by extracting the POSITIVE_INFINITY behaviour into a separate method that is explicitly invoked when necessary # Further optimised away the isStatic test in applicable types, as I'm pretty sure the sort order will remain unchanged. # Eliminated all of the null checks in types, and simplified any empty checks # Modified compare() behaviours to support EOCs being set on both types, as it _can_ happen (and if both are END it was broken) # Modified SimpleCType + AbstractSimpleCellNameType compares to support both parameters being empty Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1.0 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978488#comment-13978488 ] Sylvain Lebresne commented on CASSANDRA-6934: - h6. 1st patch (NEG_INF/POS_INF): * We need to make sure that we don't use Composites.EMPTY in slices (at least for the finish bound), but that's still done for thrift and CQL2 due to the fromByteBuffer calls in CassandraServer#toInternalFilter, cql.QueryProcess#{getSlice,filterFromSelect} and ThriftValidation#asIFilter. * When the SliceQueryFilter is reversed, a slice start bound should compare after it's finish, so we should use slice(POS_INF, NEG_INF) in that case. So I think all the POS_INF in SelectStatement should be NEG_INF, and more generally mind reversed in every case (in the thrift/CQL2 methods above, in SuperColumns, in SystemKeyspace, ...). I think that also mean we need a ColumnSlice.ALL_COLUMNS_REVERSED, or at least check we never use ALL_COLUMNS/ALL_COLUMNS_ARRAY in reversed slice (haven't checked). * I'd feel much more comfortable we're not missing some case if we added an assertion/check in SliceQueryFilter that we never use Composites.EMPTY as finish bound (or start bound if reversed). Ideally we'd even refuse it for start bound (finish when reversed), but I think that would require to fix a bunch of place where we don't use Composite#start() (which is somewhat wrong in the first place but well) for the start bound (there's at least SelectStatement#buildBound but likely others). * In ColumnSlice#isAlwaysEmpty, I don't think we need to test {{!start.isEmpty()}} anymore (and since the method handle both forward and reversed slices, testing only one of {{start}} or {{finish}} actually feels wrongs). * Nit: In AbstractCType.compare, I'd remove only one can have an EOC because I think the code works even if that's not the case and I'd rather not make that an official assumption (it's probably always true, but there is no point relying on it). * Nit: many comparators test their arguments for {{null}} which I'm pretty sure is not necessary (it better be since a handful of comparator don't do it). Could be nice to remove this from all comparator while we're removing useless checks (or let say at least for consistency). * Nit: A small comment on what prefixComparisonResult is would be useful, it's not obvious without going looking for how it's used. * Nit: Slightly uncomfortable relying on EOC enum ordinal() (I'll grant that there is little reason to ever reorder than enum, but still feel like a bad habit). I'd prefer adding a specific field. * Nit: Unused imports in SimpleDenseCellNameType/SimpleSparseCellnameType * Nit: No newline between a field and a ctor is not very consistent with the code base :) h6. 2nd patch (BTree change): Looks good to me, though I've found it harder that needs to in NodeBuilder.update() to convince myself that POSITIVE_INFINITY wasn't ending up in a call to addNewKey/replaceNewKey. I'd prefer special casing it entirely: it's a very small code duplication for imo more clarity. And in general I've found the beginning of that method a tad hard to follow ('set i negative, so that found is false and then negate back i right away' is a tad contrived, as well as piecing together when owns is false exactly). Anyway, I've pushed a small commit with a suggestion (which I think is equivalent but feels clearer) at https://github.com/pcmanus/cassandra/commits/6934. h6. 3rd patch (isByteOrderComparable): * Int32Type, LongType and TimestampType are not byteOrderComparable if I'm not mistaken (because they are signed). I don't CompositeType is either unless it has only one element, even if all it's element are byteOrderComparable (because elements are not guaranteed to have the same length). * Can't we move the isByteOrderComparable field in AbstractCType and use it in it's compare, which I think remove the need for adding one in AbstractCompoundCellNameType. * In CompoundSparseCellNameType ctor, the collection type isn't taken into account, we should iterate over {{fullType}} (which includes columnNameType too as it happens). * Nit: in the added compare, having {{i}} declared outside of the loops looks odd (style-wise) to me. Are you attached to that? I'll note that the branch breaks a lot of the cql dtests (cql_tests.py), so probably a good idea to check those to make some obvious case hasn't been missed. Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1.0 AbstractCompositeType is called a lot, so deserves some heavy optimisation.
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960445#comment-13960445 ] Benedict commented on CASSANDRA-6934: - Initial patch available [here|https://github.com/belliottsmith/cassandra/tree/6934] Optimises to some extent the various compare() implementations for AbstractCType, and at the same time slightly optimises compare in BTree to avoid unwrapping the special +/-Inf values except when absolutely necessary, and to perform (potentially) one fewer comparison per update() when not updating identical values, and to not perform a wasteful start-of-range compare() when _not_ inserting. There are further performance improvements that can be made to AbstractCType.compare() and its inheritors, but they're a little more invasive, and since CASSANDRA-6694 will entail some optimisation work to make comparisons less expensive, I will wait until then to do anything more. I need to make some tweaks to stress so I can properly test the impact of this patch on CQL, as there's no easy way to perform inserts of random columns. As shown with CASSANDRA-6553, there is a marked improvement for simple composites, and some quick and dirty benchmarking on my local box for thrift columns with only the general purpose improvements showed a lesser but still marked impact. Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960546#comment-13960546 ] Benedict commented on CASSANDRA-6934: - I decided to put in some of the extra optimisations now after all - whenever the clustering/ordering components of a type all support unsigned comparison we now avoid almost all virtual method calls. This gives a 10-20% bump in some very quick and dirty tests on my box versus the prior optimisation, and probably has a larger effect on clustering columns (which still can't easily be benchmarked, but I will fix that next week) Optimise Byte + CellName comparisons Key: CASSANDRA-6934 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934 Project: Cassandra Issue Type: Improvement Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 2.1 AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType can be optimised easily, but should explore other potential optimisations. -- This message was sent by Atlassian JIRA (v6.2#6252)