[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560967#comment-17560967 ] Brandon Williams commented on CASSANDRA-6936: - Yes. > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.2 > > Time Spent: 25h > Remaining Estimate: 0h > > This could be a painful change, but is necessary for implementing a > trie-based index, and settling for less would be suboptimal; it also should > make comparisons cheaper all-round, and since comparison operations are > pretty much the majority of C*'s business, this should be easily felt (see > CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with > major performance impacts). No copying/special casing/slicing should mean > fewer opportunities to introduce performance regressions as well. > Since I have slated for 3.0 a lot of non-backwards-compatible sstable > changes, hopefully this shouldn't be too much more of a burden. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
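The core idea of the ticket is that a type's serialized bytes should sort correctly under a plain unsigned, lexicographic byte comparison, with no type-specific comparator. Below is a minimal sketch for signed 32-bit ints. It assumes a simple "big-endian with the sign bit flipped" scheme chosen for illustration; the class and method names are hypothetical and this is not Cassandra's actual encoding code.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Minimal sketch, assuming a simple scheme for signed 32-bit ints; an illustration
// of the idea, not Cassandra's actual byte-comparable encoding (names are hypothetical).
public final class SignedIntEncoding {
    private SignedIntEncoding() {}

    /** Big-endian bytes with the sign bit flipped, so unsigned byte order matches numeric order. */
    public static byte[] encode(int value) {
        int flipped = value ^ Integer.MIN_VALUE; // MIN_VALUE -> 0x00000000, -1 -> 0x7FFFFFFF, 0 -> 0x80000000
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Inverse of encode: reassemble the big-endian int and flip the sign bit back. */
    public static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
        return flipped ^ Integer.MIN_VALUE;
    }

    /** The single, type-agnostic comparator: lexicographic over unsigned bytes. */
    public static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        int[] values = {-5, 3, Integer.MIN_VALUE, 0, -1, Integer.MAX_VALUE};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        Arrays.sort(encoded, SignedIntEncoding::compare); // no decoding needed during the sort
        StringJoiner sorted = new StringJoiner(" ");
        for (byte[] e : encoded) {
            sorted.add(Integer.toString(decode(e)));
        }
        System.out.println(sorted); // -2147483648 -5 -1 0 3 2147483647
    }
}
```

The sort never interprets the values as ints; the ordering falls out of the byte layout, which is the property a trie-based index needs.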
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560942#comment-17560942 ] Ivan Senic commented on CASSANDRA-6936: --- Do I understand correctly that this will first be available in the `4.2` release, which is scheduled to go out in a year? > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.2 > > Time Spent: 25h > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559272#comment-17559272 ] Caleb Rackliffe commented on CASSANDRA-6936: +1 > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 24.5h > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557597#comment-17557597 ] Caleb Rackliffe commented on CASSANDRA-6936: btw, not sure anything I've contributed here is co-author worthy, but if you do, I'd use this e-mail... {noformat} Co-authored-by: Caleb Rackliffe {noformat} > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 24h > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557238#comment-17557238 ] Caleb Rackliffe commented on CASSANDRA-6936: My pass at review is complete, and all comments/nits (none of which amount to serious pushback) are inline in the PR. I'll officially +1 once those items are resolved (or reasonably ignored). Nice work! > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 23h 10m > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471990#comment-17471990 ] Benedict Elliott Smith commented on CASSANDRA-6936: --- I trust you to have made good choices, [~blambov]. I'll see if I can find some time to get an overview of the work for some high-level feedback about the serialisation format, but I haven't thought about this problem domain in a long time, so my consideration may be less valuable than you imagine. I won't likely have the time to perform a full review either way. > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471978#comment-17471978 ] Branimir Lambov commented on CASSANDRA-6936: [~benedict], I think it would be best if you review this, so that we can incorporate any ideas you may have into the encoding while it still isn't used in any persisted data and we can freely modify it. Do you think you will have the time to do so? > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17435382#comment-17435382 ] Benedict Elliott Smith commented on CASSANDRA-6936: --- Another blast from the past. I'm looking forward to seeing this land. > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Benedict Elliott Smith >Assignee: Branimir Lambov >Priority: Normal > Labels: compaction, performance > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198353#comment-16198353 ] Alex Petrov commented on CASSANDRA-6936: Nick Dimiduk also worked on byte-ordered types in HBase: [HBASE-8201|https://issues.apache.org/jira/browse/HBASE-8201] and [OrderedBytes|https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/OrderedBytes.java]. > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Branimir Lambov > Labels: compaction, performance > Fix For: 4.x
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188967#comment-16188967 ] Jeff Jirsa commented on CASSANDRA-6936: --- I realize this ticket is mostly idle, but CASSANDRA-13553 has a similar need (byte-order comparable types to map into the RocksDB storage model), and in that ticket Dikang found this, which seems interesting: https://github.com/ndimiduk/orderly > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Branimir Lambov > Labels: compaction, performance > Fix For: 4.x
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348369#comment-15348369 ] Jonathan Ellis commented on CASSANDRA-6936: --- My understanding is that compaction is largely CPU-bound on cell comparisons. I'd like to see a prototype of what kind of benefits we can get there, e.g. using blob types (which are already byte-comparable). > Make all byte representations of types comparable by their unsigned byte > representation only > > > Key: CASSANDRA-6936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 > Project: Cassandra > Issue Type: Improvement >Reporter: Benedict >Assignee: Branimir Lambov > Labels: compaction, performance > Fix For: 4.x
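To make the compaction point concrete: a compaction is, at its core, a merge of several sorted runs, and the comparator is called on every step. When keys are already byte-comparable (as blobs are), that comparator can be one unsigned byte comparison for every type. The sketch below is a hypothetical, simplified k-way merge over in-memory lists standing in for sstable iterators; it is not Cassandra's compaction code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Minimal sketch of a compaction-style k-way merge, assuming keys are already
// byte-comparable. Lists stand in for sstable iterators; all names are hypothetical.
public final class UnsignedMergeSketch {
    private UnsignedMergeSketch() {}

    /** Cursor over one sorted run. */
    private static final class Cursor {
        final List<byte[]> run;
        int pos;

        Cursor(List<byte[]> run) {
            this.run = run;
        }

        byte[] head() {
            return run.get(pos);
        }

        boolean exhausted() {
            return pos >= run.size();
        }
    }

    /** Merges sorted runs into one sorted list, keeping a single copy of equal keys. */
    public static List<byte[]> merge(List<List<byte[]>> runs) {
        // One comparator for every type: no per-type dispatch on the hot path.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((x, y) -> Arrays.compareUnsigned(x.head(), y.head()));
        for (List<byte[]> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        List<byte[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            byte[] key = c.head();
            // Equal keys from different runs come out consecutively, so dedupe against the last output.
            if (out.isEmpty() || Arrays.compareUnsigned(out.get(out.size() - 1), key) != 0) {
                out.add(key);
            }
            c.pos++; // advance only after removal, so the heap invariant holds when re-adding
            if (!c.exhausted()) {
                heap.add(c);
            }
        }
        return out;
    }
}
```

A real compaction also reconciles the cells behind equal keys rather than just dropping duplicates; the sketch only shows where the comparison cost sits.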
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653700#comment-14653700 ] Aleksey Yeschenko commented on CASSANDRA-6936: -- [~jbellis] see CASSANDRA-9901 Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: compaction, performance Fix For: 3.x
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524461#comment-14524461 ] Jonathan Ellis commented on CASSANDRA-6936: --- [~iamaleksey] I think you had some comments on this from NGCC discussion? Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.x
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375771#comment-14375771 ] Benedict commented on CASSANDRA-6936: - So, the more often I think of future storage changes, the more this becomes a pain and a headache. I would like to reassess the possibility of making everything byte-order comparable. How widely deployed are custom AbstractType implementations where the comparator makes a difference? I ask because dropping support for just these (and instead, for instance, having the user define an ASC/DESC order on the fields for maps/sets/tables within a UDT) seems like it would give us the ability to deliver this universally. As far as I am aware, we're the only database that hamstrings itself with this limitation (or permittance). I would like to byte-prefix compress our index file (because, as standard, it unnecessarily takes up a significant proportion of the data it indexes, inflating the number of disk accesses and reducing the effective capacity of the key cache), but this isn't possible without a majority of fields supporting it. Even then, if we have special casing for those that do not, this is a headache and adds code complexity. It also pollutes the icache and branch predictors (not just with the inflation of variants, but in the logic to select between them). This should not be understated: it's surprising how many icache misses you can get on a simple in-memory stress workload, which under-represents the variation of a normal deployment. VTune rates our utilisation of chips pretty poorly, and this is a major contributor. The same is true for optimising merges (we get significantly better algorithmic complexity with far fewer changes if the comparable fields are byte-prefix comparable), and for compressing clustering columns in data files on disk. I am certain I will encounter more scenarios before long.
I think the cumulative performance wins here would be really _very_ significant, for all workloads (compaction, disk reads and in-memory reads all stand to gain significantly from this change). Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.0
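Byte-prefix compression of an index, as proposed above, is usually done by front coding: each key in a sorted sequence stores only the number of leading bytes it shares with its predecessor, plus the remaining suffix. The sketch below shows the general technique over keys that are already byte-comparable and sorted. It is illustrative only, the class and method names are hypothetical, and it does not describe Cassandra's actual index format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of front coding (byte-prefix compression) over keys sorted by
// unsigned byte order. Illustrative only; all names here are hypothetical.
public final class FrontCoding {
    private FrontCoding() {}

    /** One entry: how many bytes to reuse from the previous key, plus the bytes that differ. */
    public static final class Entry {
        public final int shared;
        public final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Length of the common prefix of a and b (Arrays.mismatch returns -1 when they are equal). */
    static int sharedPrefix(byte[] a, byte[] b) {
        int mismatch = Arrays.mismatch(a, b);
        return mismatch < 0 ? a.length : mismatch;
    }

    /** Encodes each key relative to its predecessor; sorted input puts common prefixes next to each other. */
    public static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = sharedPrefix(prev, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by replaying the entries in order. */
    public static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = Arrays.copyOf(prev, e.shared + e.suffix.length); // reuse the shared prefix
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length); // append the new suffix
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For example, the sorted keys `apple`, `applesauce`, `apply` encode with shared lengths 0, 5 and 4. The scheme only pays off when the byte order used for sorting is the same order the comparator uses, which is the dependency on byte-comparability described above.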
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376465#comment-14376465 ] Benedict commented on CASSANDRA-6936: - Right. There are three possibilities here: 1) do nothing; 2) make all *common* fields behave this way; 3) make all fields behave this way. If we deliver 2, we're likely to get a significant chunk of any performance benefit, but at the cost of code simplicity. 3 should give us a smidgen more benefit but with simpler code (which in turn may let us squeeze more out of it, as the code becomes less brittle and easier to test, so we can push it a little further). There's also an orthogonal discussion of perhaps weakening the requirements for this ticket to just binary prefix comparable, or even _byte_ prefix comparable, rather than _unsigned binary_ prefix comparable, if any such relaxation makes it appreciably easier and less ugly. I just want us to investigate as open-mindedly as possible how viable going the whole hog is, and where the ugliness, deprecation or user pain points might be. It's possible it's a no-go, but I think we may have aborted too early, given the significant upsides. Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.0
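One reason the "unsigned" qualifier matters on the JVM is that Java's `byte` is signed, so a naive lexicographic comparison orders bytes 0x80 to 0xFF before 0x00 to 0x7F. The sketch below uses only `java.util.Arrays` (Java 9+) to show where the two orders disagree and where they agree (a strict prefix sorts first under both). It does not attempt to define the weaker "binary prefix" or "byte prefix" variants mentioned in the comment; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch: Java's byte is signed, so "byte order" is ambiguous unless the
// comparison is explicitly unsigned. Uses only java.util.Arrays (Java 9+); names are hypothetical.
public final class SignedVsUnsigned {
    private SignedVsUnsigned() {}

    /** True when signed and unsigned lexicographic comparison disagree on a vs b. */
    public static boolean ordersDisagree(byte[] a, byte[] b) {
        int signed = Integer.signum(Arrays.compare(a, b));           // element-wise Byte.compare semantics
        int unsigned = Integer.signum(Arrays.compareUnsigned(a, b)); // memcmp-style semantics
        return signed != unsigned;
    }

    public static void main(String[] args) {
        byte[] low = {0x7F};         // 127 whether read as signed or unsigned
        byte[] high = {(byte) 0x80}; // -128 signed, 128 unsigned
        System.out.println(ordersDisagree(low, high)); // true: 0x80 flips sides
        System.out.println(ordersDisagree(new byte[] {0x01}, new byte[] {0x02})); // false
        // Both orders agree that a strict prefix sorts before any extension of it.
        System.out.println(Arrays.compareUnsigned(new byte[] {0x01}, new byte[] {0x01, 0x00}) < 0); // true
    }
}
```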
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376410#comment-14376410 ] Jonathan Ellis commented on CASSANDRA-6936: --- To get that simplification we'd have to commit to making *all* types byte-order-comparable, not just a subset as above, right? Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.0
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305403#comment-14305403 ] Benedict commented on CASSANDRA-6936: - Well, the problem is that CASSANDRA-8731 only optimises compares involving multiple clustering columns. Wide rows with a single clustering column will still be very expensive, unless they're byte-order comparable, in which case 8731 could optimise those significantly as well (by creating trees within a clustering column, rather than between them). Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.0
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305353#comment-14305353 ] Aleksey Yeschenko commented on CASSANDRA-6936: -- I'm hoping that just having CASSANDRA-8730 and CASSANDRA-8731 would be enough. Many subtle and maybe not so subtle issues with converting representations under the hood like this will come up. One would be breaking timestamp ties in reconcile. To preserve old rules and avoid corruption you'd have to convert the new representation back to the regular representation, and do a comparison on that. So cells would now have to reference the type of the value, too. bq. v4 protocol includes new serialization formats Do you mean new types, or changing serialization format of the existing types? The latter wouldn't be welcomed by driver authors. Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Labels: performance Fix For: 3.0
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305367#comment-14305367 ] Aleksey Yeschenko commented on CASSANDRA-6936: -- Additionally, I wouldn't want to layer extra conversion logic on top of the already in-progress CASSANDRA-8099. We will have bugs there (in back-and-forth conversion of mutations and read commands). We are still catching bugs of this kind from CASSANDRA-3237. You don't want to make things worse by having this on top, in a single release.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305376#comment-14305376 ] Benedict commented on CASSANDRA-6936: - Introducing new types that don't need conversion, which v4 protocol clients can implement, makes sense; but we can easily convert on the fly for earlier protocol versions, or for v4 as well. It's not challenging to do so, especially for the common fields of int, long, timestamp and UUID. Timestamp reconciliation isn't as much of a problem, since we convert them to actual native longs in Java-land; either representation is fine for those.
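Converting UUIDs on the fly can be illustrated with time-based (version 1) UUIDs. Their raw layout puts the low 32 bits of the timestamp first, so raw byte order does not follow time order. The sketch below rests on assumptions and is not Cassandra's actual encoding. It moves the 60-bit timestamp to the front (big-endian) and keeps the other 8 bytes unchanged. As a result, unsigned byte order follows timestamp order, and the original UUID can be rebuilt losslessly. `TimeUuidBytes`, `encode`, `decode` and `raw` are hypothetical names. `java.util.UUID.timestamp()` is the standard JDK accessor, which throws for non-version-1 UUIDs. The sketch does not address how ties on the timestamp should be ordered.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical sketch: a byte-order-comparable form of a version-1 (time-based) UUID.
final class TimeUuidBytes {
    // Assumed layout: 8 bytes of the 60-bit timestamp (big-endian), then the
    // UUID's 8 least-significant bytes unchanged.
    static byte[] encode(UUID u) {
        long ts = u.timestamp(); // throws UnsupportedOperationException unless version 1
        return ByteBuffer.allocate(16).putLong(ts).putLong(u.getLeastSignificantBits()).array();
    }

    // Rebuild time_low, time_mid and time_hi_and_version (version nibble = 1).
    static UUID decode(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        long ts = buf.getLong();
        long lsb = buf.getLong();
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        return new UUID(msb, lsb);
    }

    // The UUID's raw 16 bytes, for comparison with the encoded form.
    static byte[] raw(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        // Timestamps 0xFFFFFFFF and 0x100000000: the second UUID is later in time.
        UUID earlier = UUID.fromString("ffffffff-0000-1000-8000-000000000000");
        UUID later = UUID.fromString("00000000-0001-1000-8000-000000000000");
        System.out.println("raw order:     " + Integer.signum(Arrays.compareUnsigned(raw(earlier), raw(later))));
        System.out.println("encoded order: " + Integer.signum(Arrays.compareUnsigned(encode(earlier), encode(later))));
        System.out.println("round trip:    " + decode(encode(earlier)).equals(earlier));
    }
}
```

Running `main` prints a raw order of 1 (the later UUID wrongly sorts first), an encoded order of -1, and `true` for the round trip.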
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305379#comment-14305379 ] Jonathan Ellis commented on CASSANDRA-6936: --- It wouldn't be welcomed by driver authors, but it's not THAT big a deal, realistically.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305381#comment-14305381 ] Jonathan Ellis commented on CASSANDRA-6936: --- Hmm, so basically CASSANDRA-8731's goal in life is to reduce the impact of optimizing compares (since 8731 will perform fewer of them). Let's see if we can do 8731 for 3.0, and then see how big a problem this still is in practice at that point.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305383#comment-14305383 ] Aleksey Yeschenko commented on CASSANDRA-6936: --
bq. Timestamp reconciliation isn't as much of a problem
I was talking about a more esoteric thing: Cell#reconcile() when the timestamps are equal, which uses BB.compareTo() on the value. We need to have the exact same representation as we have now there.
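The sketch below shows why the representation matters for equal-timestamp tie-breaking. It assumes a tie-break that keeps whichever value is greater under `ByteBuffer.compareTo`. According to the JDK documentation, that method compares elements as if by `Byte.compare`, that is, as signed bytes. The class and method names are hypothetical, and `winner` is a simplification, not Cassandra's `Cell#reconcile()` code. Encoding the same two int values in two different ways changes which one wins. That is why a new representation would have to be converted back before this comparison.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical illustration: a tie-break based on ByteBuffer.compareTo (signed bytes)
// depends on how the value is encoded.
final class TieBreakDemo {
    // Current-style encoding: big-endian two's complement.
    static ByteBuffer current(int v) {
        return ByteBuffer.allocate(4).putInt(0, v); // absolute put keeps position at 0
    }

    // Byte-comparable encoding: sign bit flipped.
    static ByteBuffer flipped(int v) {
        return ByteBuffer.allocate(4).putInt(0, v ^ Integer.MIN_VALUE);
    }

    // Stand-in for the equal-timestamp rule: keep the value that compares greater.
    static int winner(int a, int b, IntFunction<ByteBuffer> encoding) {
        return encoding.apply(a).compareTo(encoding.apply(b)) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println("current encoding keeps: " + winner(1, -1, TieBreakDemo::current)); // 1
        System.out.println("flipped encoding keeps: " + winner(1, -1, TieBreakDemo::flipped)); // -1
    }
}
```

With the current encoding, `00 00 00 01` beats `FF FF FF FF` because `0xFF` is -1 as a signed byte. With the flipped encoding, `80 00 00 01` loses to `7F FF FF FF`, so the surviving value changes.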
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305224#comment-14305224 ] Aleksey Yeschenko commented on CASSANDRA-6936: -- Are we still talking about making *all* types byte-order comparable, or just a subset of them? B/c the former is almost certainly a non-option so long as we support custom types (and unfortunately it doesn't seem like dropping that is an option).
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305232#comment-14305232 ] Benedict commented on CASSANDRA-6936: - Just a subset, given the support for custom types. Ideally all non-custom types, but at least the common ones.
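Supporting "just a subset" implies two paths when comparing clustering values. Columns whose type is byte-order comparable need only an unsigned byte comparison, while custom types fall back to their own comparator. The following is a minimal sketch under that assumption. `ColumnType`, `byteComparable()`, `custom(...)` and `compare(...)` are hypothetical names, not Cassandra's `AbstractType` API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: fast unsigned-byte path for byte-comparable column types,
// slower per-type comparator for custom types.
final class ClusteringCompare {
    static final class ColumnType {
        final Comparator<byte[]> comparator; // null means "byte-order comparable"

        private ColumnType(Comparator<byte[]> comparator) { this.comparator = comparator; }

        static ColumnType byteComparable() { return new ColumnType(null); }

        static ColumnType custom(Comparator<byte[]> cmp) { return new ColumnType(cmp); }

        boolean isByteComparable() { return comparator == null; }
    }

    // Compare two clustering prefixes column by column.
    static int compare(List<ColumnType> types, List<byte[]> a, List<byte[]> b) {
        for (int i = 0; i < types.size(); i++) {
            ColumnType t = types.get(i);
            int c = t.isByteComparable()
                    ? Arrays.compareUnsigned(a.get(i), b.get(i)) // fast path
                    : t.comparator.compare(a.get(i), b.get(i));  // custom-type fallback
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        ColumnType bytes = ColumnType.byteComparable();
        // A made-up custom type that orders values by length only.
        ColumnType byLength = ColumnType.custom(Comparator.comparingInt((byte[] v) -> v.length));
        int c = compare(List.of(bytes, byLength),
                List.of(new byte[] {1}, new byte[] {9, 9}),
                List.of(new byte[] {1}, new byte[] {1, 1, 1}));
        System.out.println(c < 0); // true: first column ties, second value is shorter
    }
}
```

If every column on a table takes the fast path, the per-column virtual call disappears, which is the saving the surrounding discussion is after.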
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305181#comment-14305181 ] Benedict commented on CASSANDRA-6936: -
bq. Maybe a time will come where comparisons are our main bottleneck but we're not there atm and future storage changes will probably impact this as well.
We are there already. Speak to [~jblangs...@datastax.com], for instance, who has recently been working with two users whose performance is bottlenecked by the CPU cost of comparisons. One of these customers is seeing a blistering 4MB/s of compaction throughput with their CPUs maxed out. Comparisons are pretty much the main time sink for C* when working with clustering columns, and especially collections. The big problem fields are int, bigint and timestamp. All of these are very commonly used, and trivial to make byte-order comparable. The optimisations made a little while back had a significant impact on the CPU cost of merges, and they all depend on byte-order comparability of every clustering column on the table. For such small fields, the cost of the virtual invocation is a significant percentage of the time spent, since the data will generally be in cache, having just been read off disk. We can avoid multiple such virtual invocations if all of the fields are byte-order comparable. It also improves instruction cache occupancy for these common methods, since they all go through the same codepath (at the time of making those optimisations, instruction cache misses were actually a significant problem, and likely worse on a live server with a more varied workload). Future storage changes largely depend on it too for delivering the best performance, as the binary trie is likely to be the most significant win. Further, CASSANDRA-8731 can perhaps exploit the nature of these fields to reduce the costs of merging even further.
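The claim that int, bigint and timestamp are trivial to make byte-order comparable can be shown with a standard trick. Write the value big-endian and flip its sign bit. Unsigned lexicographic byte order then matches signed numeric order, and the conversion is lossless. This is a sketch with hypothetical class and method names. It assumes bigint and timestamp values are 8-byte signed longs, and it does not claim to be the encoding Cassandra eventually adopted.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: flip the sign bit of the big-endian two's-complement form so that
// unsigned byte order equals signed numeric order.
final class SignFlip {
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Same trick for 8-byte values such as bigint or a millisecond timestamp.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        byte[] rawMinusOne = ByteBuffer.allocate(4).putInt(-1).array();
        byte[] rawZero = ByteBuffer.allocate(4).putInt(0).array();
        // Raw: -1 (FF FF FF FF) sorts after 0 (00 00 00 00) as unsigned bytes.
        System.out.println(Arrays.compareUnsigned(rawMinusOne, rawZero) > 0);         // true
        // Flipped: -1 becomes 7F FF FF FF and 0 becomes 80 00 00 00.
        System.out.println(Arrays.compareUnsigned(encodeInt(-1), encodeInt(0)) < 0);  // true
        System.out.println(decodeLong(encodeLong(Long.MIN_VALUE)) == Long.MIN_VALUE); // true
    }
}
```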
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305283#comment-14305283 ] Jonathan Ellis commented on CASSANDRA-6936: --- (JB's issue: CASSANDRA-8730) It sounds tractable for 3.0 if we start soon; neither backwards compatibility with old-version sstables nor telling clients that v4 protocol includes new serialization formats should be a huge obstacle.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304540#comment-14304540 ] Jonathan Ellis commented on CASSANDRA-6936: --- I'm not sure this is worth the pain tbh. Some common comparisons (notably text) are already byte-comparable and others (int32) are close enough that the speed gain will likely be negligible. Maybe a time will come where comparisons are our main bottleneck but we're not there atm and future storage changes will probably impact this as well. I'm inclined to close as Later.
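Text being "already byte-comparable" rests on a property of UTF-8. Comparing encoded bytes as unsigned values gives the same order as comparing Unicode code points. The sketch below (hypothetical names) checks this property. It also shows that Java's `String.compareTo`, which compares UTF-16 code units, can disagree for characters outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: unsigned byte order of UTF-8 equals Unicode code point order.
final class Utf8Order {
    static int compareUtf8Bytes(String a, String b) {
        return Arrays.compareUnsigned(a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
    }

    static int compareCodePoints(String a, String b) {
        return Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray());
    }

    public static void main(String[] args) {
        String fullwidthA = "\uFF21";  // U+FF21, UTF-8 EF BC A1
        String emoji = "\uD83D\uDE00"; // U+1F600, UTF-8 F0 9F 98 80
        System.out.println(Integer.signum(compareUtf8Bytes(fullwidthA, emoji)));  // -1
        System.out.println(Integer.signum(compareCodePoints(fullwidthA, emoji))); // -1
        // String.compareTo compares UTF-16 code units (0xFF21 vs 0xD83D), so it disagrees here.
        System.out.println(Integer.signum(fullwidthA.compareTo(emoji)));          // 1
    }
}
```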
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948999#comment-13948999 ] Sylvain Lebresne commented on CASSANDRA-6936: - Not too sure what you have in mind here. I don't see how we could achieve that given that some types (TimeUUID, IntegerType, ...) are intrinsically not comparable by their bytes representation. And since we do return that representation to the user, it's not like we can change it to whatever suits us. Or I guess we could have some conversion of representation when receiving/sending values but 1) I don't see an easy way to have a bytes comparable representation of say IntegerType (since it's variable length) and 2) I'm rather uncomfortable with doing complex bit manipulations of the user data. Besides, there is the custom types (i.e. when users provide their own AbstractType implementation). Make all byte representations of types comparable by their unsigned byte representation only Key: CASSANDRA-6936 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Labels: performance Fix For: 3.0 This could be a painful change, but is necessary for implementing a trie-based index, and settling for less would be suboptimal; it also should make comparisons cheaper all-round, and since comparison operations are pretty much the majority of C*'s business, this should be easily felt (see CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with major performance impacts). No copying/special casing/slicing should mean fewer opportunities to introduce performance regressions as well. Since I have slated for 3.0 a lot of non-backwards-compatible sstable changes, hopefully this shouldn't be too much more of a burden. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949150#comment-13949150 ] Benedict commented on CASSANDRA-6936: -
bq. Or I guess we could have some conversion of representation when receiving/sending values
I would settle for conversion when reading/writing from disk for these, but at send/receive would be best, so that we can benefit from the changes in memory as well. But our on-disk indexing is currently quite lacking, and improving that would be a tremendous help by itself.
bq. I don't see an easy way to have a bytes comparable representation of say IntegerType (since it's variable length)
[http://www.dlugosz.com/ZIP2/VLI.html] looks to be one pretty simple such encoding, but there are others.
bq. there is the custom types
This is more of an issue. DecimalType is also tricky (though still achievable, I'm sure). It _may_ be that we have a slow fallback for those types we decide are too problematic to convert, but it would be good to aim for a situation where we can have a fast route, and where we can make on-disk optimisations. In an ideal world, though, we would simply not support indexing (clustering/naming) on fields that can't be given this property (which are probably very few, so this is probably not a major limitation).
bq. I'm rather uncomfortable with doing complex bit manipulations of the user data... And since we do return that representation to the user, it's not like we can change it to whatever suits us
I'm not sure of your rationale for this. It seems an arbitrary distinction from all of the other complex things we do to user data. All we do is shuffle around/encode/wrap user data. This is exactly the kind of thing a database is supposed to do to make the user's life easier, and in this event _we chose_ the encoding, so the user has no specific attachment to it.
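Sylvain's IntegerType objection is about variable length. One way round it, sketched below under stated assumptions, is to put a single header byte encoding both sign and byte length ahead of the minimal two's-complement bytes. Longer positive numbers get larger headers, and longer negative numbers get smaller ones. Equal-length values of the same sign already compare correctly as unsigned bytes. This is a simpler length-prefixed scheme, not the VLI encoding at the dlugosz.com link and not Cassandra's eventual encoding. The names are hypothetical, and values longer than 126 bytes are rejected to keep the header in one byte.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical byte-comparable encoding for arbitrary-precision integers:
// [header][minimal big-endian two's-complement bytes]
//   non-negative, n bytes: header = 0x80 + n   (longer => larger)
//   negative,     n bytes: header = 0x7F - n   (longer => more negative => smaller)
final class VarIntBytes {
    static byte[] encode(BigInteger v) {
        byte[] twos = v.toByteArray(); // minimal length, always at least 1 byte
        int n = twos.length;
        if (n > 126) throw new IllegalArgumentException("sketch only handles up to 126 bytes");
        byte[] out = new byte[n + 1];
        out[0] = (byte) (v.signum() >= 0 ? 0x80 + n : 0x7F - n);
        System.arraycopy(twos, 0, out, 1, n);
        return out;
    }

    static BigInteger decode(byte[] b) {
        // The header only serves ordering; the payload is plain two's complement.
        return new BigInteger(Arrays.copyOfRange(b, 1, b.length));
    }

    public static void main(String[] args) {
        BigInteger[] sorted = {BigInteger.valueOf(-300), BigInteger.valueOf(-129), BigInteger.valueOf(-128),
                BigInteger.valueOf(-1), BigInteger.ZERO, BigInteger.valueOf(127), BigInteger.valueOf(128),
                BigInteger.ONE.shiftLeft(70)};
        for (int i = 0; i + 1 < sorted.length; i++) {
            boolean ordered = Arrays.compareUnsigned(encode(sorted[i]), encode(sorted[i + 1])) < 0;
            System.out.println(sorted[i] + " < " + sorted[i + 1] + ": " + ordered);
        }
    }
}
```

For example, -129 encodes as `7D FF 7F` and -128 as `7E 80`, so the header alone orders them. 255 (`82 00 FF`) and 256 (`82 01 00`) share a header and are ordered by their payload bytes.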
We could easily create new types that require no conversion, and encourage users to switch for safety/efficiency, but so long as any conversion is lossless, it shouldn't be a problem. Investigating this has raised another related issue, which is that I only now realised we store a 4-byte length for every single value. This seems immensely wasteful, and at the same time as any of these changes we should push this logic into AbstractType, so that those that are fixed length, or only need a short length, or can otherwise encode their length, can decide for themselves what size length to write.
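The suggestion to let each type decide how (or whether) to write its length can be sketched as follows. The `ValueType` interface, `fixedLength()`, and both writer methods are hypothetical. For the variable-length case, the sketch uses an unsigned LEB128-style varint for the length, as one possible choice. As Aleksey notes in the following comment, even nominally fixed-width types can hold empty values, which this sketch simply rejects.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LengthPolicyDemo {
    // Hypothetical hook: a type reports its fixed serialized width, or -1 if variable.
    interface ValueType {
        int fixedLength();
    }

    static final ValueType INT32 = () -> 4;
    static final ValueType TEXT = () -> -1;

    // A 4-byte length in front of every value, as described in the comment.
    static void writeWithIntLength(DataOutputStream out, byte[] value) throws IOException {
        out.writeInt(value.length);
        out.write(value);
    }

    // Type-driven: no length for fixed-width types, an unsigned varint length otherwise.
    static void writeTypeAware(DataOutputStream out, ValueType type, byte[] value) throws IOException {
        int fixed = type.fixedLength();
        if (fixed >= 0) {
            if (value.length != fixed) {
                throw new IllegalArgumentException("expected " + fixed + " bytes, got " + value.length);
            }
        } else {
            int len = value.length;
            while ((len & ~0x7F) != 0) { // 7 bits per byte, high bit means "more follows"
                out.writeByte((len & 0x7F) | 0x80);
                len >>>= 7;
            }
            out.writeByte(len);
        }
        out.write(value);
    }

    static int sizeWithIntLength(byte[] value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeWithIntLength(new DataOutputStream(bytes), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // ByteArrayOutputStream does not actually throw
        }
        return bytes.size();
    }

    static int sizeTypeAware(ValueType type, byte[] value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeTypeAware(new DataOutputStream(bytes), type, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        byte[] anInt = {0, 0, 0, 42};
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(sizeWithIntLength(anInt) + " vs " + sizeTypeAware(INT32, anInt)); // 8 vs 4
        System.out.println(sizeWithIntLength(text) + " vs " + sizeTypeAware(TEXT, text));    // 9 vs 6
    }
}
```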
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949611#comment-13949611 ] Aleksey Yeschenko commented on CASSANDRA-6936: --
bq. so that those that are fixed length, or only need a short length, or can otherwise encode their length, can decide for themselves what size length to write.
Strictly speaking, we don't have any fixed-length types, because even a boolean column can contain an empty value (empty BB passed via Thrift, for example). Not that it makes the larger point invalid, just something to be aware of.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949619#comment-13949619 ] Benedict commented on CASSANDRA-6936: - Well, it depends on what you mean by fixed-width. That may be encodable by other means: your example of Boolean is trivially encodable in a fixed-width on-disk representation. But I take your wider point.
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949626#comment-13949626 ] Aleksey Yeschenko commented on CASSANDRA-6936: -- Fixed, but not single-byte, because BooleanType would accept any BB as long as it's 0 or 1 bytes in size, and it can't be changed for backward-compatibility reasons :\
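"Fixed, but not single-byte" follows from BooleanType accepting both an empty value and any single byte. A lossless fixed-width form therefore needs a second byte to record presence. Below is a hypothetical two-byte layout: a presence flag, then the raw byte. It keeps every accepted value and sorts the empty value first, then by the raw byte as unsigned. Whether that order matches BooleanType's own comparator is not claimed here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical fixed-width (2-byte) on-disk form for a boolean column that must
// preserve both empty values and arbitrary single-byte values.
final class BooleanFixedWidth {
    static byte[] encode(ByteBuffer value) {
        int n = value.remaining();
        if (n == 0) return new byte[] {0x00, 0x00};                        // empty value
        if (n == 1) return new byte[] {0x01, value.get(value.position())}; // present + raw byte
        throw new IllegalArgumentException("expected 0 or 1 bytes, got " + n);
    }

    static ByteBuffer decode(byte[] b) {
        return b[0] == 0 ? ByteBuffer.allocate(0) : ByteBuffer.wrap(new byte[] {b[1]});
    }

    public static void main(String[] args) {
        byte[] empty = encode(ByteBuffer.allocate(0));
        byte[] falseValue = encode(ByteBuffer.wrap(new byte[] {0}));
        byte[] trueValue = encode(ByteBuffer.wrap(new byte[] {1}));
        System.out.println(Arrays.compareUnsigned(empty, falseValue) < 0);             // true
        System.out.println(Arrays.compareUnsigned(falseValue, trueValue) < 0);         // true
        System.out.println(decode(trueValue).equals(ByteBuffer.wrap(new byte[] {1}))); // true
    }
}
```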
[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
[ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949631#comment-13949631 ] Benedict commented on CASSANDRA-6936: - Oh. :(