[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-30 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560967#comment-17560967
 ] 

Brandon Williams commented on CASSANDRA-6936:
-

Yes.

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.2
>
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> This could be a painful change, but is necessary for implementing a 
> trie-based index, and settling for less would be suboptimal; it also should 
> make comparisons cheaper all-round, and since comparison operations are 
> pretty much the majority of C*'s business, this should be easily felt (see 
> CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
> major performance impacts). No copying/special casing/slicing should mean 
> fewer opportunities to introduce performance regressions as well.
> Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
> changes, hopefully this shouldn't be too much more of a burden.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-30 Thread Ivan Senic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560942#comment-17560942
 ] 

Ivan Senic commented on CASSANDRA-6936:
---

Do I understand correctly that this will first be available in the `4.2` 
release, which is scheduled to go out in a year?

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.2
>
>  Time Spent: 25h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-27 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559272#comment-17559272
 ] 

Caleb Rackliffe commented on CASSANDRA-6936:


+1

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 24.5h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-22 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557597#comment-17557597
 ] 

Caleb Rackliffe commented on CASSANDRA-6936:


btw, not sure anything I've contributed here is co-author worthy, but if you 
think it is, I'd use this e-mail...

{noformat}
Co-authored-by: Caleb Rackliffe 
{noformat}

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 24h
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-06-21 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557238#comment-17557238
 ] 

Caleb Rackliffe commented on CASSANDRA-6936:


My pass at review is complete, and all comments/nits (none of which amount to 
serious pushback) are inline in the PR. I'll officially +1 once those items are 
resolved (or reasonably ignored). Nice work!

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 23h 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-01-10 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471990#comment-17471990
 ] 

Benedict Elliott Smith commented on CASSANDRA-6936:
---

I trust you to have made good choices, [~blambov]. I'll see if I can find some 
time to get an overview of the work for some high level feedback about the 
serialisation format, but I haven't thought about this problem domain in a long 
time so my consideration may be less valuable than you imagine. I won't likely 
have the time to perform a full review either way.

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2022-01-10 Thread Branimir Lambov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471978#comment-17471978
 ] 

Branimir Lambov commented on CASSANDRA-6936:


[~benedict], I think it would be best if you review this, so that we can 
incorporate any ideas you may have into the encoding while it still isn't used 
in any persisted data and we can freely modify it. Do you think you will have 
the time to do so?

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2021-10-28 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17435382#comment-17435382
 ] 

Benedict Elliott Smith commented on CASSANDRA-6936:
---

Another blast from the past. I'm looking forward to seeing this land.

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Benedict Elliott Smith
>Assignee: Branimir Lambov
>Priority: Normal
>  Labels: compaction, performance
> Fix For: 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2017-10-10 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198353#comment-16198353
 ] 

Alex Petrov commented on CASSANDRA-6936:


Nick Dimiduk also worked on byte-ordered types in HBase: 
[HBASE-8201|https://issues.apache.org/jira/browse/HBASE-8201] and 
[OrderedBytes|https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/OrderedBytes.java].

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: compaction, performance
> Fix For: 4.x
>
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2017-10-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188967#comment-16188967
 ] 

Jeff Jirsa commented on CASSANDRA-6936:
---

Realize this ticket is mostly idle, but CASSANDRA-13553 has a similar need 
(byte-order-comparable types to map into the RocksDB storage model), and in that 
ticket Dikang found this, which seems interesting: https://github.com/ndimiduk/orderly



> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: compaction, performance
> Fix For: 4.x
>
>



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2016-06-24 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348369#comment-15348369
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

My understanding is that compaction is largely cpu-bound on cell comparisons.  
I'd like to see a prototype of what kind of benefits we can get there, e.g. 
using blob types (which are already byte-comparable).
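
For reference, the ordering that blob values already support is a plain unsigned lexicographic byte comparison. The following is a minimal sketch of that ordering in Java; the class name `UnsignedByteOrder` is a hypothetical helper for illustration and is not taken from the Cassandra code base.

```java
// Hypothetical helper for illustration; not a class from the Cassandra code base.
final class UnsignedByteOrder
{
    // Compares two byte arrays lexicographically, treating each byte as a value in 0..255.
    // An array that is a strict prefix of the other sorts first.
    static int compare(byte[] a, byte[] b)
    {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++)
        {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args)
    {
        byte[] low = { 0x01, 0x7F };
        byte[] high = { 0x01, (byte) 0x80 }; // 0x80 is -128 as a signed Java byte

        // Unsigned order puts 0x7F before 0x80.
        System.out.println("unsigned: low < high is " + (compare(low, high) < 0));
        // Signed byte order disagrees, which is why the ticket specifies unsigned.
        System.out.println("signed:   low < high is " + (Byte.compare(low[1], high[1]) < 0));
    }
}
```

On JDK 9 and later, `java.util.Arrays.compareUnsigned(byte[], byte[])` provides the same ordering without a hand-written loop.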

> Make all byte representations of types comparable by their unsigned byte 
> representation only
> 
>
> Key: CASSANDRA-6936
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Branimir Lambov
>  Labels: compaction, performance
> Fix For: 4.x
>
>


[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-08-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653700#comment-14653700
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

[~jbellis] see CASSANDRA-9901

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: compaction, performance
 Fix For: 3.x




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-05-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524461#comment-14524461
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

[~iamaleksey] I think you had some comments on this from NGCC discussion?

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.x




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-03-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375771#comment-14375771
 ] 

Benedict commented on CASSANDRA-6936:
-

So, the more often I think of future storage changes, the more this becomes a 
pain and a headache. I would like to reassess the possibility of making 
everything byte-order comparable. How widely deployed are custom AbstractType 
implementations where the comparator makes a difference? Because it seems 
dropping support for just this (and having the user define an ASC/DESC order on 
the fields for maps/sets/tables within a UDT instead, for instance) would give 
us the ability to deliver it universally.

As far as I am aware, we're the only database that hamstrings itself with 
this limitation (or permittance). I would like to byte-prefix compress our 
index file (because as standard it takes up a significant proportion of the 
data it indexes unnecessarily, inflating the number of disk accesses and 
reducing the effective capacity of the key cache), but this isn't possible 
without a majority of fields supporting this. Even then, if we have special 
casing for those that do not, this adds headaches and code complexity. It also 
pollutes the icache and branch predictors (not just with the inflation of 
variances, but in the logic to select between them). This is not to be 
understated: it's surprising how many icache misses you can get on a simple 
in-memory stress workload, which is underrepresentative of the variation for a 
normal deployment. vtune rates our utilisation of chips pretty poorly, and this 
is a major contributor. The same is true for optimising merges (we get 
significantly better algorithmic complexity with far fewer changes if the 
comparable fields are byte-prefix comparable), and for compressing clustering 
columns in data files on disk. I am certain I will encounter more scenarios 
before long.

I think the cumulative performance wins here would be really _very_ 
significant, for all workloads (compaction, disk reads and in-memory reads all 
have significant wins from this change).
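
To make the byte-prefix compression idea concrete, here is a minimal sketch of front coding over keys that are already sorted in byte order. It is only an illustration: `PrefixCompressionSketch`, `Entry`, and the in-memory layout are hypothetical and do not describe any actual or proposed Cassandra index format. The point is that when sort order equals byte order, neighbouring keys tend to share long byte prefixes, so each key can be stored as a shared-prefix length plus a suffix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of front coding (byte-prefix compression); not a Cassandra format.
final class PrefixCompressionSketch
{
    // One compressed entry: number of leading bytes shared with the previous key, plus the rest.
    static final class Entry
    {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix)
        {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    static int commonPrefixLength(byte[] a, byte[] b)
    {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i])
            i++;
        return i;
    }

    // Assumes the input is sorted in byte order, so adjacent keys share prefixes.
    static List<Entry> compress(List<byte[]> sortedKeys)
    {
        List<Entry> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys)
        {
            int shared = commonPrefixLength(previous, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            previous = key;
        }
        return out;
    }

    // Rebuilds each key from the previous key's prefix and the stored suffix.
    static List<byte[]> decompress(List<Entry> entries)
    {
        List<byte[]> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (Entry e : entries)
        {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(previous, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            previous = key;
        }
        return out;
    }

    public static void main(String[] args)
    {
        List<byte[]> keys = Arrays.asList("apple".getBytes(), "apply".getBytes(), "apricot".getBytes());
        for (Entry e : compress(keys))
            System.out.println("shared=" + e.sharedPrefixLength + " suffixBytes=" + e.suffix.length);
    }
}
```

With a type whose comparator does not follow byte order, adjacent keys in the sorted file need not share any byte prefix, which is the obstacle the comment describes.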

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-03-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376465#comment-14376465
 ] 

Benedict commented on CASSANDRA-6936:
-

Right. There are three possibilities here: 

1) do nothing
2) make all *common* fields behave this way
3) make all fields behave this way

If we deliver 2 we're likely to get a significant chunk of any performance 
benefit, but at the cost of code simplicity. 3 should give us a smidgen more 
benefit but with simpler code (which in turn may let us squeeze more out of it, 
as the code becomes less brittle and easier to test, so we can push it a little 
further). There's also an orthogonal discussion about perhaps weakening the 
requirements for this ticket to just binary prefix comparable, or even _byte_ 
prefix comparable, rather than _unsigned binary_ prefix comparable, if any such 
relaxation makes it appreciably easier and less ugly.

I just want us to investigate as open-mindedly as possible how viable going the 
whole hog is, and where the ugliness, deprecation or user pain points might be. 
It's possible it's a no-go, but I think we may have aborted too early, given 
the significant upsides.
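
As an illustration of what "unsigned byte comparable" asks of a signed type, one well-known technique for fixed-width integers is to write the value big-endian with its sign bit inverted. The sketch below assumes that technique purely for illustration; `SignedIntEncodingSketch` and its methods are hypothetical names, and nothing here claims to be the encoding the ticket settled on.

```java
// Hypothetical sketch: sign-bit inversion for a 32-bit int; not a confirmed Cassandra encoding.
final class SignedIntEncodingSketch
{
    // Big-endian bytes with the sign bit inverted, so that negative values
    // sort before non-negative ones under unsigned byte comparison.
    static byte[] encode(int value)
    {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Inverse of encode: reassemble the int and restore the sign bit.
    static int decode(byte[] bytes)
    {
        int flipped = ((bytes[0] & 0xFF) << 24)
                    | ((bytes[1] & 0xFF) << 16)
                    | ((bytes[2] & 0xFF) << 8)
                    | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b)
    {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++)
        {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args)
    {
        int[] samples = { Integer.MIN_VALUE, -5, -1, 0, 3, Integer.MAX_VALUE };
        for (int i = 0; i + 1 < samples.length; i++)
        {
            boolean ordered = compareUnsigned(encode(samples[i]), encode(samples[i + 1])) < 0;
            System.out.println(samples[i] + " sorts before " + samples[i + 1] + ": " + ordered);
        }
    }
}
```

Variable-length and composite types need more than this (for example, some form of terminator or escaping so that prefixes compare correctly), which is where much of the difficulty discussed in this thread lies.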

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-03-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376410#comment-14376410
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

To get that simplification we'd have to commit to making *all* types 
byte-order-comparable, not just a subset as above, right?

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305403#comment-14305403
 ] 

Benedict commented on CASSANDRA-6936:
-

Well, the problem is that CASSANDRA-8731 only optimises compares involving 
multiple clustering columns. Wide rows with a single clustering column will 
still be very expensive, unless they're byte-order comparable, in which case 
CASSANDRA-8731 could optimise those significantly as well (by creating trees 
within a clustering column, rather than between them).
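
To illustrate what a tree within a single column could look like once values are byte-order comparable, here is a minimal sketch of a byte trie whose in-order walk yields keys in unsigned byte order without calling any per-type comparator. `ByteTrieSketch` is a hypothetical name; this is not the structure used by CASSANDRA-8731 or by any Cassandra patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the data structure of CASSANDRA-8731 or any Cassandra patch.
final class ByteTrieSketch
{
    private static final class Node
    {
        // Children keyed by unsigned byte value (0..255); TreeMap keeps them ascending.
        final TreeMap<Integer, Node> children = new TreeMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    // Inserts a key one byte at a time; shared prefixes share nodes.
    void add(byte[] key)
    {
        Node node = root;
        for (byte b : key)
            node = node.children.computeIfAbsent(b & 0xFF, k -> new Node());
        node.terminal = true;
    }

    // Depth-first walk that returns the stored keys in unsigned lexicographic order.
    List<byte[]> sortedKeys()
    {
        List<byte[]> out = new ArrayList<>();
        collect(root, new byte[0], out);
        return out;
    }

    private static void collect(Node node, byte[] prefix, List<byte[]> out)
    {
        if (node.terminal)
            out.add(prefix);
        for (Map.Entry<Integer, Node> child : node.children.entrySet())
        {
            byte[] next = Arrays.copyOf(prefix, prefix.length + 1);
            next[prefix.length] = (byte) child.getKey().intValue();
            collect(child.getValue(), next, out);
        }
    }

    public static void main(String[] args)
    {
        ByteTrieSketch trie = new ByteTrieSketch();
        trie.add(new byte[] { (byte) 0x80 });
        trie.add(new byte[] { 0x01, 0x02 });
        trie.add(new byte[] { 0x01 });
        for (byte[] key : trie.sortedKeys())
            System.out.println(Arrays.toString(key));
    }
}
```

A structure like this only reflects the column's sort order if that order is the unsigned byte order of the stored values, which is the precondition the comment points to.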

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0




[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305353#comment-14305353
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

I'm hoping that just having CASSANDRA-8730 and CASSANDRA-8731 would be enough.

Many subtle and maybe not so subtle issues with converting representations 
under the hood like this will come up.
One would be breaking timestamp ties in reconcile. To preserve old rules and 
avoid corruption you'd have to convert the new representation back to the 
regular representation, and do a comparison on that. So cells would now have to 
reference the type of the value, too.

bq. v4 protocol includes new serialization formats

Do you mean new types, or changing serialization format of the existing types? 
The latter wouldn't be welcomed by driver authors.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305367#comment-14305367
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

Additionally, I wouldn't want to layer extra conversion logic on top of the 
already happening CASSANDRA-8099. We will have bugs there (in back and forth 
conversion of mutations and read commands). We are still catching bugs of this 
kind from CASSANDRA-3237. You don't want to make things worse by having this on 
top, in a single release.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305376#comment-14305376
 ] 

Benedict commented on CASSANDRA-6936:
-

Introducing new types that don't need conversion that v4 protocols can 
implement makes sense; but we can easily convert on the fly for earlier ones, 
or v4 as well. It's not challenging to do so, especially for the common fields 
of int, long, timestamp and UUID.

Timestamp reconciliation isn't as much of a problem, since we convert them to 
actual native longs in Java-land; either representation is fine for those.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305379#comment-14305379
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

It wouldn't be welcomed by authors but it's not THAT big a deal, realistically.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305381#comment-14305381
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

Hmm, so basically CASSANDRA-8731's goal in life is to reduce the impact of 
optimizing compares (since 8731 will perform fewer of them).

Let's see if we can do 8731 for 3.0 and see how big a problem this still is in 
practice at that point.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305383#comment-14305383
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

bq. Timestamp reconciliation isn't as much of a problem

I was talking about a more esoteric thing - Cell#reconcile() when the 
timestamps are equal. Using BB.compareTo() on the value. Need to have the exact 
same representation as we have now, there.
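
To make the concern above concrete, here is a minimal sketch of how a tie-break based on ByteBuffer.compareTo() can change its answer when the stored bytes change. It is not Cassandra's reconcile code: the class name, the helper names and the "greater buffer wins" rule are assumptions made purely for illustration. The only library behaviour relied on is that ByteBuffer.compareTo() compares bytes as signed values.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: same logical values, two byte representations,
// and a tie-break that compares the raw buffers.
final class TieBreakRepresentation {
    // Plain big-endian two's-complement encoding of an int.
    static ByteBuffer plain(int v) {
        return ByteBuffer.allocate(4).putInt(0, v);
    }

    // Same value with the sign bit flipped (a byte-order-comparable form).
    static ByteBuffer signFlipped(int v) {
        return ByteBuffer.allocate(4).putInt(0, v ^ 0x80000000);
    }

    // Assumed rule for this sketch: the value whose buffer compares greater wins.
    static int tieBreakWinner(int left, int right, boolean flipped) {
        ByteBuffer l = flipped ? signFlipped(left) : plain(left);
        ByteBuffer r = flipped ? signFlipped(right) : plain(right);
        return l.compareTo(r) >= 0 ? left : right;
    }

    public static void main(String[] args) {
        System.out.println("plain encoding winner:   " + tieBreakWinner(1, -1, false));
        System.out.println("flipped encoding winner: " + tieBreakWinner(1, -1, true));
    }
}
```

With the values 1 and -1 the two encodings pick different winners, which is why replicas holding different representations could disagree unless the comparison is always done on one agreed form.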



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305224#comment-14305224
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

Are we still talking about making *all* types byte-order compatible, or just a 
subset of them?

B/c the former is almost certainly a non-option so long as we support custom 
types (and unfortunately it doesn't seem like dropping that is an option).



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305232#comment-14305232
 ] 

Benedict commented on CASSANDRA-6936:
-

Just a subset, given the support for custom types. Ideally all non-custom 
types, but at least the common ones.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305181#comment-14305181
 ] 

Benedict commented on CASSANDRA-6936:
-

bq. Maybe a time will come where comparisons are our main bottleneck but we're 
not there atm and future storage changes will probably impact this as well.

We are there already. Speak to [~jblangs...@datastax.com], for instance, who's 
been working with two users recently seeing CPU costs of comparison bottleneck 
performance. One of these customers is seeing a blistering 4MB/s of compaction 
throughput with their CPUs maxed out. Comparisons are pretty much the main time 
sink for c* when working with clustering columns, and especially collections.

The big problem fields are int, bigint and timestamp. All of these are very 
commonly used, and trivial to make byte-order comparable. The optimisations 
made a little while back had a significant impact on CPU cost of merges, and 
they all depend on byte-order comparability of every clustering column on the 
table. For such small fields the cost of the virtual invocation is a 
significant percentage of the time spent since the data will generally be in 
cache, having just been read off disk. We can avoid multiple such virtual 
invocations if all of the fields are byte-order comparable. It also improves 
instruction cache occupancy for these common methods, since they all go through 
the same codepath (at the time of making those optimisations, instruction cache 
misses were actually a significant problem, and likely worse on a live server 
with a more varied workload).

Future storage changes largely depend on it too for delivering the best 
performance, as the binary trie is likely to be the most significant win. 
Further, CASSANDRA-8731 can perhaps exploit the nature of these fields to reduce 
costs of merging even further.
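
As a rough sketch of why int and bigint are considered trivial to convert: flipping the sign bit of the big-endian two's-complement form yields bytes whose unsigned lexicographic order matches the signed numeric order. The class and method names below are hypothetical and are not taken from the Cassandra code base; this is one well-known technique, not necessarily the encoding the ticket would adopt.

```java
// Hypothetical sketch: sign-bit flip makes fixed-width integers comparable
// by their unsigned bytes alone.
final class ByteComparableInts {
    // Big-endian bytes of v with the sign bit flipped, so negatives sort first.
    static byte[] encodeInt(int v) {
        int u = v ^ 0x80000000;
        return new byte[] { (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u };
    }

    // Inverse of encodeInt; the conversion is lossless.
    static int decodeInt(byte[] b) {
        int u = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return u ^ 0x80000000;
    }

    // Same idea for 64-bit values (bigint, timestamp).
    static byte[] encodeLong(long v) {
        long u = v ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) u;
            u >>>= 8;
        }
        return out;
    }

    // Type-agnostic comparison: unsigned, byte by byte, shorter prefix first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] values = { Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE };
        for (int i = 0; i + 1 < values.length; i++) {
            int cmp = compareUnsigned(encodeInt(values[i]), encodeInt(values[i + 1]));
            System.out.println(values[i] + " sorts before " + values[i + 1] + ": " + (cmp < 0));
        }
    }
}
```

Because compareUnsigned needs no knowledge of the column type, every clustering column encoded this way can go through the same code path, which is the property the comment above relies on.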



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305283#comment-14305283
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

(JB's issue: CASSANDRA-8730)

It sounds tractable for 3.0 if we start soon; neither backwards compatibility 
with old-version sstables nor telling clients that v4 protocol includes new 
serialization formats should be a huge obstacle.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2015-02-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304540#comment-14304540
 ] 

Jonathan Ellis commented on CASSANDRA-6936:
---

I'm not sure this is worth the pain tbh.  Some common comparisons (notably 
text) are already byte-comparable and others (int32) are close enough that 
speed gain will likely be negligible.

Maybe a time will come where comparisons are our main bottleneck but we're not 
there atm and future storage changes will probably impact this as well.

I'm inclined to close as Later.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948999#comment-13948999
 ] 

Sylvain Lebresne commented on CASSANDRA-6936:
-

Not too sure what you have in mind here. I don't see how we could achieve that 
given that some types (TimeUUID, IntegerType, ...) are intrinsically not 
comparable by their bytes representation. And since we do return that 
representation to the user, it's not like we can change it to whatever suits 
us. Or I guess we could have some conversion of representation when 
receiving/sending values but 1) I don't see an easy way to have a bytes 
comparable representation of say IntegerType (since it's variable length) and 
2) I'm rather uncomfortable with doing complex bit manipulations of the user 
data. Besides, there are the custom types (i.e. when users provide their own 
AbstractType implementation).
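
A small sketch of the TimeUUID case, assuming only the standard version 1 UUID layout (time_low first, then time_mid, then version and time_hi) and java.util.UUID. The class name and helpers are hypothetical. It shows two UUIDs whose raw byte order is the reverse of their timestamp order.

```java
import java.util.UUID;

// Hypothetical illustration: raw bytes of a version 1 UUID do not sort by time.
final class TimeUuidByteOrder {
    // Build a version 1 UUID whose 60-bit timestamp field is ts.
    // The clock sequence and node bits are left at zero for simplicity.
    static UUID fromTimestamp(long ts) {
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHi;
        return new UUID(msb, 0x8000000000000000L);
    }

    // Equivalent to an unsigned byte-by-byte comparison of the 16 raw bytes.
    static int compareRawBytes(UUID a, UUID b) {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return cmp != 0 ? cmp : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    // The ordering a time-based type actually wants.
    static int compareByTimestamp(UUID a, UUID b) {
        return Long.compare(a.timestamp(), b.timestamp());
    }

    public static void main(String[] args) {
        UUID earlier = fromTimestamp(0xFFFFFFFFL);   // time_low is all ones
        UUID later = fromTimestamp(0x100000000L);    // time_low wraps to zero
        System.out.println("by timestamp: " + compareByTimestamp(earlier, later));
        System.out.println("by raw bytes: " + compareRawBytes(earlier, later));
    }
}
```

The two comparisons disagree in sign, so a byte-comparable form would need the timestamp fields reordered (most significant first) on the way in and restored on the way out.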

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 This could be a painful change, but is necessary for implementing a 
 trie-based index, and settling for less would be suboptimal; it also should 
 make comparisons cheaper all-round, and since comparison operations are 
 pretty much the majority of C*'s business, this should be easily felt (see 
 CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
 major performance impacts). No copying/special casing/slicing should mean 
 fewer opportunities to introduce performance regressions as well.
 Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
 changes, hopefully this shouldn't be too much more of a burden.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949150#comment-13949150
 ] 

Benedict commented on CASSANDRA-6936:
-

bq. Or I guess we could have some conversion of representation when 
receiving/sending values

I would settle for conversion when reading/writing from disk for these, but at 
send/receive would be best, so that we can benefit from the changes in memory 
as well. But our on-disk indexing is currently quite lacking, and improving 
that would be a tremendous help by itself.

bq. I don't see an easy way to have a bytes comparable representation of say 
IntegerType (since it's variable length)

[http://www.dlugosz.com/ZIP2/VLI.html] looks to be one pretty simple such 
encoding, but there are others
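
For illustration only, here is one of the simpler "other" encodings, not the VLI scheme from the link above: a one-byte length prefix followed by the big-endian magnitude. It is a sketch restricted to non-negative values whose magnitude fits in 255 bytes; negative values and longer magnitudes would need extra handling. All names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical sketch: length-prefixed magnitude makes variable-length
// non-negative integers comparable by unsigned bytes.
final class LengthPrefixedVarInt {
    // [length][magnitude bytes, big-endian, no leading zeros]
    static byte[] encode(BigInteger v) {
        if (v.signum() < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
        byte[] raw = v.toByteArray();
        int start = 0;
        while (start < raw.length && raw[start] == 0) start++;  // drop sign/zero padding
        int len = raw.length - start;
        if (len > 255) throw new IllegalArgumentException("magnitude too long for a one-byte length");
        byte[] out = new byte[len + 1];
        out[0] = (byte) len;
        System.arraycopy(raw, start, out, 1, len);
        return out;
    }

    // Unsigned lexicographic comparison, shorter prefix first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        BigInteger one = BigInteger.ONE;
        BigInteger big = BigInteger.valueOf(128);
        // The raw two's-complement forms are [01] and [00 80], which sort the wrong way round.
        System.out.println("raw bytes say 1 < 128:     " + (compareUnsigned(one.toByteArray(), big.toByteArray()) < 0));
        System.out.println("encoded bytes say 1 < 128: " + (compareUnsigned(encode(one), encode(big)) < 0));
    }
}
```

The length byte sorts longer magnitudes after shorter ones, and equal-length magnitudes then compare correctly byte by byte.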

bq.  there is the custom types

This is more of an issue. DecimalType is also tricky (though still achievable 
I'm sure). It _may_ be that we have a slow fallback for those types we decide 
are too problematic to convert, but it would be good to aim for a situation 
where we can have a fast route, and where we can make on-disk optimisations. In 
an ideal world, though, we would simply not support indexing 
(clustering/naming) on fields that can't be given this property (which is 
probably very few, and probably not a major limitation).

bq.  I'm rather uncomfortable with doing complex bit manipulations of the user 
data... And since we do return that representation to the user, it's not like 
we can change it to whatever suits us

I'm not sure of your rationale for this. It seems an arbitrary distinction from 
all of the other complex things we do to user data. All we do is shuffle 
around/encode/wrap user data. This is exactly the kind of thing a database is 
supposed to do to make the user's life easier, and in this event _we chose_ the 
encoding, so the user has no specific attachment to it. We could easily create 
new types that require no conversion, and encourage users to switch for 
safety/efficiency, but so long as any conversion is lossless, it shouldn't be a 
problem. 

Investigating this has raised another related issue, which is that I only now 
realised we store a 4-byte length for every single value. This seems immensely 
wasteful, and at the same time as any of these changes we should push this 
logic into AbstractType, so that those that are fixed length, or only need a 
short length, or can otherwise encode their length, can decide for themselves 
what size length to write.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949611#comment-13949611
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

bq. so that those that are fixed length, or only need a short length, or can 
otherwise encode their length, can decide for themselves what size length to 
write.

Strictly speaking, we don't have any fixed-length types, because even a boolean 
column can contain an empty value (empty BB passed via Thrift, for example). 
Not that it makes the larger point invalid, just something to be aware of.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949619#comment-13949619
 ] 

Benedict commented on CASSANDRA-6936:
-

Well, it depends on what you mean by fixed-width. That may possibly be 
encodable by another means - your example of Boolean is trivially encodable in 
a fixed-width on-disk representation. But I take your wider point.



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949626#comment-13949626
 ] 

Aleksey Yeschenko commented on CASSANDRA-6936:
--

Fixed, but not single-byte, because BooleanType would accept any BB as long as 
it's 0 or 1 byte of size, and it can't be changed for backward-compatibility 
reasons :\



[jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-03-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949631#comment-13949631
 ] 

Benedict commented on CASSANDRA-6936:
-

Oh. :(
