[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13438694#comment-13438694
 ] 

André Cruz commented on CASSANDRA-4245:
---

I'm also interested in a UTF-8 comparator that orders columns alphabetically. 
In fact, I was expecting this to be the default behaviour in Cassandra until it 
bit me. For example, with 3 columns: André, Zeus and Ándré.

I was expecting:
André
Ándré
Zeus

The result was:
André
Zeus
Ándré

This is what's being discussed in this issue, right?
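
A minimal sketch of the two orderings described above, assuming that the observed result comes from an unsigned byte-wise comparison of the UTF-8 encodings and that the expected result is what a java.text.Collator for an English locale would give. The class and method names are hypothetical and are not part of Cassandra.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Cassandra.
class ColumnOrderDemo {
    // Unsigned lexicographic comparison of the UTF-8 encodings
    // (assumed here to approximate a byte-wise column comparator).
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // Locale-sensitive comparator; decomposition is enabled so that
    // accented characters are compared as base letter plus accent.
    static Collator englishCollator() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // Returns a sorted copy of the given names.
    static List<String> sortedBy(Comparator<? super String> order, String... names) {
        List<String> copy = new ArrayList<>(Arrays.asList(names));
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // Unicode escapes stand for: André, Zeus, Ándré
        String[] names = {"Andr\u00e9", "Zeus", "\u00c1ndr\u00e9"};
        System.out.println(sortedBy(ColumnOrderDemo::compareUtf8Bytes, names));
        System.out.println(sortedBy(englishCollator(), names));
    }
}
```

In UTF-8 the letter Á is encoded with a lead byte of 0xC3, which is larger than any ASCII byte, so a byte-wise comparison places Ándré after Zeus.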

 Provide a UT8Type (case insensitive) comparator
 ---

 Key: CASSANDRA-4245
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
 Project: Cassandra
  Issue Type: New Feature
Reporter: Ertio Lew
Assignee: Aaron Morton
Priority: Minor

 It is a common use case to use a bunch of entity names as column names and then 
 use the row as a search index, using search by range. For such use cases and 
 others, it is useful to have a UTF8 comparator that provides case insensitive 
 ordering of columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-07-07 Thread Ertio Lew (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13408765#comment-13408765
 ] 

Ertio Lew commented on CASSANDRA-4245:
--

Any progress on this? When can we expect it?





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-18 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279073#comment-13279073
 ] 

Aaron Morton commented on CASSANDRA-4245:
-

You're right, my thinking has been dogmatic. AARON and aaron are never equal, 
they are just sorted close to each other. 
 
Will see if I can hack up a LocalAwareUTF8Type() that takes a locale as a type 
param. 
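
A rough sketch of what such a locale-aware comparison could look like, assuming column names arrive as UTF-8 encoded ByteBuffers. The class below is hypothetical: it is a plain java.util.Comparator and does not extend Cassandra's AbstractType or use any Cassandra API. It falls back to the raw bytes when the collator reports a tie, so that distinct byte sequences are never treated as equal and only end up sorted close to each other.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical class, not part of Cassandra.
class LocaleAwareUtf8Comparator implements Comparator<ByteBuffer> {
    private final Collator collator;

    LocaleAwareUtf8Comparator(Locale locale) {
        collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
    }

    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        int cmp = collator.compare(decode(a), decode(b));
        // Tie-break on the raw bytes (ByteBuffer.compareTo is a signed
        // byte comparison) so that only identical buffers compare equal.
        return cmp != 0 ? cmp : a.compareTo(b);
    }

    // Decodes a duplicate so the caller's buffer position is untouched.
    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        LocaleAwareUtf8Comparator cmp = new LocaleAwareUtf8Comparator(Locale.ENGLISH);
        System.out.println(cmp.compare(utf8("AARON"), utf8("aaron")) != 0);
        System.out.println(cmp.compare(utf8("aaron"), utf8("bob")) < 0);
    }
}
```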





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-17 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277721#comment-13277721
 ] 

Aaron Morton commented on CASSANDRA-4245:
-

bq.  case-insensitive but case-preserving,
Thinking about it, the six columns option more closely matches the RDBMS 
experience, where the case of the string is preserved but then ignored in 
queries. It would probably also take less effort to implement.

bq.  And providing a comparator per locale is clearly insane.
I would imagine the collation being a comparator property that was used to 
construct the java.text.Collator, e.g. 
UTF8Type(query_collation=english_CI_AS) for the English locale, case 
insensitive, accent sensitive. Would need to do some research on how to use the 
java.text.Collator correctly though. 

Still think there may be a need, but it's more than a drop in comparator.
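
As a sketch of how a collation property might map onto java.text.Collator, assuming the CI/CS and AI/AS flags are translated to collator strengths. The helper name and its parameters are hypothetical, and the Collator documentation notes that the exact meaning of each strength is locale dependent, so this would need checking per locale.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of Cassandra.
class CollationSpec {
    // caseSensitive / accentSensitive mirror the CS/CI and AS/AI flags
    // of RDBMS-style collation names (an assumption for illustration).
    static Collator collatorFor(Locale locale, boolean caseSensitive, boolean accentSensitive) {
        Collator c = Collator.getInstance(locale);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        if (caseSensitive) {
            // TERTIARY distinguishes case; it also distinguishes accents,
            // so "case sensitive, accent insensitive" is not expressible
            // through strength alone in this sketch.
            c.setStrength(Collator.TERTIARY);
        } else if (accentSensitive) {
            // SECONDARY distinguishes accents but ignores case.
            c.setStrength(Collator.SECONDARY);
        } else {
            // PRIMARY compares base letters only.
            c.setStrength(Collator.PRIMARY);
        }
        return c;
    }

    public static void main(String[] args) {
        Collator ciAs = collatorFor(Locale.ENGLISH, false, true);
        // Unicode escape \u00c4 stands for Ä.
        System.out.println(ciAs.compare("aaron", "AARON") == 0);
        System.out.println(ciAs.compare("aaron", "\u00c4aron") != 0);
    }
}
```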





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277751#comment-13277751
 ] 

Sylvain Lebresne commented on CASSANDRA-4245:
-

bq.  Where the case of the string is preserved but then ignored in queries.

I don't think that this is what the '6 columns' option does. Namely, if we have 
6 columns, it means that we don't ignore the case in queries (since we have 
multiple values for the same name but with different case). In other words, if 
it's case insensitivity we want, it's the 3 columns option, which, I kind of 
agree with Jonathan and Brandon, can be done client side fairly easily by 
lower-casing everything. The 6 columns option is more about having a different 
string order that puts strings that differ only by case closer together, which 
can be neat, but is it so useful that it justifies being a native type?





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276807#comment-13276807
 ] 

Jonathan Ellis commented on CASSANDRA-4245:
---

bq. I think 3 columns is what we want

In that case, I think our message should be, call toLowerCase client side.  
It's virtually painless and doesn't expose us to the mess that is 
case-insensitive but case-preserving, which is what I think you're suggesting.

bq. There is a default case insensitive comparator in java 

Note that this Comparator does not take locale into account, and will result 
in an unsatisfactory ordering for certain locales. The java.text package 
provides Collators to allow locale-sensitive ordering.
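
A small illustration of that note, assuming the comparator referred to is String.CASE_INSENSITIVE_ORDER and using an English Collator for contrast. The demo class name is hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class.
class CaseInsensitiveOrderDemo {
    static Collator collator(Locale locale) {
        Collator c = Collator.getInstance(locale);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    public static void main(String[] args) {
        // Unicode escapes stand for: Ándré
        String accented = "\u00c1ndr\u00e9";
        // CASE_INSENSITIVE_ORDER folds case per char and then compares
        // char values, so the accented A lands after every ASCII letter.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare(accented, "Zeus") > 0);
        // The collator treats the accented A as a variant of the letter A.
        System.out.println(collator(Locale.ENGLISH).compare(accented, "Zeus") < 0);
    }
}
```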

I'm starting to think that Brandon is right, and trying to do this in a 
unicode-aware world is a world of hurt.  In particular, a single 
case-insensitive comparator will never provide the right ordering for all 
locales.  And providing a comparator per locale is clearly insane.





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-15 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13275982#comment-13275982
 ] 

Brandon Williams commented on CASSANDRA-4245:
-

I'm more concerned with proliferating comparators when another solution is just 
as good.  The cost of supporting comparators runs deep into clients, Hadoop, 
Pig, and more, and I feel like we've already taken one misstep with DateType, 
which is just a long underneath (and a timestamp as a long would've worked just 
fine instead).





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276012#comment-13276012
 ] 

Jonathan Ellis commented on CASSANDRA-4245:
---

What other solution is just as good in the "I want case-insensitive collation 
based on the column name" case?  This is even more important in CQL3 so I'm 
inclined to support it.





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-15 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276020#comment-13276020
 ] 

Brandon Williams commented on CASSANDRA-4245:
-

Forcing a case upon insertion (and if necessary, storing the case-sensitive 
value elsewhere) seems fairly workable (unless you need uniqueness, though that 
seems a bit odd,) but if it's important for CQL3 then I'm not opposed.
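
A minimal client-side sketch of this approach, assuming the lower-cased name is used as the column name and the original spelling is kept as the column value. The TreeMap below only stands in for one sorted row and is not a Cassandra API. Its natural String order matches byte order for ASCII names only.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for an index row, not a Cassandra API.
class LowercasedIndexRow {
    // column name (forced to lower case) -> original spelling
    private final NavigableMap<String, String> columns = new TreeMap<>();

    void put(String name) {
        // Locale.ROOT avoids locale-specific surprises in case mapping.
        columns.put(name.toLowerCase(Locale.ROOT), name);
    }

    // Range search: all columns whose lower-cased name starts with the prefix.
    SortedMap<String, String> prefixSlice(String prefix) {
        String start = prefix.toLowerCase(Locale.ROOT);
        return columns.subMap(start, start + Character.MAX_VALUE);
    }

    int size() {
        return columns.size();
    }

    public static void main(String[] args) {
        LowercasedIndexRow row = new LowercasedIndexRow();
        row.put("AARON");
        row.put("aaron");
        row.put("Bob");
        System.out.println(row.size());
        System.out.println(row.prefixSlice("AA"));
    }
}
```

Names that differ only by case collapse into one column here and the last write wins, which is the uniqueness caveat mentioned above.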





[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

2012-05-15 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276274#comment-13276274
 ] 

Aaron Morton commented on CASSANDRA-4245:
-

Was thinking about the impact of case insensitive comparisons.

Say we have the values: aaron, Aaron, AARON, Äaron, BOB and bob. Using a Case 
Insensitive, Accent Sensitive collation the order should be (am using bytes as 
a secondary ordering, and guessing Ä occurs after the non-accented A):

1. AARON, Aaron, aaron
2. Äaron
3. BOB, bob

We need to decide if the collation above results in three or six columns in 
Cassandra. 

Some examples of where the comparison is used:
 * When writing the sorted memtable we are not concerned with equality, only 
relative ordering, which is: AARON, Aaron, aaron, Äaron, BOB, bob 
 * When applying a mutation to a CF we are concerned with equality; relative 
ordering is not important. The six columns should be treated either as six 
unique values or as three columns. 
 * When resolving a query we are concerned with equality and relative ordering, 
but the equality is different to the examples above. We need to know that the 
three non-accented Aarons are equal, and that the Bobs occur later. 

If three columns, writing AARON, then aaron, then reading aaron may result 
in AARON being returned. When reducing columns in a slice we need a 
deterministic way to select the column name to use in the response, and/or 
the response digest needs to be calculated differently.  
 
If six columns, comparators need to support a unique ordering that is used in 
memtables and sstables, and a query ordering used when slicing. In the 
example, query ordering results in 3 unique values, unique ordering results in 
6.  
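
The two orderings can be sketched as follows, assuming a java.text.Collator at SECONDARY strength for the query ordering and the same collator with a tie-break for the unique ordering. The class is hypothetical, and the tie-break uses String's natural order as a stand-in for a byte comparison.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical demo class, not part of Cassandra.
class TwoOrderingsDemo {
    // Unicode escape \u00c4 stands for Ä.
    static final List<String> NAMES =
            Arrays.asList("aaron", "Aaron", "AARON", "\u00c4aron", "BOB", "bob");

    // Case insensitive, accent sensitive: case variants compare as equal.
    static Comparator<String> queryOrdering() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        c.setStrength(Collator.SECONDARY);
        return c::compare;
    }

    // Same ordering, with ties broken so that distinct strings never
    // compare as equal.
    static Comparator<String> uniqueOrdering() {
        return queryOrdering().thenComparing(Comparator.<String>naturalOrder());
    }

    // Number of values a sorted set keeps under the given ordering.
    static TreeSet<String> distinctUnder(Comparator<String> order) {
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(NAMES);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(distinctUnder(queryOrdering()).size());
        System.out.println(distinctUnder(uniqueOrdering()));
    }
}
```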

I _think_ 3 columns is what we want. Thoughts ? 

wrt the configuration, collation could be a CF level configuration used by 
comparators that support it. Per column collation would only be used by 
secondary indexing and seems a little overkill. 
