[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438694#comment-13438694 ]

André Cruz commented on CASSANDRA-4245:
---

I'm also interested in a UTF-8 comparator that orders columns alphabetically. In fact, I was expecting this to be the default behaviour in Cassandra until it bit me.

For example, with 3 columns: André, Zeus and Ándré.

I was expecting:
André
Ándré
Zeus

The result was:
André
Zeus
Ándré

This is what's being discussed in this issue, right?

Provide a UT8Type (case insensitive) comparator
---

Key: CASSANDRA-4245
URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
Project: Cassandra
Issue Type: New Feature
Reporter: Ertio Lew
Assignee: Aaron Morton
Priority: Minor

It is a common use case to use a bunch of entity names as column names then use the row as a search index, using search by range. For such use cases and others, it is useful to have a UTF8 comparator that provides case insensitive ordering of columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
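The ordering André reports is consistent with comparing names by their encoded UTF-8 bytes, where every accented letter sorts after all ASCII letters. The sketch below reproduces both orderings in Python for illustration only. The helper names are hypothetical, and the accent-stripping key is a rough stand-in for real locale-aware collation, not what Cassandra does.

```python
import unicodedata

# The three column names from the comment, written with escapes so the
# precomposed code points are unambiguous: Zeus, Ándré, André.
names = ["Zeus", "\u00c1ndr\u00e9", "Andr\u00e9"]


def utf8_byte_order(name: str) -> bytes:
    """Sort key that compares the raw UTF-8 bytes of the name."""
    return name.encode("utf-8")


def strip_accents(text: str) -> str:
    """Decompose the text and drop combining marks (a crude approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rough_alphabetical_order(name: str):
    """Compare base letters first, then fall back to bytes to break ties."""
    return (strip_accents(name).casefold(), name.encode("utf-8"))


by_bytes = sorted(names, key=utf8_byte_order)
by_alpha = sorted(names, key=rough_alphabetical_order)

print(by_bytes)  # the order that was observed
print(by_alpha)  # the order that was expected
```

In the byte ordering, "Á" encodes to the bytes 0xC3 0x81, which compare greater than "Z" (0x5A), so Ándré lands after Zeus.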
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408765#comment-13408765 ]

Ertio Lew commented on CASSANDRA-4245:
---

Any progress on this? When can we expect it?
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279073#comment-13279073 ]

Aaron Morton commented on CASSANDRA-4245:
---

You're right, my thinking has been dogmatic. AARON and aaron are never equal, they are just sorted close to each other.

Will see if I can hack up a LocalAwareUTF8Type() that takes a locale as a type param.
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277721#comment-13277721 ]

Aaron Morton commented on CASSANDRA-4245:
---

bq. case-insensitive but case-preserving,

Thinking about it, the six columns option more closely matches the RDBMS experience. Where the case of the string is preserved but then ignored in queries. It would probably also take less work to implement.

bq. And providing a comparator per locale is clearly insane.

I would imagine the collation being a comparator property that was used to construct the java.text.Collator, e.g. UTF8Type(query_collation=english_CI_AS) for English locale, case insensitive, accent sensitive. Would need to do some research on how to use the java.text.Collator correctly though.

Still think there may be a need, but it's more than a drop-in comparator.
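To make the idea of a collation property concrete, here is a minimal sketch of how a spec such as english_CI_AS might be parsed into flags that drive a comparison key. It is written in Python for illustration. The spec grammar, the Collation type and the function names are all assumptions, the language part is parsed but ignored, and a real implementation would hand these settings to java.text.Collator as suggested above.

```python
import unicodedata
from typing import Callable, NamedTuple


class Collation(NamedTuple):
    """Hypothetical parsed form of a spec like 'english_CI_AS'."""
    language: str
    case_sensitive: bool
    accent_sensitive: bool


def parse_collation(spec: str) -> Collation:
    """Split '<language>_<CI|CS>_<AI|AS>' into its three parts."""
    language, case_flag, accent_flag = spec.split("_")
    if case_flag not in ("CI", "CS") or accent_flag not in ("AI", "AS"):
        raise ValueError(f"unrecognised collation spec: {spec!r}")
    return Collation(language, case_flag == "CS", accent_flag == "AS")


def strip_accents(text: str) -> str:
    """Decompose the text and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equality_key(collation: Collation) -> Callable[[str], str]:
    """Build a key under which 'equal' names map to the same string.

    The language is not consulted here; this sketch only honours the
    case and accent flags.
    """
    def key(name: str) -> str:
        folded = unicodedata.normalize("NFC", name)
        if not collation.accent_sensitive:
            folded = strip_accents(folded)
        if not collation.case_sensitive:
            folded = folded.casefold()
        return folded
    return key


ci_as = equality_key(parse_collation("english_CI_AS"))
ci_ai = equality_key(parse_collation("english_CI_AI"))

print(ci_as("AARON") == ci_as("aaron"))        # case is ignored
print(ci_as("\u00c4aron") == ci_as("aaron"))   # accents still matter
print(ci_ai("\u00c4aron") == ci_ai("aaron"))   # accents ignored as well
```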
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277751#comment-13277751 ]

Sylvain Lebresne commented on CASSANDRA-4245:
---

bq. Where the case of the string is preserved but then ignored in queries.

I don't think that this is what the '6 columns' option does. Namely, if we have 6 columns, it means that we don't ignore the case in queries (since we have multiple values for the same name but with different case).

In other words, if it's case insensitivity we want, it's the 3 columns option, which, I kind of agree with Jonathan and Brandon, can be done client side fairly easily by lower-casing everything. The 6 columns option is more about having a different string order that puts strings that differ only by case closer together, which can be neat, but is it so useful that it justifies being a native type?
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276807#comment-13276807 ]

Jonathan Ellis commented on CASSANDRA-4245:
---

bq. I think 3 columns is what we want

In that case, I think our message should be, call toLowerCase client side. It's virtually painless and doesn't expose us to the mess that is case-insensitive but case-preserving, which is what I think you're suggesting.

bq. There is a default case insensitive comparator in java

Note that this Comparator does not take locale into account, and will result in an unsatisfactory ordering for certain locales. The java.text package provides Collators to allow locale-sensitive ordering.

I'm starting to think that Brandon is right, and trying to do this in a Unicode-aware world is a world of hurt. In particular, a single case-insensitive comparator will never provide the right ordering for all locales. And providing a comparator per locale is clearly insane.
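A small sketch of the client-side lower-casing approach, together with two well-known cases where locale-independent case mapping gives surprising results. Python is used for illustration and normalise_column_name is a hypothetical helper, not part of any Cassandra client.

```python
def normalise_column_name(name: str) -> str:
    """Locale-independent lower-casing applied before writing or querying."""
    return name.lower()


# Plain ASCII names collapse to one column name, as intended.
ascii_ok = normalise_column_name("AARON") == normalise_column_name("aaron")

# U+0130 (capital I with dot above, used in Turkish) lower-cases to
# "i" followed by U+0307 (combining dot above): two code points, not one.
dotted_i_lowered = normalise_column_name("\u0130")

# A Turkish reader would expect "I" to lower-case to dotless U+0131,
# but the locale-independent mapping gives plain "i".
turkish_mismatch = normalise_column_name("I") != "\u0131"

# German sharp s: lower() leaves it alone, so these two spellings stay
# distinct, while casefold() maps both to "strasse".
lower_differs = "STRASSE".lower() != "stra\u00dfe".lower()
casefold_matches = "STRASSE".casefold() == "stra\u00dfe".casefold()

print(ascii_ok, len(dotted_i_lowered), turkish_mismatch)
print(lower_differs, casefold_matches)
```

Whichever mapping the client picks, it has to be applied identically on every write and every read, otherwise range queries will miss columns.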
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275982#comment-13275982 ]

Brandon Williams commented on CASSANDRA-4245:
---

I'm more concerned with proliferating comparators when another solution is just as good. The cost of supporting comparators runs deep into clients, Hadoop, Pig, and more, and I feel like we've already taken one misstep with DateType, which is just a long underneath (and a timestamp as a long would've worked just fine instead).
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276012#comment-13276012 ]

Jonathan Ellis commented on CASSANDRA-4245:
---

What other solution is just as good in the "I want case-insensitive collation based on the column name" case? This is even more important in CQL3, so I'm inclined to support it.
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276020#comment-13276020 ]

Brandon Williams commented on CASSANDRA-4245:
---

Forcing a case upon insertion (and, if necessary, storing the case-sensitive value elsewhere) seems fairly workable (unless you need uniqueness, though that seems a bit odd), but if it's important for CQL3 then I'm not opposed.
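The workaround Brandon describes can be sketched as a toy in-memory row: column names are forced to lower case on insertion and the original spelling is kept as the column value. This is Python for illustration only. NameIndexRow and its methods are hypothetical, and no Cassandra client API is being modelled.

```python
import bisect


class NameIndexRow:
    """Toy stand-in for one wide row used as a search index."""

    def __init__(self):
        self._names = []   # lower-cased column names, kept sorted
        self._values = {}  # lower-cased name -> original spelling

    def insert(self, entity_name: str) -> None:
        """Force the case on insertion; keep the original as the value."""
        key = entity_name.lower()
        if key not in self._values:
            # Python compares str by code point, which gives the same
            # order as comparing the UTF-8 encoded bytes.
            bisect.insort(self._names, key)
        self._values[key] = entity_name  # last write wins

    def slice(self, start: str, finish: str) -> list:
        """Range query over column names, bounds lower-cased the same way."""
        lo = bisect.bisect_left(self._names, start.lower())
        hi = bisect.bisect_right(self._names, finish.lower())
        return [self._values[k] for k in self._names[lo:hi]]


row = NameIndexRow()
for name in ["Bob", "aaron", "AARON", "Zeus"]:
    row.insert(name)

print(row.slice("A", "C"))  # original spellings for names from a to c
```

Note that "aaron" and "AARON" end up in the same column, so only the last spelling written survives. That is the uniqueness caveat mentioned in the comment.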
[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276274#comment-13276274 ]

Aaron Morton commented on CASSANDRA-4245:
---

Was thinking about the impact of case insensitive comparisons. Say we have the values: aaron, Aaron, AARON, Äaron, BOB and bob.

Using a Case Insensitive, Accent Sensitive collation the order should be (am using bytes as a secondary ordering, and guessing Ä occurs after the non accented A):

1. AARON, Aaron, aaron
2. Äaron
3. BOB, bob

We need to decide if the collation above results in three or six columns in Cassandra. Some examples of where the comparison is used:

* When writing the sorted memtable we are not concerned with equality, only relative ordering, which is: AARON, Aaron, aaron, Äaron, BOB, bob
* When applying a mutation to a CF we are concerned with equality; relative ordering is not important. The six values should be treated either as six unique columns or as three columns.
* When resolving a query we are concerned with equality and relative ordering, but the equality is different to the examples above. We need to know that the three non accented Aarons are equal, and that the Bobs occur later.

If three columns, writing AARON then aaron then reading aaron may result in AARON being returned. When reducing columns in a slice we need a deterministic way to select the column name to use in the response. And/or the response digest needs to be calculated differently.

If six columns, comparators need to support a unique ordering that is used in memtables and sstables, and a query ordering used when slicing. In the example, query ordering results in 3 unique values and unique ordering results in 6.

I _think_ 3 columns is what we want. Thoughts?

wrt the configuration, collation could be a CF level configuration used by comparators that support it. Per column collation would only be used by secondary indexing and seems a little overkill.
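The two orderings discussed above can be sketched as two sort keys over the same six values. This is a Python illustration under stated assumptions: unique_key and query_key are hypothetical names, accent stripping stands in for a real collator's primary strength, and placing Ä after unaccented A follows the guess made in the comment.

```python
import unicodedata
from itertools import groupby

# aaron, Aaron, AARON, Äaron, BOB and bob, deliberately shuffled.
names = ["bob", "\u00c4aron", "aaron", "BOB", "AARON", "Aaron"]


def strip_accents(text: str) -> str:
    """Decompose the text and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def query_key(name: str) -> str:
    """Case insensitive, accent sensitive equality: the 'three columns' view."""
    return unicodedata.normalize("NFC", name).casefold()


def unique_key(name: str):
    """Total order for storage: base letters, then accents, then raw bytes."""
    return (strip_accents(name).casefold(), query_key(name), name.encode("utf-8"))


# Unique ordering: six distinct values in a deterministic order.
six = sorted(names, key=unique_key)

# Query ordering: neighbours that are equal under query_key collapse.
three = [list(group) for _, group in groupby(six, key=query_key)]

print(six)
print(three)
```

Because the byte comparison is the last component of unique_key, the first entry of each group gives a deterministic representative, which is one possible answer to the question of which column name to return from a slice.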