[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913498#comment-17913498 ] Cheng Wang commented on CASSANDRA-20102: +1. The patch LGTM. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903841#comment-17903841 ] Jordan West commented on CASSANDRA-20102: - With the tests all passing besides the unrelated TCM one and the changes [~jolynch] has made I am +1 > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902811#comment-17902811 ] Caleb Rackliffe commented on CASSANDRA-20102: - No objection from me, but how much pre-CASSANDRA-20080 content would we have to bring along? > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902806#comment-17902806 ] David Capwell commented on CASSANDRA-20102: --- [~ifesdjeen], [~maedhroz] we really need to forward port the harry/ast work from cep-15-accord to trunk... would love to just add these functions to the generated CQL to see if they handle all the fun edge cases... > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902334#comment-17902334 ] Jordan West commented on CASSANDRA-20102: - CircleCI: * [java17|https://app.circleci.com/pipelines/github/jrwest/cassandra/204/workflows/e09bf370-58b1-49df-a697-eed4ee368f06] - only had one unrelated failure (related to TCM) * [java11|https://app.circleci.com/pipelines/github/jrwest/cassandra/204/workflows/7a73e890-8cfe-4202-8fd8-44499937b8b0] - running now > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902072#comment-17902072 ] Joey Lynch commented on CASSANDRA-20102: Awesome, rebased onto trunk, commit message standardized, and documentation added. The [patch|https://github.com/apache/cassandra/pull/3707/commits/5e89621d9659fc57ff0fc038fd42924257a9bf22] should be ready to test and then we just need [~clohfink] to take another look before commit. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902060#comment-17902060 ] Jordan West commented on CASSANDRA-20102: - Agreed. I can kick it off. Traveling today so if you want to get those bits finalized I can kick off the build this evening or early tomorrow. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902059#comment-17902059 ] Joey Lynch commented on CASSANDRA-20102: I have not run circleci that recently, my guess is things have probably changed so probably faster if you run them? I can squash and add docs later tonight if you just want to do one run. I'm thinking folks would have objected by now if they object. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902057#comment-17902057 ] Jordan West commented on CASSANDRA-20102: - [~jolynch] do you have circle ci set up? I can kick off a build if you would like. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901971#comment-17901971 ] Joey Lynch commented on CASSANDRA-20102: Converted my fuzz test to a [quicktheories|https://github.com/apache/cassandra/pull/3707/files#diff-1af98e0e50f1a39a49eb7e75ae1dc7f3c3620040104849298f6635e6dd03ca85R84] test, thanks [~jwest] ! I think this is ready for review. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Apache Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901049#comment-17901049 ] Joey Lynch commented on CASSANDRA-20102: Ok sounds good to me. Any feedback on the functions or implementation I should address? Will get the QT test done tomorrow. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900967#comment-17900967 ] Chris Lohfink commented on CASSANDRA-20102: --- +1 for trunk, wont be for 4.1 for sure since this is new feature > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900105#comment-17900105 ] Brandon Williams commented on CASSANDRA-20102: -- ML thread: https://lists.apache.org/thread/k9ml1k4fg6o7mfby1nr3y0mnq9r90dym > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900108#comment-17900108 ] Joey Lynch commented on CASSANDRA-20102: Awesome, changed the proposal to {{octet_length}} and {{length}}, thanks for catching me up. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octet_length}} - Defined on all types and returns the number of > underlying bytes in the bytebuffer representation of the column. Does _not_ > account for storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octet_length(value) from test.test; > system.length(key) | system.octet_length(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octet_length(key), octet_length(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octet_length(key) | system.octet_length(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900100#comment-17900100 ] Joey Lynch commented on CASSANDRA-20102: Oh awesome! Updating. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octetlength}} - Defined on all types and returns the number of underlying > bytes in the bytebuffer representation of the column. Does _not_ account for > storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octetlength(value) from test.test; > system.length(key) | system.octetlength(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octetlength(key), octetlength(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octetlength(key) | system.octetlength(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900099#comment-17900099 ] Stefan Miklosovic commented on CASSANDRA-20102: --- It actually seems to be the rule we have, heh https://issues.apache.org/jira/browse/CASSANDRA-18037 https://github.com/apache/cassandra/commit/d3c55ed8f82c601d188fedbecb62a464171ea624 > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octetlength}} - Defined on all types and returns the number of underlying > bytes in the bytebuffer representation of the column. Does _not_ account for > storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octetlength(value) from test.test; > system.length(key) | system.octetlength(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octetlength(key), octetlength(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octetlength(key) | system.octetlength(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900097#comment-17900097 ] Stefan Miklosovic commented on CASSANDRA-20102: --- Somewhat tangential to this is CASSANDRA-19546 I would love to see both in and I think that it is great idea to simplify life for people in this regard. https://github.com/apache/cassandra/pull/3239/files#diff-29608698944826e9532ac0678a1c6166e00e0f07ec6d81ead60a2bba484f4407 > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octetlength}} - Defined on all types and returns the number of underlying > bytes in the bytebuffer representation of the column. Does _not_ account for > storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octetlength(value) from test.test; > system.length(key) | system.octetlength(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octetlength(key), octetlength(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octetlength(key) | system.octetlength(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string
[ https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900098#comment-17900098 ] Joey Lynch commented on CASSANDRA-20102: Oh nice yeah I agree together they are very helpful (these for computers/code, those others for humans)! [~smiklosovic] do you know if we are ok having `_` in functions? I couldn't find many examples where we had underscores but if that's kosher than I'd change this proposal to {{octet_length}}. > Support simple length functions for blob and string > --- > > Key: CASSANDRA-20102 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20102 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Low > Fix For: 5.x > > > A common question from users is how to query for the length of {{blob}} and > {{text}} columns, and the answer is often for users to enable UDFs and then > write their own > ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length], > > [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions], > ...). I think it would make sense for Cassandra to support a subset of the > SQL99 [binary length > functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression] > and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's > [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] > function. > Specifically I propose two new system functions: > # {{octetlength}} - Defined on all types and returns the number of underlying > bytes in the bytebuffer representation of the column. Does _not_ account for > storage overhead (e.g. does not account for metadata) > # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 > codepoints in the string, equivalent to Java's > [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()] > function. > Patch coming shortly. Particularly interested in feedback if we A) think > these functions are reasonable to have inside Cassandra, and B) like the > names of the functions to align with SQL or if we want to go nonstandard and > do like {{blength}} instead of octet length. > An example of the difference between octet length and string length > {code} > // A table with text key and blob value > cqlsh> select key, value from test.test; > key | value > ---+ > hello | 0x12af > cqlsh> select length(key), octetlength(value) from test.test; > system.length(key) | system.octetlength(value) > +--- > 5 | 2 > cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF); > cqlsh> select length(key), octetlength(key), octetlength(value) from > test.test where key = 'こんにちは世界'; > system.length(key) | system.octetlength(key) | system.octetlength(value) > +-+--- > 7 | 21 | 2 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org