[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2025-01-15 Thread Cheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913498#comment-17913498
 ] 

Cheng Wang commented on CASSANDRA-20102:


+1. The patch LGTM. 

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-12-07 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903841#comment-17903841
 ] 

Jordan West commented on CASSANDRA-20102:
-

With the tests all passing besides the unrelated TCM one and the changes 
[~jolynch] has made I am +1

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-12-03 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902811#comment-17902811
 ] 

Caleb Rackliffe commented on CASSANDRA-20102:
-

No objection from me, but how much pre-CASSANDRA-20080 content would we have to 
bring along?

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-12-03 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902806#comment-17902806
 ] 

David Capwell commented on CASSANDRA-20102:
---

[~ifesdjeen], [~maedhroz] we really need to forward port the harry/ast work 
from cep-15-accord to trunk... would love to just add these functions to the 
generated CQL to see if they handle all the fun edge cases...

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-12-02 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902334#comment-17902334
 ] 

Jordan West commented on CASSANDRA-20102:
-

CircleCI:

* 
[java17|https://app.circleci.com/pipelines/github/jrwest/cassandra/204/workflows/e09bf370-58b1-49df-a697-eed4ee368f06]
 - only had one unrelated failure (related to TCM)
* 
[java11|https://app.circleci.com/pipelines/github/jrwest/cassandra/204/workflows/7a73e890-8cfe-4202-8fd8-44499937b8b0]
 - running now

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-30 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902072#comment-17902072
 ] 

Joey Lynch commented on CASSANDRA-20102:


Awesome, rebased onto trunk, commit message standardized, and documentation 
added.  The 
[patch|https://github.com/apache/cassandra/pull/3707/commits/5e89621d9659fc57ff0fc038fd42924257a9bf22]
 should be ready to test and then we just need [~clohfink] to take another look 
before commit.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-30 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902060#comment-17902060
 ] 

Jordan West commented on CASSANDRA-20102:
-

Agreed.  I can kick it off. Traveling today so if you want to get those bits 
finalized I can kick off the build this evening or early tomorrow. 

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-30 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902059#comment-17902059
 ] 

Joey Lynch commented on CASSANDRA-20102:


I have not run circleci that recently, my guess is things have probably changed 
so probably faster if you run them? I can squash and add docs later tonight if 
you just want to do one run. I'm thinking folks would have objected by now if 
they object.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-30 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902057#comment-17902057
 ] 

Jordan West commented on CASSANDRA-20102:
-

[~jolynch] do you have circle ci set up? I can kick off a build if you would 
like. 

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-29 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901971#comment-17901971
 ] 

Joey Lynch commented on CASSANDRA-20102:


Converted my fuzz test to a 
[quicktheories|https://github.com/apache/cassandra/pull/3707/files#diff-1af98e0e50f1a39a49eb7e75ae1dc7f3c3620040104849298f6635e6dd03ca85R84]
 test, thanks [~jwest] !

I think this is ready for review.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Apache Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-25 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17901049#comment-17901049
 ] 

Joey Lynch commented on CASSANDRA-20102:


Ok sounds good to me. Any feedback on the functions or implementation I should 
address? Will get the QT test done tomorrow.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-25 Thread Chris Lohfink (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900967#comment-17900967
 ] 

Chris Lohfink commented on CASSANDRA-20102:
---

+1 for trunk, wont be for 4.1 for sure since this is new feature

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900105#comment-17900105
 ] 

Brandon Williams commented on CASSANDRA-20102:
--

ML thread: https://lists.apache.org/thread/k9ml1k4fg6o7mfby1nr3y0mnq9r90dym

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900108#comment-17900108
 ] 

Joey Lynch commented on CASSANDRA-20102:


Awesome, changed the proposal to {{octet_length}} and {{length}}, thanks for 
catching me up.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octet_length}} - Defined on all types and returns the number of 
> underlying bytes in the bytebuffer representation of the column. Does _not_ 
> account for storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octet_length(value) from test.test;
>  system.length(key) | system.octet_length(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octet_length(key), octet_length(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octet_length(key) | system.octet_length(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900100#comment-17900100
 ] 

Joey Lynch commented on CASSANDRA-20102:


Oh awesome! Updating.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octetlength}} - Defined on all types and returns the number of underlying 
> bytes in the bytebuffer representation of the column. Does _not_ account for 
> storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octetlength(value) from test.test;
>  system.length(key) | system.octetlength(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octetlength(key), octetlength(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octetlength(key) | system.octetlength(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900099#comment-17900099
 ] 

Stefan Miklosovic commented on CASSANDRA-20102:
---

It actually seems to be the rule we have, heh

https://issues.apache.org/jira/browse/CASSANDRA-18037

https://github.com/apache/cassandra/commit/d3c55ed8f82c601d188fedbecb62a464171ea624



> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octetlength}} - Defined on all types and returns the number of underlying 
> bytes in the bytebuffer representation of the column. Does _not_ account for 
> storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octetlength(value) from test.test;
>  system.length(key) | system.octetlength(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octetlength(key), octetlength(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octetlength(key) | system.octetlength(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900097#comment-17900097
 ] 

Stefan Miklosovic commented on CASSANDRA-20102:
---

Somewhat tangential to this is CASSANDRA-19546 I would love to see both in and 
I think that it is great idea to simplify life for people in this regard.

https://github.com/apache/cassandra/pull/3239/files#diff-29608698944826e9532ac0678a1c6166e00e0f07ec6d81ead60a2bba484f4407

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octetlength}} - Defined on all types and returns the number of underlying 
> bytes in the bytebuffer representation of the column. Does _not_ account for 
> storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octetlength(value) from test.test;
>  system.length(key) | system.octetlength(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octetlength(key), octetlength(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octetlength(key) | system.octetlength(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-20102) Support simple length functions for blob and string

2024-11-21 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900098#comment-17900098
 ] 

Joey Lynch commented on CASSANDRA-20102:


Oh nice yeah I agree together they are very helpful (these for computers/code, 
those others for humans)! [~smiklosovic] do you know if we are ok having `_` in 
functions? I couldn't find many examples where we had underscores but if that's 
kosher than I'd change this proposal to {{octet_length}}.

> Support simple length functions for blob and string
> ---
>
> Key: CASSANDRA-20102
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20102
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Low
> Fix For: 5.x
>
>
> A common question from users is how to query for the length of {{blob}} and 
> {{text}} columns, and the answer is often for users to enable UDFs and then 
> write their own 
> ([1|https://stackoverflow.com/questions/49674114/cassandra-query-string-length],
>  
> [2|https://stackoverflow.com/questions/58401750/how-can-i-get-size-of-blob-column-of-cassandra-using-user-defined-functions],
>  ...). I think it would make sense for Cassandra to support a subset of the 
> SQL99 [binary length 
> functions|https://sql-99.readthedocs.io/en/latest/chapters/05.html#octet-length-expression]
>  and a UTF8 codepoint aware {{length}} function similar to PostgreSQL's 
> [{{length}}|https://www.postgresql.org/docs/9.1/functions-string.html] 
> function.
> Specifically I propose two new system functions:
> # {{octetlength}} - Defined on all types and returns the number of underlying 
> bytes in the bytebuffer representation of the column. Does _not_ account for 
> storage overhead (e.g. does not account for metadata)
> # {{length}} - Defined on UTF8 strings, and returns the number of UTF8 
> codepoints in the string, equivalent to Java's 
> [{{String.length}}|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/String.html#length()]
>  function.
> Patch coming shortly. Particularly interested in feedback if we A) think 
> these functions are reasonable to have inside Cassandra, and B) like the 
> names of the functions to align with SQL or if we want to go nonstandard and 
> do like {{blength}} instead of octet length. 
> An example of the difference between octet length and string length
> {code}
> // A table with text key and blob value
> cqlsh> select key, value from test.test;
>  key   | value
> ---+
>  hello | 0x12af
> cqlsh> select length(key), octetlength(value) from test.test;
>  system.length(key) | system.octetlength(value)
> +---
>   5 | 2
> cqlsh> INSERT INTO test.test (key, value) VALUES ('こんにちは世界', 0x12AF);
> cqlsh> select length(key), octetlength(key), octetlength(value) from 
> test.test where key = 'こんにちは世界';
>  system.length(key) | system.octetlength(key) | system.octetlength(value)
> +-+---
>   7 |  21 | 2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org