Re: [PR] CASSANDRA-20234: CBUtil serialization of UTF8 does not handle all UTF8 properly [cassandra]

via GitHub Wed, 22 Jan 2025 10:16:25 -0800


dcapwell commented on code in PR #3815:
URL: https://github.com/apache/cassandra/pull/3815#discussion_r1925743003



##########
src/java/org/apache/cassandra/db/TypeSizes.java:
##########
@@ -47,19 +48,7 @@ public static int sizeof(String value)
 
     public static int encodedUTF8Length(String st)
     {
-        int strlen = st.length();
-        int utflen = 0;
-        for (int i = 0; i < strlen; i++)
-        {
-            int c = st.charAt(i);
-            if ((c >= 0x0001) && (c <= 0x007F))
-                utflen++;
-            else if (c > 0x07FF)
-                utflen += 3;
-            else
-                utflen += 2;
-        }
-        return utflen;
+        return ByteBufUtil.utf8Bytes(st);

Review Comment:
   The history I found: we used to write 0 for the size, then write the 
bytes, then go back and update the size header. We then switched to the 
method above in 
[CASSANDRA-15410](https://issues.apache.org/jira/browse/CASSANDRA-15410). 
15410 looks like it was focused on ASCII, and this specific change was just 
thrown in with it, but I couldn't tell more from the history.
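
For context on why the removed loop matters, here is a minimal sketch comparing it against the JDK's own UTF-8 encoder. `legacyLength` is a copy of the deleted lines from the hunk above; `Utf8LengthDemo` and `actualLength` are hypothetical names used only for illustration, and `String.getBytes(StandardCharsets.UTF_8)` stands in for `ByteBufUtil.utf8Bytes` so the example needs no Netty dependency. The loop resembles the modified UTF-8 length calculation used by `DataOutput.writeUTF`, so it should diverge from standard UTF-8 for supplementary code points (surrogate pairs) and for U+0000.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo
{
    // Copy of the loop removed in the diff: it counts per UTF-16 char,
    // so each half of a surrogate pair is counted as 3 bytes.
    static int legacyLength(String st)
    {
        int strlen = st.length();
        int utflen = 0;
        for (int i = 0; i < strlen; i++)
        {
            int c = st.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F))
                utflen++;
            else if (c > 0x07FF)
                utflen += 3;
            else
                utflen += 2; // also reached for U+0000
        }
        return utflen;
    }

    // Stand-in for the real encoder: the number of bytes standard UTF-8 produces.
    static int actualLength(String st)
    {
        return st.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args)
    {
        // ASCII, 2-byte, 3-byte, supplementary (U+1F600), and NUL samples.
        String[] samples = { "abc", "\u00e9", "\u20ac", "\uD83D\uDE00", "\0" };
        for (String s : samples)
            System.out.println("legacy=" + legacyLength(s) + " actual=" + actualLength(s));
    }
}
```

In this sketch the two methods agree for the first three samples and disagree for the last two (6 vs 4 bytes for U+1F600, 2 vs 1 for U+0000), so a size header computed by the old loop would not match the bytes a standard UTF-8 encoder writes for such strings.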



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

