yifan-c commented on PR #3815:
URL: https://github.com/apache/cassandra/pull/3815#issuecomment-2608036903

   Out of curiosity, I wrote a microbenchmark to compare the Netty 
implementation with the old (but incorrect) implementation. 
   
   The Netty implementation becomes noticeably slower once the string is 
large, i.e. over 2048 characters, taking roughly 20~30% more time at 4096 and 
8192. Below that, the two are within error of each other. Since correctness 
matters more here, I am +1 on the change. 
   
   ```
        [java] Benchmark                                    (stringLength)  Mode  Cnt     Score     Error  Units
        [java] Utf8StringLengthBench.getLengthUsingNetty                 8  avgt   15     5.607 ±   0.019  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty                16  avgt   15     8.646 ±   0.129  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty                64  avgt   15    28.660 ±   0.677  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty               512  avgt   15   272.655 ±   7.848  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty              1024  avgt   15   549.272 ±  18.575  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty              2048  avgt   15  1153.916 ±  59.749  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty              4096  avgt   15  3193.655 ± 109.059  ns/op
        [java] Utf8StringLengthBench.getLengthUsingNetty              8192  avgt   15  6595.732 ± 161.738  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl               8  avgt   15     5.707 ±   0.167  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl              16  avgt   15     7.577 ±   0.311  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl              64  avgt   15    30.324 ±   0.481  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl             512  avgt   15   289.810 ±  12.818  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl            1024  avgt   15   567.252 ±  27.273  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl            2048  avgt   15  1176.009 ±  64.416  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl            4096  avgt   15  2450.714 ± 108.941  ns/op
        [java] Utf8StringLengthBench.getLengthUsingOldImpl            8192  avgt   15  5323.122 ± 297.016  ns/op
   ```
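
To make the "20~30% more time" figure easy to check, here is a minimal sketch that derives the relative overhead from the scores in the table above. The class name `SlowdownRatio` and the helper `extraTimePercent` are hypothetical and exist only for this illustration; the numbers are copied from the table.

```java
public class SlowdownRatio
{
    // Extra time of the Netty path relative to the old path, in percent.
    // Negative means the Netty path was faster. (Hypothetical helper.)
    static double extraTimePercent(double nettyNs, double oldNs)
    {
        return (nettyNs / oldNs - 1.0) * 100.0;
    }

    public static void main(String[] args)
    {
        // Scores in ns/op, copied from the benchmark table.
        int[] lengths = { 2048, 4096, 8192 };
        double[] netty = { 1153.916, 3193.655, 6595.732 };
        double[] old = { 1176.009, 2450.714, 5323.122 };

        for (int i = 0; i < lengths.length; i++)
            System.out.printf("%d: %+.1f%%%n", lengths[i], extraTimePercent(netty[i], old[i]));
    }
}
```

This ignores the reported error bars, so the 2048 row (where the two scores overlap within error) should not be read as a real difference.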
   
   And here is the benchmark code. 
   
   ```java
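   import java.nio.charset.StandardCharsets;
   import java.util.Random;
   import java.util.concurrent.TimeUnit;
   
   import org.apache.cassandra.db.TypeSizes;
   import org.openjdk.jmh.annotations.*;
   import org.openjdk.jmh.infra.Blackhole;
   
   @BenchmarkMode(Mode.AverageTime)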
   @OutputTimeUnit(TimeUnit.NANOSECONDS)
   @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
   @Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
   @Fork(value = 3, jvmArgsAppend = "-Xmx512M")
   @Threads(1)
   @State(Scope.Benchmark)
   public class Utf8StringLengthBench
   {
       private static final Random random = new Random(42);
   
       @Param({ "8", "16", "64", "512", "1024", "2048", "4096", "8192" })
       private int stringLength;
   
       private String utf8;
   
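       @Setup
       public void setup()
       {
           // Decode random bytes as UTF-8. Invalid sequences are replaced with
           // U+FFFD, so the string can be shorter than stringLength chars.
           byte[] bytes = new byte[stringLength];
           random.nextBytes(bytes);
           utf8 = new String(bytes, StandardCharsets.UTF_8);
       }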
   
       @Benchmark
       public void getLengthUsingNetty(Blackhole bh)
       {
           bh.consume(TypeSizes.encodedUTF8Length(utf8));
       }
   
       @Benchmark
       public void getLengthUsingOldImpl(Blackhole bh)
       {
           bh.consume(oldUtfLengthImpl(utf8));
       }
   
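       // Old length computation, kept only for comparison: counts 1, 2 or 3
       // bytes per UTF-16 code unit.
       private static int oldUtfLengthImpl(String st)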
       {
           int strlen = st.length();
           int utflen = 0;
           for (int i = 0; i < strlen; i++)
           {
               int c = st.charAt(i);
               if ((c >= 0x0001) && (c <= 0x007F))
                   utflen++;
               else if (c > 0x07FF)
                   utflen += 3;
               else
                   utflen += 2;
           }
           return utflen;
       }
   }
   ```
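
The comment calls the old implementation incorrect without saying where it goes wrong. As an illustration only, the sketch below compares the same counting loop against the byte count the JDK produces when it actually encodes the string as UTF-8. The class name `Utf8LengthCheck` and the helper `jdkUtf8Length` are hypothetical; the JDK encoder is used as a reference here and stands in for neither `TypeSizes.encodedUTF8Length` nor the Netty code. The sample inputs are my own choice, not cases taken from the PR.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck
{
    // Same counting loop as oldUtfLengthImpl in the benchmark above.
    static int oldUtfLengthImpl(String st)
    {
        int utflen = 0;
        for (int i = 0; i < st.length(); i++)
        {
            int c = st.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F))
                utflen++;
            else if (c > 0x07FF)
                utflen += 3;
            else
                utflen += 2;
        }
        return utflen;
    }

    // Reference value: encode with the JDK and count the bytes (hypothetical helper).
    static int jdkUtf8Length(String st)
    {
        return st.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args)
    {
        // ASCII, 2-byte char, 3-byte char, a surrogate pair (U+1F600), and NUL.
        String[] samples = { "abc", "\u00E9", "\u20AC", "\uD83D\uDE00", "\u0000" };
        for (String s : samples)
            System.out.printf("chars=%d old=%d jdk=%d%n",
                              s.length(), oldUtfLengthImpl(s), jdkUtf8Length(s));
    }
}
```

With these inputs the two counts agree for the first three samples and differ for the last two: the loop counts each half of a surrogate pair as 3 bytes (6 in total, where UTF-8 uses 4) and counts U+0000 as 2 bytes (where UTF-8 uses 1).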


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
