yifan-c commented on PR #3815:
URL: https://github.com/apache/cassandra/pull/3815#issuecomment-2608036903
Out of curiosity, I wrote a microbenchmark to compare the Netty implementation
with the old (but incorrect) implementation.
The Netty implementation becomes noticeably slower once the string is large,
i.e. longer than 2048 characters (the 4096 and 8192 cases), taking roughly 20-30%
more time. As far as correctness is concerned, I am +1 on the change.
```
[java] Benchmark                                    (stringLength)  Mode  Cnt     Score      Error  Units
[java] Utf8StringLengthBench.getLengthUsingNetty                 8  avgt   15     5.607 ±    0.019  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty                16  avgt   15     8.646 ±    0.129  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty                64  avgt   15    28.660 ±    0.677  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty               512  avgt   15   272.655 ±    7.848  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty              1024  avgt   15   549.272 ±   18.575  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty              2048  avgt   15  1153.916 ±   59.749  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty              4096  avgt   15  3193.655 ±  109.059  ns/op
[java] Utf8StringLengthBench.getLengthUsingNetty              8192  avgt   15  6595.732 ±  161.738  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl               8  avgt   15     5.707 ±    0.167  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl              16  avgt   15     7.577 ±    0.311  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl              64  avgt   15    30.324 ±    0.481  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl             512  avgt   15   289.810 ±   12.818  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl            1024  avgt   15   567.252 ±   27.273  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl            2048  avgt   15  1176.009 ±   64.416  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl            4096  avgt   15  2450.714 ±  108.941  ns/op
[java] Utf8StringLengthBench.getLengthUsingOldImpl            8192  avgt   15  5323.122 ±  297.016  ns/op
```
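To sanity-check the 20-30% figure, the sketch below divides each Netty score by the matching old-implementation score. The scores are copied from the output above; the class `BenchRatios` and its members are hypothetical names used only for illustration, and the ratios ignore the ± error column.

```java
public class BenchRatios
{
    // stringLength parameter values from the JMH run above
    static final int[] LENGTHS = { 8, 16, 64, 512, 1024, 2048, 4096, 8192 };
    // Average time in ns/op, copied from the JMH output above
    static final double[] NETTY_NS = { 5.607, 8.646, 28.660, 272.655, 549.272, 1153.916, 3193.655, 6595.732 };
    static final double[] OLD_IMPL_NS = { 5.707, 7.577, 30.324, 289.810, 567.252, 1176.009, 2450.714, 5323.122 };

    /** Netty time divided by old-implementation time; a value above 1 means Netty is slower. */
    static double[] ratios(double[] netty, double[] oldImpl)
    {
        double[] r = new double[netty.length];
        for (int i = 0; i < netty.length; i++)
            r[i] = netty[i] / oldImpl[i];
        return r;
    }

    public static void main(String[] args)
    {
        double[] r = ratios(NETTY_NS, OLD_IMPL_NS);
        for (int i = 0; i < LENGTHS.length; i++)
            System.out.printf("%5d: Netty/old = %.2f%n", LENGTHS[i], r[i]);
    }
}
```

With these scores the ratio is about 1.30 at 4096 and 1.24 at 8192. At the other lengths it stays between roughly 0.94 and 0.98, except at 16, where it is about 1.14.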
And here is the benchmark code:
```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import org.apache.cassandra.db.TypeSizes;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3, jvmArgsAppend = "-Xmx512M")
@Threads(1)
@State(Scope.Benchmark)
public class Utf8StringLengthBench
{
    // Fixed seed for reproducible input
    private static final Random random = new Random(42);

    @Param({ "8", "16", "64", "512", "1024", "2048", "4096", "8192" })
    private int stringLength;

    private String utf8;

    @Setup
    public void setup()
    {
        // Decode random bytes as UTF-8; malformed sequences become U+FFFD,
        // so the string can have fewer than stringLength chars
        byte[] bytes = new byte[stringLength];
        random.nextBytes(bytes);
        utf8 = new String(bytes, StandardCharsets.UTF_8);
    }

    @Benchmark
    public void getLengthUsingNetty(Blackhole bh)
    {
        // New implementation
        bh.consume(TypeSizes.encodedUTF8Length(utf8));
    }

    @Benchmark
    public void getLengthUsingOldImpl(Blackhole bh)
    {
        bh.consume(oldUtfLengthImpl(utf8));
    }

    // Old implementation: counts U+0001..U+007F as 1 byte, U+0080..U+07FF and
    // U+0000 as 2 bytes, and every other char (including each surrogate of a
    // pair) as 3 bytes
    private static int oldUtfLengthImpl(String st)
    {
        int strlen = st.length();
        int utflen = 0;
        for (int i = 0; i < strlen; i++)
        {
            int c = st.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F))
                utflen++;
            else if (c > 0x07FF)
                utflen += 3;
            else
                utflen += 2;
        }
        return utflen;
    }
}
```
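For context on why the old loop is incorrect: it mirrors the length computation `java.io.DataOutputStream.writeUTF` uses for *modified* UTF-8, which counts U+0000 as 2 bytes and each half of a surrogate pair as 3 bytes, so a supplementary character costs 6 bytes instead of the 4 that standard UTF-8 uses. Presumably this is the discrepancy the PR addresses. The sketch below compares the old loop with `String.getBytes(StandardCharsets.UTF_8).length` and with `utf8Length`, a hypothetical allocation-free counter written only for illustration (it is not Netty's code). The class name `Utf8LengthCheck` is likewise hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class Utf8LengthCheck
{
    // Copy of the old loop from the benchmark (modified UTF-8 byte count)
    static int oldUtfLengthImpl(String st)
    {
        int strlen = st.length();
        int utflen = 0;
        for (int i = 0; i < strlen; i++)
        {
            int c = st.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F))
                utflen++;
            else if (c > 0x07FF)
                utflen += 3;
            else
                utflen += 2;
        }
        return utflen;
    }

    // Hypothetical standard UTF-8 length, sketched for illustration.
    // A well-formed surrogate pair counts as 4 bytes; an unpaired surrogate
    // counts as 1 byte, matching String.getBytes, which replaces it with '?'.
    static int utf8Length(String s)
    {
        int len = 0;
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c < 0x80)
                len += 1;
            else if (c < 0x800)
                len += 2;
            else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                     && Character.isLowSurrogate(s.charAt(i + 1)))
            {
                len += 4;
                i++; // skip the low surrogate of the pair
            }
            else if (Character.isSurrogate(c))
                len += 1;
            else
                len += 3;
        }
        return len;
    }

    public static void main(String[] args)
    {
        String[] samples = { "hello", "caf\u00e9", "\u20ac", "\u0000", "\uD83D\uDE00" };
        for (String s : samples)
        {
            // Label each sample by its code points, since some are not printable
            String label = s.codePoints()
                            .mapToObj(cp -> String.format("U+%04X", cp))
                            .collect(Collectors.joining(" "));
            System.out.printf("%-40s old=%d utf8Length=%d getBytes=%d%n",
                              label, oldUtfLengthImpl(s), utf8Length(s),
                              s.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

For the ASCII, Latin-1 and euro-sign samples all three counts agree; they diverge only for U+0000 (old loop 2, standard UTF-8 1) and for the emoji U+1F600 (old loop 6, standard UTF-8 4).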
--
This is an automated message from the Apache Git Service.