[ https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987752#comment-16987752 ]
Roman Grebennikov edited comment on FLINK-14346 at 12/4/19 11:27 AM:
---------------------------------------------------------------------

[~arvid heise] current progress:
# I've modified StringSerializationBenchmark to measure 1b-128b-16kb strings. StringSerializationBenchmark and PojoSerializationBenchmark have also been moved to the full package, so they are not included in the regular performance regression run.
# The fallback code for writeString has been removed; the code now works with strings of any length.
# Added a test to validate binary compatibility between the new and old implementations.

{noformat}
[info] Benchmark                                        (length)  (stringType)  Mode  Cnt    Score   Error  Units
[info] StringDeserializerBenchmark.deserializeDefault          1         ascii  avgt   50   45.618 ± 0.339  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          2         ascii  avgt   50   61.348 ± 0.579  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          4         ascii  avgt   50   88.067 ± 1.058  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          8         ascii  avgt   50  142.902 ± 1.121  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         16         ascii  avgt   50  249.181 ± 1.920  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         32         ascii  avgt   50  466.382 ± 1.502  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         1         ascii  avgt   50   49.916 ± 0.132  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         2         ascii  avgt   50   50.278 ± 0.064  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         4         ascii  avgt   50   50.365 ± 0.129  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         8         ascii  avgt   50   52.463 ± 0.301  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        16         ascii  avgt   50   55.711 ± 0.597  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        32         ascii  avgt   50   65.342 ± 0.555  ns/op
[info] StringSerializerBenchmark.serializeDefault              1         ascii  avgt   50   31.076 ± 0.192  ns/op
[info] StringSerializerBenchmark.serializeDefault              2         ascii  avgt   50   31.770 ± 1.811  ns/op
[info] StringSerializerBenchmark.serializeDefault              4         ascii  avgt   50   39.251 ± 0.189  ns/op
[info] StringSerializerBenchmark.serializeDefault              8         ascii  avgt   50   57.736 ± 0.253  ns/op
[info] StringSerializerBenchmark.serializeDefault             16         ascii  avgt   50   94.964 ± 0.514  ns/op
[info] StringSerializerBenchmark.serializeDefault             32         ascii  avgt   50  168.754 ± 1.416  ns/op
[info] StringSerializerBenchmark.serializeImproved             1         ascii  avgt   50   30.145 ± 0.156  ns/op
[info] StringSerializerBenchmark.serializeImproved             2         ascii  avgt   50   30.873 ± 0.274  ns/op
[info] StringSerializerBenchmark.serializeImproved             4         ascii  avgt   50   31.993 ± 0.276  ns/op
[info] StringSerializerBenchmark.serializeImproved             8         ascii  avgt   50   46.220 ± 0.211  ns/op
[info] StringSerializerBenchmark.serializeImproved            16         ascii  avgt   50   50.856 ± 0.826  ns/op
[info] StringSerializerBenchmark.serializeImproved            32         ascii  avgt   50   63.221 ± 1.130  ns/op
{noformat}

was (Author: rgrebennikov):

[~arvid heise] current progress:
# I've modified StringSerializationBenchmark to measure 1b-128b-16kb strings. StringSerializationBenchmark and PojoSerializationBenchmark have also been moved to the full package, so they are not included in the regular performance regression run.
# The fallback code for writeString has been removed; the code now works with strings of any length.
{noformat}
[info] Benchmark                                        (length)  (stringType)  Mode  Cnt    Score   Error  Units
[info] StringDeserializerBenchmark.deserializeDefault          1         ascii  avgt   50   45.618 ± 0.339  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          2         ascii  avgt   50   61.348 ± 0.579  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          4         ascii  avgt   50   88.067 ± 1.058  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          8         ascii  avgt   50  142.902 ± 1.121  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         16         ascii  avgt   50  249.181 ± 1.920  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         32         ascii  avgt   50  466.382 ± 1.502  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         1         ascii  avgt   50   49.916 ± 0.132  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         2         ascii  avgt   50   50.278 ± 0.064  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         4         ascii  avgt   50   50.365 ± 0.129  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         8         ascii  avgt   50   52.463 ± 0.301  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        16         ascii  avgt   50   55.711 ± 0.597  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        32         ascii  avgt   50   65.342 ± 0.555  ns/op
[info] StringSerializerBenchmark.serializeDefault              1         ascii  avgt   50   31.076 ± 0.192  ns/op
[info] StringSerializerBenchmark.serializeDefault              2         ascii  avgt   50   31.770 ± 1.811  ns/op
[info] StringSerializerBenchmark.serializeDefault              4         ascii  avgt   50   39.251 ± 0.189  ns/op
[info] StringSerializerBenchmark.serializeDefault              8         ascii  avgt   50   57.736 ± 0.253  ns/op
[info] StringSerializerBenchmark.serializeDefault             16         ascii  avgt   50   94.964 ± 0.514  ns/op
[info] StringSerializerBenchmark.serializeDefault             32         ascii  avgt   50  168.754 ± 1.416  ns/op
[info] StringSerializerBenchmark.serializeImproved             1         ascii  avgt   50   30.145 ± 0.156  ns/op
[info] StringSerializerBenchmark.serializeImproved             2         ascii  avgt   50   30.873 ± 0.274  ns/op
[info] StringSerializerBenchmark.serializeImproved             4         ascii  avgt   50   31.993 ± 0.276  ns/op
[info] StringSerializerBenchmark.serializeImproved             8         ascii  avgt   50   46.220 ± 0.211  ns/op
[info] StringSerializerBenchmark.serializeImproved            16         ascii  avgt   50   50.856 ± 0.826  ns/op
[info] StringSerializerBenchmark.serializeImproved            32         ascii  avgt   50   63.221 ± 1.130  ns/op
{noformat}

> Performance issue with StringSerializer
> ---------------------------------------
>
> Key: FLINK-14346
> URL: https://issues.apache.org/jira/browse/FLINK-14346
> Project: Flink
> Issue Type: Improvement
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 1.9.0, 1.10.0, 1.9.1
> Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139, adoptopenjdk 8u222.
> Reporter: Roman Grebennikov
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a
> significant amount of CPU time is spent inside StringSerializer writing data
> to the underlying byte buffer. The hottest part of the code is the
> StringValue.writeString function. Replacing the default StringSerializer with
> a custom one that just calls DataOutput.writeUTF/readUTF (purely to establish
> a baseline) surprisingly yielded an almost 2x speedup for string
> serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the
> latter with the former is not a good idea in general, as it may break
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue.
> The main reason the JDK's writeUTF is faster is that its code does not write
> directly to the output stream byte by byte, but instead encodes into a
> temporary byte buffer first. This lets HotSpot unroll the main loop almost
> perfectly, which results in much better data parallelism.
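The difference between the two write strategies described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name, the method names, and the variable-length encoding (7 bits per byte, high bit marks continuation) are hypothetical stand-ins, not Flink's actual writeString code or wire format. It shows one writer that calls DataOutput.write once per encoded byte and one that encodes into a temporary byte array and flushes it in a single call, and checks that both produce the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch; the encoding is a simplified stand-in, not Flink's wire format.
final class BufferedStringWriteSketch {

    // Stores a non-negative int as 7-bit groups; a set high bit means "more bytes follow".
    static int putVarInt(byte[] buf, int pos, int value) {
        while (value >= 0x80) {
            buf[pos++] = (byte) (value | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    // Byte-by-byte variant: one DataOutput.write call per encoded byte.
    static void writeByteByByte(String s, DataOutput out) throws IOException {
        int len = s.length();
        while (len >= 0x80) {
            out.write(len | 0x80);
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            while (c >= 0x80) {
                out.write(c | 0x80);
                c >>>= 7;
            }
            out.write(c);
        }
    }

    // Buffered variant: encode into a temporary array, then write it in one call.
    static void writeBuffered(String s, DataOutput out) throws IOException {
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        byte[] buf = new byte[5 + 3 * s.length()];
        int pos = putVarInt(buf, 0, s.length());
        for (int i = 0; i < s.length(); i++) {
            pos = putVarInt(buf, pos, s.charAt(i));
        }
        out.write(buf, 0, pos);
    }

    // Helper that runs one of the two writers against an in-memory stream.
    static byte[] encode(String s, boolean buffered) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (buffered) {
                writeBuffered(s, out);
            } else {
                writeByteByByte(s, out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "flink", "h\u00e9llo", "\u65e5\u672c\u8a9e"}) {
            boolean same = Arrays.equals(encode(s, false), encode(s, true));
            System.out.println("length " + s.length() + ", identical bytes: " + same);
        }
    }
}
```

Comparing the two outputs byte for byte is also the simplest form of an old-versus-new wire format compatibility check. Whether the buffered variant is actually faster on a given JVM would have to be measured, for example with a JMH benchmark.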
> I've tried to port the ideas from the JDK's implementation of writeUTF back
> to StringValue.writeString, and the current result looks promising, with a
> significant speedup over the current implementation (94.0 vs. 156.9 ns/op,
> roughly 1.7x faster):
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>
> Here measureJDK is the JDK's writeUTF as a baseline, measureOld is the
> current upstream implementation in Flink, and measureNew is the improved
> one.
>
> The code for the benchmark (and the improved version of the serializer) is
> here: [https://github.com/shuttie/flink-string-serializer]
>
> Next steps:
> # More benchmarks for non-ASCII strings.
> # Benchmarks for long strings.
> # Benchmarks for deserialization.
> # Tests for old-new wire format compatibility.
> # PR to the Flink codebase.
> Is there any interest in this kind of performance improvement?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)