[ 
https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987752#comment-16987752
 ] 

Roman Grebennikov edited comment on FLINK-14346 at 12/4/19 11:27 AM:
---------------------------------------------------------------------

[~arvid heise] current progress:
 # I've modified StringSerializationBenchmark to measure 1b-128b-16kb strings. 
StringSerializationBenchmark and PojoSerializationBenchmark have also been 
moved to the full package, so they won't be included in the regular 
performance regression run.
 # The fallback code for writeString has been removed; the code now works 
with strings of any length.
 # Added a test that validates binary compatibility between the new and old 
implementations.
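
A binary compatibility check of this kind can be sketched roughly as follows. This is a self-contained illustration, not the test from the PR: writeByteWise and writeBuffered are hypothetical stand-ins for the old and new writeString, and the varint-style layout (length + 1, then 7 bits per byte with the high bit as a continuation flag) is an assumption about the wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WireCompatSketch {
    private static final int HIGH_BIT = 0x80;

    // Hypothetical "old" path: one write() call per encoded byte.
    static void writeByteWise(CharSequence cs, DataOutput out) throws IOException {
        int len = cs.length() + 1; // assumed: 0 is reserved for null
        while (len >= HIGH_BIT) {
            out.write(len | HIGH_BIT);
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                out.write(c | HIGH_BIT);
                c >>>= 7;
            }
            out.write(c);
        }
    }

    // Hypothetical "new" path: encode into a temporary array, then write once.
    static void writeBuffered(CharSequence cs, DataOutput out) throws IOException {
        int strLen = cs.length();
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        byte[] buf = new byte[5 + 3 * strLen];
        int pos = 0;
        int len = strLen + 1;
        while (len >= HIGH_BIT) {
            buf[pos++] = (byte) (len | HIGH_BIT);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        for (int i = 0; i < strLen; i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                buf[pos++] = (byte) (c | HIGH_BIT);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    static byte[] encode(CharSequence cs, boolean buffered) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (buffered) {
                writeBuffered(cs, out);
            } else {
                writeByteWise(cs, out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when both paths produce identical bytes for the given string.
    static boolean sameBytes(String s) {
        return Arrays.equals(encode(s, false), encode(s, true));
    }

    public static void main(String[] args) {
        String longAscii = new String(new char[200]).replace('\0', 'x');
        String[] samples = {"", "a", "hello", "\u043f\u0440\u0438\u0432\u0435\u0442", "\u4e2d\u6587", longAscii};
        for (String s : samples) {
            System.out.println("length " + s.length() + ": identical bytes = " + sameBytes(s));
        }
    }
}
```

A real test would compare the actual old and new implementations and would probably also read back with the old reader what the new writer produced.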

{noformat}
[info] Benchmark                                    (length)  (stringType)  Mode  Cnt   Score   Error  Units
[info] StringDeserializerBenchmark.deserializeDefault          1         ascii  avgt   50   45.618 ± 0.339  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          2         ascii  avgt   50   61.348 ± 0.579  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          4         ascii  avgt   50   88.067 ± 1.058  ns/op
[info] StringDeserializerBenchmark.deserializeDefault          8         ascii  avgt   50  142.902 ± 1.121  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         16         ascii  avgt   50  249.181 ± 1.920  ns/op
[info] StringDeserializerBenchmark.deserializeDefault         32         ascii  avgt   50  466.382 ± 1.502  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         1         ascii  avgt   50   49.916 ± 0.132  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         2         ascii  avgt   50   50.278 ± 0.064  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         4         ascii  avgt   50   50.365 ± 0.129  ns/op
[info] StringDeserializerBenchmark.deserializeImproved         8         ascii  avgt   50   52.463 ± 0.301  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        16         ascii  avgt   50   55.711 ± 0.597  ns/op
[info] StringDeserializerBenchmark.deserializeImproved        32         ascii  avgt   50   65.342 ± 0.555  ns/op
[info] StringSerializerBenchmark.serializeDefault              1         ascii  avgt   50   31.076 ± 0.192  ns/op
[info] StringSerializerBenchmark.serializeDefault              2         ascii  avgt   50   31.770 ± 1.811  ns/op
[info] StringSerializerBenchmark.serializeDefault              4         ascii  avgt   50   39.251 ± 0.189  ns/op
[info] StringSerializerBenchmark.serializeDefault              8         ascii  avgt   50   57.736 ± 0.253  ns/op
[info] StringSerializerBenchmark.serializeDefault             16         ascii  avgt   50   94.964 ± 0.514  ns/op
[info] StringSerializerBenchmark.serializeDefault             32         ascii  avgt   50  168.754 ± 1.416  ns/op
[info] StringSerializerBenchmark.serializeImproved             1         ascii  avgt   50   30.145 ± 0.156  ns/op
[info] StringSerializerBenchmark.serializeImproved             2         ascii  avgt   50   30.873 ± 0.274  ns/op
[info] StringSerializerBenchmark.serializeImproved             4         ascii  avgt   50   31.993 ± 0.276  ns/op
[info] StringSerializerBenchmark.serializeImproved             8         ascii  avgt   50   46.220 ± 0.211  ns/op
[info] StringSerializerBenchmark.serializeImproved            16         ascii  avgt   50   50.856 ± 0.826  ns/op
[info] StringSerializerBenchmark.serializeImproved            32         ascii  avgt   50   63.221 ± 1.130  ns/op
{noformat}
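
To read the scores above as relative speedups, the Default score can be divided by the Improved score for the same length. The small helper below is only an illustration (the class and method names are made up here); the numbers are copied from the ascii rows of the benchmark output.

```java
class SpeedupSketch {
    // Ratio of baseline time to improved time; a value below 1.0 means
    // the improved version is slower for that case.
    static double speedup(double baselineNsPerOp, double improvedNsPerOp) {
        return baselineNsPerOp / improvedNsPerOp;
    }

    public static void main(String[] args) {
        // Scores in ns/op, taken from the benchmark output in this comment.
        System.out.println("deserialize, length 1:  " + speedup(45.618, 49.916));
        System.out.println("deserialize, length 32: " + speedup(466.382, 65.342));
        System.out.println("serialize, length 1:    " + speedup(31.076, 30.145));
        System.out.println("serialize, length 32:   " + speedup(168.754, 63.221));
    }
}
```

On these numbers, the gain grows with string length, while deserialization of 1-character strings comes out slightly slower with the improved version.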


was (Author: rgrebennikov):
[~arvid heise] current progress:
 # I've modified StringSerializationBenchmark to measure 1b-128b-16kb strings. 
StringSerializationBenchmark and PojoSerializationBenchmark have also been 
moved to the full package, so they won't be included in the regular 
performance regression run.
 # The fallback code for writeString has been removed; the code now works 
with strings of any length.


> Performance issue with StringSerializer
> ---------------------------------------
>
>                 Key: FLINK-14346
>                 URL: https://issues.apache.org/jira/browse/FLINK-14346
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>         Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139, 
> adoptopenjdk 8u222.
>            Reporter: Roman Grebennikov
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a 
> significant amount of CPU time is spent inside StringSerializer writing data 
> to the underlying byte buffer. The hottest part of the code is the 
> StringValue.writeString function. Replacing the default StringSerializer 
> with a custom one that just calls DataOutput.writeUTF/readUTF (to get a 
> baseline) surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the 
> latter with the former is not a good idea in general, as it may break 
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue. 
> The main reason the JDK's writeUTF is faster is that its code does not write 
> to the output stream byte by byte, but instead fills a temporary byte buffer 
> first. This lets HotSpot unroll the main loop almost perfectly, which results 
> in much better data parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back 
> to StringValue.writeString, and the current result shows a significant 
> speedup over the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>  
> Here measureJDK is the JDK's writeUTF as a baseline, measureOld is the 
> current upstream implementation in Flink, and measureNew is the improved 
> one.
>  
> The code for the benchmark (and the improved version of the serializer) is 
> here: [https://github.com/shuttie/flink-string-serializer]
>  
> Next steps:
>  # More benchmarks for non-ascii strings.
>  # Benchmarks for long strings.
>  # Benchmarks for deserialization.
>  # Tests for old-new wire format compatibility.
>  # PR to the Flink codebase.
> Is there any interest in this kind of performance improvement?
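
The wire-format incompatibility mentioned in the issue description can be made concrete with a small example. DataOutputStream.writeUTF is the real JDK call (2-byte big-endian length prefix, then modified UTF-8). The viaVarint method is only an assumed approximation of a writeString-style layout (variable-length length + 1, then 7 bits per byte per char) and is not taken from the Flink sources.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WireFormatSketch {
    // Bytes produced by the JDK's DataOutputStream.writeUTF.
    static byte[] viaWriteUTF(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed varint-style layout, for illustration only.
    static byte[] viaVarint(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int len = s.length() + 1; // assumed: 0 is reserved for null
        while (len >= 0x80) {
            bytes.write(len | 0x80);
            len >>>= 7;
        }
        bytes.write(len);
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            while (c >= 0x80) {
                bytes.write(c | 0x80);
                c >>>= 7;
            }
            bytes.write(c);
        }
        return bytes.toByteArray();
    }

    // Lowercase hex dump with single spaces between bytes.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "\u00e9"}) {
            System.out.println("writeUTF: " + hex(viaWriteUTF(s)));
            System.out.println("varint:   " + hex(viaVarint(s)));
        }
    }
}
```

Even for plain ASCII input the two encodings differ in the length prefix, so data written with one cannot be read back with the other.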



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
