[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhijiang updated FLINK-9913:
----------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: FLINK-10745

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Currently, the {{RecordWriter}} emits output to multiple channels via
> {{ChannelSelector}} or broadcasts output to all channels directly. Each
> channel has a separate {{RecordSerializer}} for serializing outputs, which
> means the output is serialized as many times as there are selected channels.
> Data serialization is a costly operation, so serializing each output only
> once should bring a clear benefit.
> I would suggest the following changes to achieve this:
> # Only one {{RecordSerializer}} is created in {{RecordWriter}} for all
> channels.
> # The output is serialized into an intermediate data buffer only once,
> regardless of the number of target channels.
> # The intermediate serialization result is copied into the separate
> {{BufferBuilder}}s of the different channels.
> An additional benefit of using a single serializer for all channels is a
> potentially significant reduction in heap space overhead, because there are
> fewer intermediate serialization buffers (these buffers were pruned back to
> 128B only once they had grown beyond 5MiB!).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
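The three changes proposed in the description can be illustrated with a minimal, self-contained Java sketch. It is not Flink's code: the class SerializeOnceWriter, its methods, the length-prefixed UTF-8 encoding, and the use of ByteArrayOutputStream as a stand-in for the per-channel {{BufferBuilder}}s are all assumptions made for illustration. It only shows the idea of serializing a record once into an intermediate byte array and then copying those bytes to every target channel.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a record writer; not Flink's actual RecordWriter. */
class SerializeOnceWriter {
    // One growable byte sink per channel, standing in for the per-channel BufferBuilders.
    private final List<ByteArrayOutputStream> channels = new ArrayList<>();
    // Counts serialization passes so that the saving is observable.
    private int serializationCount = 0;

    SerializeOnceWriter(int numberOfChannels) {
        for (int i = 0; i < numberOfChannels; i++) {
            channels.add(new ByteArrayOutputStream());
        }
    }

    /** Serializes the record once: a 4-byte length prefix followed by the UTF-8 payload. */
    private byte[] serialize(String record) {
        serializationCount++;
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Emits to the selected channels, copying the same serialized bytes into each. */
    void emit(String record, int[] selectedChannels) {
        byte[] serialized = serialize(record);
        for (int channel : selectedChannels) {
            channels.get(channel).write(serialized, 0, serialized.length);
        }
    }

    /** Broadcasts to all channels, again with a single serialization pass. */
    void broadcast(String record) {
        byte[] serialized = serialize(record);
        for (ByteArrayOutputStream channel : channels) {
            channel.write(serialized, 0, serialized.length);
        }
    }

    int serializationCount() {
        return serializationCount;
    }

    int bytesInChannel(int channel) {
        return channels.get(channel).size();
    }
}
```

With three channels, a call such as new SerializeOnceWriter(3).broadcast("hello") runs the serializer once and performs three byte copies, whereas a serializer per channel would serialize the record three times.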
[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhijiang updated FLINK-9913:
----------------------------
    Affects Version/s: 1.5.0
                       1.5.1
                       1.5.2
                       1.5.3

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-9913:
-------------------------------
    Description: 
Currently, the {{RecordWriter}} emits output to multiple channels via {{ChannelSelector}} or broadcasts output to all channels directly. Each channel has a separate {{RecordSerializer}} for serializing outputs, which means the output is serialized as many times as there are selected channels.

Data serialization is a costly operation, so serializing each output only once should bring a clear benefit.

I would suggest the following changes to achieve this:
 # Only one {{RecordSerializer}} is created in {{RecordWriter}} for all channels.
 # The output is serialized into an intermediate data buffer only once, regardless of the number of target channels.
 # The intermediate serialization result is copied into the separate {{BufferBuilder}}s of the different channels.

An additional benefit of using a single serializer for all channels is a potentially significant reduction in heap space overhead, because there are fewer intermediate serialization buffers (these buffers were pruned back to 128B only once they had grown beyond 5MiB!).

  was:
Currently, the {{RecordWriter}} emits output to multiple channels via {{ChannelSelector}} or broadcasts output to all channels directly. Each channel has a separate {{RecordSerializer}} for serializing outputs, which means the output is serialized as many times as there are selected channels.

Data serialization is a costly operation, so serializing each output only once should bring a clear benefit.

I would suggest the following changes to achieve this:
 # Only one {{RecordSerializer}} is created in {{RecordWriter}} for all channels.
 # The output is serialized into an intermediate data buffer only once, regardless of the number of target channels.
 # The intermediate serialization result is copied into the separate {{BufferBuilder}}s of the different channels.

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
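The heap-space remark added to the description can be made concrete with a small sketch. The class PrunableBuffer below is hypothetical and is not Flink's serializer; only the two sizes (128 B and 5 MiB) come from the description, and the growth policy is an assumption. It shows why a buffer that is pruned only above a large threshold can stay large between records, so that holding one such buffer per channel multiplies the retained heap.

```java
import java.util.Arrays;

/** Hypothetical growable serialization buffer; only the two sizes come from the issue text. */
class PrunableBuffer {
    static final int INITIAL_SIZE = 128;                 // 128 B starting size
    static final int PRUNE_THRESHOLD = 5 * 1024 * 1024;  // 5 MiB

    private byte[] buffer = new byte[INITIAL_SIZE];
    private int position = 0;

    /** Appends data, growing the backing array when it is too small (assumed doubling policy). */
    void write(byte[] data) {
        int required = position + data.length;
        if (required > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(required, buffer.length * 2));
        }
        System.arraycopy(data, 0, buffer, position, data.length);
        position = required;
    }

    /** Resets for the next record; memory is released only above the threshold. */
    void clearAndPrune() {
        position = 0;
        if (buffer.length > PRUNE_THRESHOLD) {
            buffer = new byte[INITIAL_SIZE];
        }
    }

    int capacity() {
        return buffer.length;
    }
}
```

Under these assumptions, a buffer that once held a 1 MiB record keeps its 1 MiB backing array after clearAndPrune(), because it is still below the threshold. With one serializer per channel, a task with N channels could retain up to N such arrays; with a single shared serializer it retains one.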
[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Kruber updated FLINK-9913:
-------------------------------
    Priority: Major  (was: Minor)

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-9913:
---------------------------------
    Fix Version/s:     (was: 1.6.0)
                   1.7.0

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.7.0
[jira] [Updated] (FLINK-9913) Improve output serialization only once in RecordWriter
[ https://issues.apache.org/jira/browse/FLINK-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-9913:
----------------------------------
    Labels: pull-request-available  (was: )

> Improve output serialization only once in RecordWriter
> ------------------------------------------------------
>
>                 Key: FLINK-9913
>                 URL: https://issues.apache.org/jira/browse/FLINK-9913
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.6.0