[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060698#comment-17060698 ] Tzu-Li (Gordon) Tai commented on FLINK-6764: [~NicoK] This is obsolete now, since we no longer serialize serializers. Closing this. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Major > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060376#comment-17060376 ] Nico Kruber commented on FLINK-6764: [~tzulitai] any updates? > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Major > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383266#comment-16383266 ] Tzu-Li (Gordon) Tai commented on FLINK-6764: Yes, moving to 1.6.0. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.6.0 > > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382140#comment-16382140 ] Aljoscha Krettek commented on FLINK-6764: - [~tzulitai] Did we decide to move this to 1.6.0? Or at least make it non-blocking? > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.5.0 > > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325062#comment-16325062 ] Tzu-Li (Gordon) Tai commented on FLINK-6764: It seems like we forgot completely about adding this to 1.4.0. Since the previous conclusion was that we do not want to change serialization formats across minor releases, we can't include this for 1.4.1. We should make sure we include this change in 1.5.0 (as soon as possible), as serialization formats will affect us a long way ahead. Marking this as a blocker for 1.5.0. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.5.0 > > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062547#comment-16062547 ] ASF GitHub Bot commented on FLINK-6764: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4026 @zentol there was a recent offline discussion regarding the savepoint formats that will probably void this change. Will close this PR for now. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062548#comment-16062548 ] ASF GitHub Bot commented on FLINK-6764: --- Github user tzulitai closed the pull request at: https://github.com/apache/flink/pull/4026 > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062359#comment-16062359 ] ASF GitHub Bot commented on FLINK-6764: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/4026 @tzulitai What's the state of this PR? > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032839#comment-16032839 ] ASF GitHub Bot commented on FLINK-6764: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/4026 Thanks a lot for the helpful review @StefanRRichter. I've addressed your comments. This PR also needs one more additional change for backwards compatibility with 1.3.0 before I'll merge it (will ping you on the follow-up). > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031237#comment-16031237 ] ASF GitHub Bot commented on FLINK-6764: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119370073 --- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java --- @@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot( this.nonRegisteredSubclassesToSerializerConfigSnapshots = Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots); - this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization; + this.excludeSerializers = excludeSerializers; } @Override public void write(DataOutputView out) throws IOException { super.write(out); + out.writeBoolean(excludeSerializers); + // --- write fields and their serializers, in order out.writeInt(fieldToSerializerConfigSnapshot.size()); --- End diff -- It might not be possible to fully deduplicate this code, because the first map is a `Map>`. I'll still try to deduplicate as much as possible. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031216#comment-16031216 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119364516 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java --- @@ -110,4 +123,20 @@ public boolean equals(Object obj) { public int hashCode() { return nestedSerializersAndConfigs.hashCode(); } + + private MapbuildSerializerIndices() { --- End diff -- I see that the concrete implementation is always an identity hash map. Since it behaves differently from what you would expect of a normal map (which uses equals/hashCode) and in that sense violates LSP, I suggest you wrap it in its own class called `SerializerIndex`. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
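Editor's note: the review comment above suggests hiding the identity-based map behind a dedicated `SerializerIndex` class. The following is only a minimal sketch of what such a wrapper might look like, not the code from the pull request. The method names `indexOf` and `size` are hypothetical, and plain `String` objects stand in for serializer instances.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical wrapper: assigns indices by reference identity, not by equals()/hashCode(). */
final class SerializerIndex<T> {
    // IdentityHashMap compares keys with ==, so the identity semantics stay
    // inside this class instead of leaking out through the Map interface.
    private final Map<T, Integer> indices = new IdentityHashMap<>();

    /** Returns the index already assigned to this exact instance, or assigns the next free one. */
    int indexOf(T serializer) {
        Integer existing = indices.get(serializer);
        if (existing != null) {
            return existing;
        }
        int next = indices.size();
        indices.put(serializer, next);
        return next;
    }

    /** Number of distinct instances seen so far. */
    int size() {
        return indices.size();
    }
}

class SerializerIndexDemo {
    public static void main(String[] args) {
        SerializerIndex<String> index = new SerializerIndex<>();
        String a = new String("int-serializer");
        String b = new String("int-serializer"); // equal to a, but a different instance

        System.out.println(index.indexOf(a)); // 0
        System.out.println(index.indexOf(a)); // 0 again: same instance
        System.out.println(index.indexOf(b)); // 1: equal, but not identical
    }
}
```

Callers of such a wrapper never see a `Map`, so nobody can assume the usual equals/hashCode contract applies.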
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031202#comment-16031202 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/4026 Overall, very nice work! I just had a few minor comments. After they are addressed, this is good to merge. +1 > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031183#comment-16031183 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119359879 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java --- @@ -110,4 +123,20 @@ public boolean equals(Object obj) { public int hashCode() { return nestedSerializersAndConfigs.hashCode(); } + + private MapbuildSerializerIndices() { --- End diff -- While the idea makes sense, I am not sure if we can rely on all serializers having a correct implementation of equals and hashCode that will do what we want. We might be better off with a collection of `serializer -> index` pairs and using equality for deduplication, but even there equals might be tricky. While it is OK to accidentally write a serializer twice, it should never happen that a serializer gets de-duplicated by accident. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031168#comment-16031168 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119358003 --- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java --- @@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot( this.nonRegisteredSubclassesToSerializerConfigSnapshots = Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots); - this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization; + this.excludeSerializers = excludeSerializers; } @Override public void write(DataOutputView out) throws IOException { super.write(out); + out.writeBoolean(excludeSerializers); + // --- write fields and their serializers, in order out.writeInt(fieldToSerializerConfigSnapshot.size()); --- End diff -- The same might hold for reads. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031166#comment-16031166 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119357811 --- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java --- @@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot( this.nonRegisteredSubclassesToSerializerConfigSnapshots = Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots); - this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization; + this.excludeSerializers = excludeSerializers; } @Override public void write(DataOutputView out) throws IOException { super.write(out); + out.writeBoolean(excludeSerializers); + // --- write fields and their serializers, in order out.writeInt(fieldToSerializerConfigSnapshot.size()); --- End diff -- What I just noticed is that this code is duplicated almost three times. I suggest creating a private method that writes a `Map>` and calling it three times. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031117#comment-16031117 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119348235 --- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializerTest.java --- @@ -606,6 +665,45 @@ public void testSerializerSerializationFailureResilience() throws Exception { verifyPojoSerializerConfigSnapshotWithEmptyNestedSerializers(config, deserializedConfig); } + @SuppressWarnings("unchecked") + @Test + public void testNonDuplicatedStatelessSerializersInConfigSnapshot() throws IOException { + PojoSerializer pojoSerializer = (PojoSerializer) + TypeExtractor.getForClass(TestUserClassWithSameFieldTypes.class).createSerializer(new ExecutionConfig()); + + // snapshot configuration and serialize to bytes + PojoSerializer.PojoSerializerConfigSnapshot config = pojoSerializer.snapshotConfiguration(); + byte[] serializedConfig; + try ( + ByteArrayOutputStream out = new ByteArrayOutputStream()) { + TypeSerializerSerializationUtil.writeSerializerConfigSnapshot(new DataOutputViewStreamWrapper(out), config); + serializedConfig = out.toByteArray(); + } + + // read configuration from bytes + PojoSerializer.PojoSerializerConfigSnapshot deserializedConfig; + try(ByteArrayInputStream in = new ByteArrayInputStream(serializedConfig)) { --- End diff -- There is also a `ByteArrayInputStreamWithPos`. 
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031116#comment-16031116 ] ASF GitHub Bot commented on FLINK-6764: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/4026#discussion_r119348073 --- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializerTest.java --- @@ -606,6 +665,45 @@ public void testSerializerSerializationFailureResilience() throws Exception { verifyPojoSerializerConfigSnapshotWithEmptyNestedSerializers(config, deserializedConfig); } + @SuppressWarnings("unchecked") + @Test + public void testNonDuplicatedStatelessSerializersInConfigSnapshot() throws IOException { + PojoSerializer pojoSerializer = (PojoSerializer) + TypeExtractor.getForClass(TestUserClassWithSameFieldTypes.class).createSerializer(new ExecutionConfig()); + + // snapshot configuration and serialize to bytes + PojoSerializer.PojoSerializerConfigSnapshot config = pojoSerializer.snapshotConfiguration(); + byte[] serializedConfig; + try ( + ByteArrayOutputStream out = new ByteArrayOutputStream()) { --- End diff -- You could use Flink's `ByteArrayOutputStreamWithPos`, which can be more efficient because it does not use `synchronized` and you can access the write position and the internal array to immediately write it to the stream instead of creating a copy in the `out.toByteArray()` part. > Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized. This > would decrease their serialization size. 
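Editor's note: the comment above argues that exposing the write position and the internal array avoids the copy made by `toByteArray()`. The sketch below illustrates only that copy-avoidance idea, using the JDK. `ExposedByteArrayOutputStream` is a hypothetical stand-in, not Flink's `ByteArrayOutputStreamWithPos`, and because it extends the JDK class it still inherits its `synchronized` methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in (not Flink's class): exposes the internal buffer and write position. */
final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    /** The internal array; it may be longer than the data written so far. */
    byte[] buffer() {
        return buf;
    }

    /** Number of valid bytes at the start of buffer(). */
    int position() {
        return count;
    }
}

class BufferAccessDemo {
    /** Forwards the buffered bytes without the defensive copy that toByteArray() would make. */
    static void forward(ExposedByteArrayOutputStream src, OutputStream target) throws IOException {
        target.write(src.buffer(), 0, src.position());
    }

    public static void main(String[] args) throws IOException {
        ExposedByteArrayOutputStream out = new ExposedByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3}, 0, 3);

        ByteArrayOutputStream target = new ByteArrayOutputStream();
        forward(out, target);
        System.out.println(target.size()); // 3
    }
}
```

In a unit test the extra copy rarely matters; the point of the suggestion is mostly consistency with Flink's own utilities.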
[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers
[ https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031060#comment-16031060 ] ASF GitHub Bot commented on FLINK-6764: --- GitHub user tzulitai opened a pull request: https://github.com/apache/flink/pull/4026

[FLINK-6764] Deduplicate stateless serializers in checkpoints

This PR is based on #4014, so only the last commit 39ffe7e is relevant.

Prior to this PR, we would write multiple instances of the same serializer even if it was stateless. This commit changes that by first writing a serializer index at the head of the stream, and then writing only the index of a serializer when one needs to be written. The index map is built using `IdentityHashMap`s, so that stateful serializers are considered as separate entries in the index.

## Test

New tests are added to `PojoSerializerTest` and `SerializationProxiesTest` to test that stateless serializers are not duplicated on restore.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-6764

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4026.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4026

commit 852d7b569f02b1783c8410e111455d02127d0b5d
Author: Tzu-Li (Gordon) Tai
Date: 2017-05-29T18:07:04Z

[FLINK-6763] [core] Make serialization of composite serializer configs more efficient

This commit affects the serialization formats of configuration snapshots of composite serializers, most notably the PojoSerializer, as well as others such as MapSerializer, GenericArraySerializer, TupleSerializer, etc. It also affects the serialization formats of the OperatorBackendSerializationProxy and KeyedBackendSerializationProxy.
Prior to this commit, whenever we wrote a serializer and its config snapshot into a checkpoint, we always wrote the start offset and end offset of the serializer bytes, effectively indexing every serializer and its config. This required buffering the whole list of serializer and config snapshot pairs when writing the checkpoint. This commit makes this more efficient by just writing the length of the serializer bytes prior to writing the serializer. This requires less buffering for the writes.

commit 19b2f6abfd780d2456a0a7f7bb5dc0de3001ee78
Author: Tzu-Li (Gordon) Tai
Date: 2017-05-30T14:33:36Z

[FLINK-6763] Include excludeSerializer flag in PojoSerializerConfigSnapshot

commit f002db90e80bbec4641c3baa4501d69b546b71b9
Author: Tzu-Li (Gordon) Tai
Date: 2017-05-30T17:07:54Z

[FLINK-6763] Include excludeSerializers flag in CompositeTypeSerializerConfigSnapshot and state backend serialization proxies

commit 39ffe7ea1fbe289090ee72d97f2ccef3cdec049f
Author: Tzu-Li (Gordon) Tai
Date: 2017-05-31T12:00:45Z

[FLINK-6474] Deduplicate stateless serializers from checkpoints

Prior to this commit, we would write multiple instances of the same serializer even if it was stateless. This commit changes that by first writing a serializer index at the head of the stream, and then writing only the index of a serializer when one needs to be written. The index map is built using IdentityHashMaps, so that stateful serializers are considered as separate entries in the index.

> Deduplicate stateless TypeSerializers when serializing composite > TypeSerializers > > > Key: FLINK-6764 > URL: https://issues.apache.org/jira/browse/FLINK-6764 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai > > Composite type serializer, such as the {{PojoSerializer}}, could be improved > by deduplicating stateless {{TypeSerializer}} when being serialized.
This > would decrease their serialization size. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
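Editor's note: the pull request above describes writing a serializer index at the head of the stream and then only an index wherever a serializer would otherwise be written. The following is a rough sketch of that layout under simplifying assumptions, not the actual Flink format. `String` objects stand in for serializers, `writeDeduplicated` is a hypothetical name, and a shared instance plays the role of a stateless serializer that several fields reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class DedupWriterDemo {
    /** Writes each distinct instance once at the head of the stream, then one int index per field. */
    static byte[] writeDeduplicated(List<String> fieldSerializers) throws IOException {
        // Identity-based: only the very same instance is deduplicated,
        // so two equal-looking but separate instances are both written.
        Map<String, Integer> indices = new IdentityHashMap<>();
        List<String> distinct = new ArrayList<>();
        for (String s : fieldSerializers) {
            if (!indices.containsKey(s)) {
                indices.put(s, distinct.size());
                distinct.add(s);
            }
        }

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(distinct.size());          // head of stream: the index
            for (String s : distinct) {
                out.writeUTF(s);                    // stand-in for the serializer bytes
            }
            out.writeInt(fieldSerializers.size());  // body: one reference per field
            for (String s : fieldSerializers) {
                out.writeInt(indices.get(s));
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String shared = new String("IntSerializer");
        byte[] deduplicated = writeDeduplicated(Arrays.asList(shared, shared, shared));
        byte[] separate = writeDeduplicated(Arrays.asList(
                new String("IntSerializer"), new String("IntSerializer"), new String("IntSerializer")));
        System.out.println(deduplicated.length + " bytes vs " + separate.length + " bytes");
    }
}
```

Using identity rather than `equals()` errs on the safe side: a serializer may be written twice, but two distinct instances are never merged by accident, which matches the concern raised in the review.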