[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2020-03-17 Thread Tzu-Li (Gordon) Tai (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060698#comment-17060698
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6764:


[~NicoK] This is obsolete now, since we no longer serialize serializers.
Closing this.

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializer}}s when they are serialized. This 
> would decrease their serialization size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2020-03-16 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060376#comment-17060376
 ] 

Nico Kruber commented on FLINK-6764:


[~tzulitai] any updates?



[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-03-01 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383266#comment-16383266
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6764:


Yes, moving to 1.6.0.

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializer}}s when they are serialized. This 
> would decrease their serialization size.





[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-03-01 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382140#comment-16382140
 ] 

Aljoscha Krettek commented on FLINK-6764:
-

[~tzulitai] Did we decide to move this to 1.6.0? Or at least make it 
non-blocking?

> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializer}}s when they are serialized. This 
> would decrease their serialization size.





[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2018-01-13 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325062#comment-16325062
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-6764:


It seems we completely forgot to add this to 1.4.0.

Since the previous conclusion was that we do not want to change serialization 
formats across minor releases, we can't include this in 1.4.1.
We should make sure to include this change in 1.5.0 (as early as possible), as 
serialization formats will affect us far into the future.

Marking this as a blocker for 1.5.0.



[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062547#comment-16062547
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4026
  
@zentol There was a recent offline discussion regarding the savepoint 
formats that will probably make this change obsolete. I'll close this PR for now.


> Deduplicate stateless TypeSerializers when serializing composite 
> TypeSerializers
> 
>
> Key: FLINK-6764
> URL: https://issues.apache.org/jira/browse/FLINK-6764
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>
> Composite type serializers, such as the {{PojoSerializer}}, could be improved 
> by deduplicating stateless {{TypeSerializer}}s when they are serialized. This 
> would decrease their serialization size.





[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062548#comment-16062548
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user tzulitai closed the pull request at:

https://github.com/apache/flink/pull/4026




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-06-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062359#comment-16062359
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4026
  
@tzulitai What's the state of this PR?




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032839#comment-16032839
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/4026
  
Thanks a lot for the helpful review, @StefanRRichter. I've addressed your 
comments.

This PR also needs one additional change for backwards compatibility 
with 1.3.0 before I merge it (I'll ping you on the follow-up).




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031237#comment-16031237
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119370073
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java ---
@@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot(
             this.nonRegisteredSubclassesToSerializerConfigSnapshots =
                 Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots);
 
-            this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization;
+            this.excludeSerializers = excludeSerializers;
         }
 
         @Override
         public void write(DataOutputView out) throws IOException {
             super.write(out);
 
+            out.writeBoolean(excludeSerializers);
+
             // --- write fields and their serializers, in order
 
             out.writeInt(fieldToSerializerConfigSnapshot.size());
--- End diff --

It might not be possible to fully deduplicate this code, because the first 
map is a `Map>`. 
I'll still try to deduplicate as much as possible.




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031216#comment-16031216
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119364516
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java ---
@@ -110,4 +123,20 @@ public boolean equals(Object obj) {
     public int hashCode() {
         return nestedSerializersAndConfigs.hashCode();
     }
+
+    private Map buildSerializerIndices() {
--- End diff --

I see that the concrete implementation is always an identity hash map. 
Since it behaves differently from what you would expect of a normal map (which uses 
equals/hashCode) and in that sense violates the LSP, I suggest wrapping it in 
its own class called `SerializerIndex`.
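A minimal sketch of the suggested wrapper, assuming it only needs to hand out a stable index per instance and list the distinct instances in order. The class name `SerializerIndex` comes from the comment above; the methods `indexOf`, `uniqueInstances` and `size` are hypothetical and not taken from the PR. Keeping the `IdentityHashMap` private means no caller ever receives a `Map` whose lookup semantics differ from `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical SerializerIndex: assigns indices by object identity (==),
 * deliberately NOT by equals()/hashCode().
 */
final class SerializerIndex<T> {
    // Backed by IdentityHashMap, but the identity semantics stay an internal detail.
    private final Map<T, Integer> indices = new IdentityHashMap<>();
    // Distinct instances in the order they were first seen; index i maps to element i.
    private final List<T> instances = new ArrayList<>();

    /** Returns the index of this exact instance, registering it on first sight. */
    int indexOf(T serializer) {
        Integer existing = indices.get(serializer);
        if (existing != null) {
            return existing;
        }
        int newIndex = instances.size();
        indices.put(serializer, newIndex);
        instances.add(serializer);
        return newIndex;
    }

    /** The distinct instances, i.e. what would be written once at the head of the stream. */
    List<T> uniqueInstances() {
        return Collections.unmodifiableList(instances);
    }

    int size() {
        return instances.size();
    }
}
```

Two references to the same object get the same index, while two equal but separate objects get different indices.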




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031202#comment-16031202
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/4026
  
Overall, very nice work! I just had a few minor comments. After they are 
addressed, this is good to merge. +1




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031183#comment-16031183
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119359879
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/typeutils/CompositeTypeSerializerConfigSnapshot.java ---
@@ -110,4 +123,20 @@ public boolean equals(Object obj) {
     public int hashCode() {
         return nestedSerializersAndConfigs.hashCode();
     }
+
+    private Map buildSerializerIndices() {
--- End diff --

While the idea makes sense, I am not sure we can rely on all serializers 
having a correct implementation of equals and hashCode that does what we 
want. We might be better off with a collection of `serializer -> index` pairs and 
using equality for deduplication, but even there equals might be tricky. While 
it is OK to accidentally write a serializer twice, a serializer must never be 
de-duplicated by accident.
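The asymmetry described above can be illustrated with a small, self-contained sketch. `StatefulSerializer` is a hypothetical class whose `equals()` ignores its internal state. Deduplicating by `equals()` silently merges two such instances, whereas deduplicating by reference identity (as with `IdentityHashMap`) only merges references to the very same object, so the worst case is writing a serializer twice.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class DedupByIdentityDemo {

    /** Hypothetical stateful serializer with an overly coarse equals(): only the type name is compared. */
    static final class StatefulSerializer {
        final String typeName;
        final byte[] reuseBuffer = new byte[16]; // per-instance mutable state, ignored by equals()

        StatefulSerializer(String typeName) {
            this.typeName = typeName;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StatefulSerializer && ((StatefulSerializer) o).typeName.equals(typeName);
        }

        @Override
        public int hashCode() {
            return typeName.hashCode();
        }
    }

    /** Counts entries left after deduplicating with equals()/hashCode(). */
    static int distinctByEquals(Object... serializers) {
        Map<Object, Boolean> seen = new HashMap<>();
        for (Object s : serializers) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    /** Counts entries left after deduplicating by reference identity only. */
    static int distinctByIdentity(Object... serializers) {
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        for (Object s : serializers) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }
}
```

With one shared stateless instance referenced twice plus two separate `StatefulSerializer("MyPojo")` instances, `distinctByEquals` returns 2 (the two stateful instances are wrongly merged) and `distinctByIdentity` returns 3.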




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031168#comment-16031168
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119358003
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java ---
@@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot(
             this.nonRegisteredSubclassesToSerializerConfigSnapshots =
                 Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots);
 
-            this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization;
+            this.excludeSerializers = excludeSerializers;
         }
 
         @Override
         public void write(DataOutputView out) throws IOException {
             super.write(out);
 
+            out.writeBoolean(excludeSerializers);
+
             // --- write fields and their serializers, in order
 
             out.writeInt(fieldToSerializerConfigSnapshot.size());
--- End diff --

The same might hold for reads.




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031166#comment-16031166
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119357811
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java ---
@@ -795,13 +782,15 @@ public PojoSerializerConfigSnapshot(
             this.nonRegisteredSubclassesToSerializerConfigSnapshots =
                 Preconditions.checkNotNull(nonRegisteredSubclassesToSerializerConfigSnapshots);
 
-            this.ignoreTypeSerializerSerialization = ignoreTypeSerializerSerialization;
+            this.excludeSerializers = excludeSerializers;
         }
 
         @Override
         public void write(DataOutputView out) throws IOException {
             super.write(out);
 
+            out.writeBoolean(excludeSerializers);
+
             // --- write fields and their serializers, in order
 
             out.writeInt(fieldToSerializerConfigSnapshot.size());
--- End diff --

What I just noticed is that this code is duplicated almost three times. I suggest 
creating a private method that writes a `Map>` and calling it three times.
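Because the map's generic parameters were lost in this archive, the following sketch assumes `String` keys and leaves value encoding to a caller-supplied callback. `MapIO`, `writeMap`, `readMap`, `ValueWriter` and `ValueReader` are hypothetical names, and plain `java.io.DataOutputStream`/`DataInputStream` stand in for Flink's `DataOutputView`/`DataInputView`. A matching read helper keeps the two sides symmetric.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapIO {

    @FunctionalInterface
    interface ValueWriter<V> {
        void write(DataOutputStream out, V value) throws IOException;
    }

    @FunctionalInterface
    interface ValueReader<V> {
        V read(DataInputStream in) throws IOException;
    }

    /** Writes the entry count, then each key and value in the map's iteration order. */
    static <V> void writeMap(DataOutputStream out, Map<String, V> map, ValueWriter<V> valueWriter)
            throws IOException {
        out.writeInt(map.size());
        for (Map.Entry<String, V> entry : map.entrySet()) {
            out.writeUTF(entry.getKey());
            valueWriter.write(out, entry.getValue());
        }
    }

    /** Mirror of writeMap; a LinkedHashMap preserves the order the entries were written in. */
    static <V> LinkedHashMap<String, V> readMap(DataInputStream in, ValueReader<V> valueReader)
            throws IOException {
        int size = in.readInt();
        LinkedHashMap<String, V> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            map.put(key, valueReader.read(in));
        }
        return map;
    }
}
```

Each of the three call sites then reduces to one `writeMap` call with the appropriate value callback.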




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031117#comment-16031117
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119348235
  
--- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializerTest.java ---
@@ -606,6 +665,45 @@ public void testSerializerSerializationFailureResilience() throws Exception {
         verifyPojoSerializerConfigSnapshotWithEmptyNestedSerializers(config, deserializedConfig);
     }
 
+    @SuppressWarnings("unchecked")
+    @Test
+    public void testNonDuplicatedStatelessSerializersInConfigSnapshot() throws IOException {
+        PojoSerializer pojoSerializer = (PojoSerializer)
+            TypeExtractor.getForClass(TestUserClassWithSameFieldTypes.class).createSerializer(new ExecutionConfig());
+
+        // snapshot configuration and serialize to bytes
+        PojoSerializer.PojoSerializerConfigSnapshot config = pojoSerializer.snapshotConfiguration();
+        byte[] serializedConfig;
+        try (
+            ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+            TypeSerializerSerializationUtil.writeSerializerConfigSnapshot(new DataOutputViewStreamWrapper(out), config);
+            serializedConfig = out.toByteArray();
+        }
+
+        // read configuration from bytes
+        PojoSerializer.PojoSerializerConfigSnapshot deserializedConfig;
+        try(ByteArrayInputStream in = new ByteArrayInputStream(serializedConfig)) {
--- End diff --

There is also a `ByteArrayInputStreamWithPos`.




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031116#comment-16031116
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/4026#discussion_r119348073
  
--- Diff: flink-core/src/test/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializerTest.java ---
@@ -606,6 +665,45 @@ public void testSerializerSerializationFailureResilience() throws Exception {
         verifyPojoSerializerConfigSnapshotWithEmptyNestedSerializers(config, deserializedConfig);
     }
 
+    @SuppressWarnings("unchecked")
+    @Test
+    public void testNonDuplicatedStatelessSerializersInConfigSnapshot() throws IOException {
+        PojoSerializer pojoSerializer = (PojoSerializer)
+            TypeExtractor.getForClass(TestUserClassWithSameFieldTypes.class).createSerializer(new ExecutionConfig());
+
+        // snapshot configuration and serialize to bytes
+        PojoSerializer.PojoSerializerConfigSnapshot config = pojoSerializer.snapshotConfiguration();
+        byte[] serializedConfig;
+        try (
+            ByteArrayOutputStream out = new ByteArrayOutputStream()) {
--- End diff --

You could use Flink's `ByteArrayOutputStreamWithPos`, which can be more 
efficient because it does not use `synchronized`. It also lets you access the write 
position and the internal array, so you can write the bytes to the stream directly instead 
of creating a copy via `out.toByteArray()`.
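To illustrate the idea only (this is not Flink's actual class), here is a simplified, unsynchronized growable buffer that exposes its internal array and write position. `UnsyncByteArrayOutput`, `getBuf` and `getPosition` are names chosen for this sketch. A consumer writes the first `getPosition()` bytes of `getBuf()` straight to its target, avoiding the extra array copy that `ByteArrayOutputStream.toByteArray()` makes.

```java
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical, simplified stand-in for the idea behind a position-aware byte array stream. */
final class UnsyncByteArrayOutput extends OutputStream {
    private byte[] buffer;
    private int position;

    UnsyncByteArrayOutput(int initialCapacity) {
        buffer = new byte[Math.max(1, initialCapacity)];
    }

    // Grows the backing array (at least doubling) when the next write would not fit.
    private void ensureCapacity(int required) {
        if (required > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(required, buffer.length * 2));
        }
    }

    @Override
    public void write(int b) { // no synchronized keyword, unlike ByteArrayOutputStream
        ensureCapacity(position + 1);
        buffer[position++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureCapacity(position + len);
        System.arraycopy(b, off, buffer, position, len);
        position += len;
    }

    /** The internal array itself (no copy); only the first getPosition() bytes are valid. */
    byte[] getBuf() {
        return buffer;
    }

    int getPosition() {
        return position;
    }
}
```

Usage: `target.write(out.getBuf(), 0, out.getPosition())`. The trade-off is that callers see the live internal array, so they must not keep or mutate it after further writes.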




[jira] [Commented] (FLINK-6764) Deduplicate stateless TypeSerializers when serializing composite TypeSerializers

2017-05-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031060#comment-16031060
 ] 

ASF GitHub Bot commented on FLINK-6764:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/4026

[FLINK-6764] Deduplicate stateless serializers in checkpoints

This PR is based on #4014, so only the last commit 39ffe7e is relevant.

Prior to this PR, we would write multiple instances of the same serializer 
even if it was stateless. This PR changes that by first writing a 
serializer index at the head of the stream, and then only writing the index of a 
serializer whenever one needs to be written. The index map is built using 
`IdentityHashMap`s, so that stateful serializers are considered separate 
entries in the index.
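A minimal sketch of the layout described above, assuming each serializer can be encoded and decoded by caller-supplied callbacks. `DedupStreamFormat`, `Encoder` and `Decoder` are hypothetical names, and the PR's actual byte format may differ in detail. Each distinct instance (by identity) is written once at the head; every reference is then written as an int index into that head.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an "index at the head, indices afterwards" layout. */
final class DedupStreamFormat {

    @FunctionalInterface
    interface Encoder<T> {
        void encode(DataOutputStream out, T value) throws IOException;
    }

    @FunctionalInterface
    interface Decoder<T> {
        T decode(DataInputStream in) throws IOException;
    }

    /** Layout: [#unique][each unique instance][#references][index of each reference]. */
    static <T> void write(DataOutputStream out, List<T> references, Encoder<T> encoder) throws IOException {
        Map<T, Integer> indexByInstance = new IdentityHashMap<>(); // identity, not equals()
        List<T> unique = new ArrayList<>();
        int[] referenceIndices = new int[references.size()];
        for (int i = 0; i < references.size(); i++) {
            T ref = references.get(i);
            Integer idx = indexByInstance.get(ref);
            if (idx == null) {
                idx = unique.size();
                indexByInstance.put(ref, idx);
                unique.add(ref);
            }
            referenceIndices[i] = idx;
        }
        out.writeInt(unique.size());
        for (T instance : unique) {
            encoder.encode(out, instance); // each distinct instance is written exactly once
        }
        out.writeInt(referenceIndices.length);
        for (int idx : referenceIndices) {
            out.writeInt(idx);
        }
    }

    /** Reads the layout back; repeated indices resolve to the same restored instance. */
    static <T> List<T> read(DataInputStream in, Decoder<T> decoder) throws IOException {
        int uniqueCount = in.readInt();
        List<T> unique = new ArrayList<>(uniqueCount);
        for (int i = 0; i < uniqueCount; i++) {
            unique.add(decoder.decode(in));
        }
        int referenceCount = in.readInt();
        List<T> references = new ArrayList<>(referenceCount);
        for (int i = 0; i < referenceCount; i++) {
            references.add(unique.get(in.readInt()));
        }
        return references;
    }
}
```

For example, writing `[shared, shared, distinct]`, where `distinct` is an equal but separate `String` instance, produces two head entries and three indices; on read, the first two references resolve to the same restored object, which is the "not duplicated on restore" property the tests check.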

## Test

New tests are added to `PojoSerializerTest` and `SerializationProxiesTest` 
to test that stateless serializers are not duplicated on restore.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-6764

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4026


commit 852d7b569f02b1783c8410e111455d02127d0b5d
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-29T18:07:04Z

[FLINK-6763] [core] Make serialization of composite serializer configs more 
efficient

This commit affects the serialization formats of configuration snapshots
of composite serializers, most notably the PojoSerializer, as well as
others such as MapSerializer, GenericArraySerializer, TupleSerializer,
etc. It also affects the serialization formats of the
OperatorBackendSerializationProxy and KeyedBackendSerializationProxy.

Prior to this commit, whenever we wrote a serializer and its config
snapshot into a checkpoint, we always wrote the start offset and end
offset of the serializer bytes, effectively indexing every serializer
and its config. This required buffering the whole list of serializer and
config snapshot pairs when writing the checkpoint.

This commit makes this more efficient by writing only the length
of the serializer bytes before writing the serializer itself. This
requires less buffering for the writes.
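The length-prefix approach can be sketched as follows. Each element is encoded into a small temporary buffer and then written as `[int length][bytes]`, so at most one element is buffered at a time. A reader can always advance to the next element, even if decoding one of them fails. The class and method names are hypothetical, and plain `java.io` streams stand in for Flink's data views.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class LengthPrefixed {

    @FunctionalInterface
    interface Encoder<T> {
        void encode(DataOutputStream out, T value) throws IOException;
    }

    /** Buffers only this one element, then writes [int length][bytes]. */
    static <T> void writeWithLength(DataOutputStream out, T value, Encoder<T> encoder) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream bufferView = new DataOutputStream(buffer);
        encoder.encode(bufferView, value);
        bufferView.flush();
        out.writeInt(buffer.size());
        buffer.writeTo(out); // copies the buffered bytes to out without an extra toByteArray()
    }

    /**
     * Reads exactly one length-prefixed element. The caller may try to decode the returned
     * bytes and, if that fails, simply move on: the stream is already at the next element.
     */
    static byte[] readBlob(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length prefix: " + length);
        }
        byte[] blob = new byte[length];
        in.readFully(blob);
        return blob;
    }
}
```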

commit 19b2f6abfd780d2456a0a7f7bb5dc0de3001ee78
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-30T14:33:36Z

[FLINK-6763] Include excludeSerializer flag in PojoSerializerConfigSnapshot

commit f002db90e80bbec4641c3baa4501d69b546b71b9
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-30T17:07:54Z

[FLINK-6763] Include excludeSerializers flag in 
CompositeTypeSerializerConfigSnapshot and state backend serialization proxies

commit 39ffe7ea1fbe289090ee72d97f2ccef3cdec049f
Author: Tzu-Li (Gordon) Tai 
Date:   2017-05-31T12:00:45Z

[FLINK-6474] Deduplicate stateless serializers from checkpoints

Prior to this commit, we would write multiple instances of the same
serializer even if it was stateless. This commit changes that by first
writing a serializer index at the head of the stream, and then only
writing the index of a serializer whenever one needs to be written. The
index map is built using IdentityHashMaps, so that stateful serializers
are considered separate entries in the index.



