[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656831#comment-16656831 ] ASF GitHub Bot commented on FLINK-10412: asfgit closed pull request #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/flink-core/src/main/java/org/apache/flink/util/AbstractID.java b/flink-core/src/main/java/org/apache/flink/util/AbstractID.java index 41083c60783..b7be21a3a57 100644 --- a/flink-core/src/main/java/org/apache/flink/util/AbstractID.java +++ b/flink-core/src/main/java/org/apache/flink/util/AbstractID.java @@ -47,7 +47,7 @@ protected final long lowerPart; /** The memoized value returned by toString(). */ - private String toString; + private transient String toString; // diff --git a/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java b/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java index 93def48a28d..65bd6b22093 100644 --- a/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java +++ b/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java @@ -22,7 +22,12 @@ import org.junit.Test; +import java.io.InputStream; + +import static org.hamcrest.Matchers.equalTo; +import static org.hamcrest.Matchers.is; import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThat; import static org.junit.Assert.assertTrue; /** @@ -101,6 +106,35 @@ public void testCompare() throws Exception { assertCompare(id8, id10, -1); } + /** +* FLINK-10412 marks the {@link AbstractID#toString} field as transient. This tests ensures +* that {@link AbstractID} which have been serialized with the toString field can still +* be deserialized. For that purpose the files abstractID-with-toString-field and +* abstractID-with-toString-field-set have been created with the serialized data. +*/ + @Test + public void testOldAbstractIDDeserialization() throws Exception { + final long lowerPart = 42L; + final long upperPart = 1337L; + final AbstractID expectedAbstractId = new AbstractID(lowerPart, upperPart); + + final String resourceName1 = "abstractID-with-toString-field"; + try (final InputStream resourceAsStream = getClass().getClassLoader().getResourceAsStream(resourceName1)) { + final AbstractID deserializedAbstractId = InstantiationUtil.deserializeObject( + resourceAsStream, + getClass().getClassLoader()); + assertThat(deserializedAbstractId, is(equalTo(expectedAbstractId))); + } + + final String resourceName2 = "abstractID-with-toString-field-set"; + try (final InputStream resourceAsStream = getClass().getClassLoader().getResourceAsStream(resourceName2)) { + final AbstractID deserializedAbstractId = InstantiationUtil.deserializeObject( + resourceAsStream, + getClass().getClassLoader()); + assertThat(deserializedAbstractId, is(equalTo(expectedAbstractId))); + } + } + private static void assertCompare(AbstractID a, AbstractID b, int signum) { int cmpAB = a.compareTo(b); int cmpBA = b.compareTo(a); diff --git a/flink-core/src/test/resources/abstractID-with-toString-field b/flink-core/src/test/resources/abstractID-with-toString-field new file mode 100644 index 000..92564b487ce Binary files /dev/null and b/flink-core/src/test/resources/abstractID-with-toString-field differ diff --git a/flink-core/src/test/resources/abstractID-with-toString-field-set b/flink-core/src/test/resources/abstractID-with-toString-field-set new file mode 100644 index 000..bdf9b81b33d Binary files /dev/null and b/flink-core/src/test/resources/abstractID-with-toString-field-set differ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656558#comment-16656558 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-431311897 As of the [Java documentation](https://docs.oracle.com/javase/8/docs/platform/serialization/spec/version.html#a6519), making a nontransient field transient is identical as removing a field from a class. What happens is that an earlier version won't find the field data in the serialized stream and will initialize the field with the default value (`null`). In general this is an incompatible change when relying on Java serialization, however in our case if `toString == null`, then we will simply regenerate the `toString` field when calling `toString`. Therefore, I think it is ok to merge this change. I will proceed with merging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656557#comment-16656557 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-431311897 As of this [documentation](https://docs.oracle.com/javase/8/docs/platform/serialization/spec/version.html#a6519), making a nontransient field transient is identical as removing a field from a class. What happens is that an earlier version won't find the field data in the serialized stream and will initialize the field with the default value (`null`). In general this is an incompatible change when relying on Java serialization, however in our case if `toString == null`, then we will simply regenerate the `toString` field when calling `toString`. Therefore, I think it is ok to merge this change. I will proceed with merging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656468#comment-16656468 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-431293396 @tillrohrmann Anything else that needs to be changed? Or can we merge it? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655101#comment-16655101 ] ASF GitHub Bot commented on FLINK-10412: StephanEwen commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-430979442 If we are sure that this does not break serialization that I am good with changing it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650294#comment-16650294 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-429875321 Hi @tillrohrmann and @StephanEwen Any progress about this issue? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641507#comment-16641507 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427753765 Oh, @tillrohrmann I understood. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641501#comment-16641501 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427753255 I created the resource files with the serialized `AbstractID` before applying your changes. That way we still have serialized data of the old `AbstractID` available. The files are `abstractID-with-toString-field` and `abstractID-with-toString-field-set`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641500#comment-16641500 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427753255 I created the resource files with the serialized `AbstractID` before applying your changes. That way we still have serialized data of the old `AbstractID` available. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641498#comment-16641498 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427753059 > If existing savepoints have an AbstractID serialized in it, will the deserialization fail if that field changes? In order to verify the scenario described by @StephanEwen in the test, don't we need to define two versions of AbstractID? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641496#comment-16641496 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427752657 Since @StephanEwen initially rejected this PR I would still like to hear his opinion before merging this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641494#comment-16641494 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427752299 This is the commit: https://github.com/tillrohrmann/flink/commit/5179eac7d19b342e161cb188f78e51458075aaff @yanghua This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641493#comment-16641493 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427752299 This is the branch: https://github.com/tillrohrmann/flink/commit/5179eac7d19b342e161cb188f78e51458075aaff @yanghua This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641491#comment-16641491 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427751527 Why are the fully qualified names different @yanghua? I actually just tried to do the same and it works. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641490#comment-16641490 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427751092 @tillrohrmann I have tried to do it the way you said it, but because the fully qualified names of the two classes are different, an exception is thrown when the type conversion is done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641480#comment-16641480 ] ASF GitHub Bot commented on FLINK-10412: tillrohrmann commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427747927 I took a look at the different sizes of a serialized `AbstractID` with and without the transient keyword: * old `AbstractID`: 126 bytes * old `AbstractID` with `toString` being called before serialization: 160 bytes * transient `AbstractID`: 93 bytes The issue has been created because ZhuZhu observed scalability issues when deploying very large jobs. I think he has a point given the different object sizes, because the `InputChannelDeploymentDescriptor` consists to a good part of `AbstractIDs`. Given this and that serialization is not a problem, I'm actually in favour of this improvement. What we could do @yanghua is to serialize the old `AbstractID` and store the serialized data as a file in the testing resource folder. Then we could add a test, that the new AbstractID can be deserialized from this file. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640034#comment-16640034 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-427420831 @tillrohrmann and @zentol What's your opinion? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629901#comment-16629901 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-424997312 Hi @StephanEwen , I verified the scene you said and did not lead to failure. Proceed as follows: 1. Serialize the current version of AbstractID to a local file; 2. Deserialize from the local file and convert the deserialized Object type to a new AbstractID (mark the toString field as transient); 3. Compare the fields and hashcode to be consistent; Since this test comes from two separate methods and the AbstractID is modified in the middle, it seems impossible to write a directly running test case. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629024#comment-16629024 ] ASF GitHub Bot commented on FLINK-10412: StephanEwen commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-424784058 If existing savepoints have an AbstractID serialized in it, will the deserialization fail if that field changes? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626897#comment-16626897 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-424231570 hi @StephanEwen How can it "break setup" without serializing it? It's just an internal variable that improves the performance of the `toString()` method. Other than that, it hasn't been used elsewhere. And when `toString()` is called again, it will be generated again, and `toString()` sets the behavior of this variable to be idempotent. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626885#comment-16626885 ] ASF GitHub Bot commented on FLINK-10412: StephanEwen commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-424229947 I would like to not do this change. I fear that there are some corner cases in which Java serialization is still used and where this can break setups. The benefit of this change seems not big enough to risk that. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626837#comment-16626837 ] ASF GitHub Bot commented on FLINK-10412: yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755#issuecomment-424221442 cc @zentol and @tillrohrmann This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626767#comment-16626767 ] ASF GitHub Bot commented on FLINK-10412: yanghua opened a new pull request #6755: [FLINK-10412] toString field in AbstractID should be transient to avoid been serialized URL: https://github.com/apache/flink/pull/6755 ## What is the purpose of the change *This pull request mark toString field in AbstractID as transient to avoid been serialized * ## Brief change log - *Mark toString field in AbstractID as transient to avoid been serialized* ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / **not documented**) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, pull-request-available, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized
[ https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626712#comment-16626712 ] vinoyang commented on FLINK-10412: -- [~zhuzh] Good catch! Will fix it soon. > toString field in AbstractID should be transient to avoid been serialized > - > > Key: FLINK-10412 > URL: https://issues.apache.org/jira/browse/FLINK-10412 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Zhu Zhu >Assignee: vinoyang >Priority: Major > Labels: deploy,deployment, serialization > > The toString field in AbstractID will be serialized currently, which makes > RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo > larger (50%+). > It adds more pressure to JM memory especially in large scale job scheduling > (1x1 ALL-to-ALL connection). -- This message was sent by Atlassian JIRA (v7.6.3#76005)