[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656831#comment-16656831
 ] 

ASF GitHub Bot commented on FLINK-10412:


asfgit closed pull request #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-core/src/main/java/org/apache/flink/util/AbstractID.java 
b/flink-core/src/main/java/org/apache/flink/util/AbstractID.java
index 41083c60783..b7be21a3a57 100644
--- a/flink-core/src/main/java/org/apache/flink/util/AbstractID.java
+++ b/flink-core/src/main/java/org/apache/flink/util/AbstractID.java
@@ -47,7 +47,7 @@
protected final long lowerPart;
 
/** The memoized value returned by toString(). */
-   private String toString;
+   private transient String toString;
 
// 

 
diff --git a/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java 
b/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java
index 93def48a28d..65bd6b22093 100644
--- a/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java
+++ b/flink-core/src/test/java/org/apache/flink/util/AbstractIDTest.java
@@ -22,7 +22,12 @@
 
 import org.junit.Test;
 
+import java.io.InputStream;
+
+import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.is;
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertThat;
 import static org.junit.Assert.assertTrue;
 
 /**
@@ -101,6 +106,35 @@ public void testCompare() throws Exception {
assertCompare(id8, id10, -1);
}
 
+   /**
+* FLINK-10412 marks the {@link AbstractID#toString} field as 
transient. This tests ensures
+* that {@link AbstractID} which have been serialized with the toString 
field can still
+* be deserialized. For that purpose the files 
abstractID-with-toString-field and
+* abstractID-with-toString-field-set have been created with the 
serialized data.
+*/
+   @Test
+   public void testOldAbstractIDDeserialization() throws Exception {
+   final long lowerPart = 42L;
+   final long upperPart = 1337L;
+   final AbstractID expectedAbstractId = new AbstractID(lowerPart, 
upperPart);
+
+   final String resourceName1 = "abstractID-with-toString-field";
+   try (final InputStream resourceAsStream = 
getClass().getClassLoader().getResourceAsStream(resourceName1)) {
+   final AbstractID deserializedAbstractId = 
InstantiationUtil.deserializeObject(
+   resourceAsStream,
+   getClass().getClassLoader());
+   assertThat(deserializedAbstractId, 
is(equalTo(expectedAbstractId)));
+   }
+
+   final String resourceName2 = 
"abstractID-with-toString-field-set";
+   try (final InputStream resourceAsStream = 
getClass().getClassLoader().getResourceAsStream(resourceName2)) {
+   final AbstractID deserializedAbstractId = 
InstantiationUtil.deserializeObject(
+   resourceAsStream,
+   getClass().getClassLoader());
+   assertThat(deserializedAbstractId, 
is(equalTo(expectedAbstractId)));
+   }
+   }
+
private static void assertCompare(AbstractID a, AbstractID b, int 
signum) {
int cmpAB = a.compareTo(b);
int cmpBA = b.compareTo(a);
diff --git a/flink-core/src/test/resources/abstractID-with-toString-field 
b/flink-core/src/test/resources/abstractID-with-toString-field
new file mode 100644
index 000..92564b487ce
Binary files /dev/null and 
b/flink-core/src/test/resources/abstractID-with-toString-field differ
diff --git a/flink-core/src/test/resources/abstractID-with-toString-field-set 
b/flink-core/src/test/resources/abstractID-with-toString-field-set
new file mode 100644
index 000..bdf9b81b33d
Binary files /dev/null and 
b/flink-core/src/test/resources/abstractID-with-toString-field-set differ


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been

[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656558#comment-16656558
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-431311897
 
 
   As of the [Java 
documentation](https://docs.oracle.com/javase/8/docs/platform/serialization/spec/version.html#a6519),
 making a nontransient field transient is identical as removing a field from a 
class. What happens is that an earlier version won't find the field data in the 
serialized stream and will initialize the field with the default value 
(`null`). In general this is an incompatible change when relying on Java 
serialization, however in our case if `toString == null`, then we will simply 
regenerate the `toString` field when calling `toString`. Therefore, I think it 
is ok to merge this change. 
   
   I will proceed with merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656557#comment-16656557
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-431311897
 
 
   As of this 
[documentation](https://docs.oracle.com/javase/8/docs/platform/serialization/spec/version.html#a6519),
 making a nontransient field transient is identical as removing a field from a 
class. What happens is that an earlier version won't find the field data in the 
serialized stream and will initialize the field with the default value 
(`null`). In general this is an incompatible change when relying on Java 
serialization, however in our case if `toString == null`, then we will simply 
regenerate the `toString` field when calling `toString`. Therefore, I think it 
is ok to merge this change. 
   
   I will proceed with merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656468#comment-16656468
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-431293396
 
 
   @tillrohrmann Anything else that needs to be changed? Or can we merge it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655101#comment-16655101
 ] 

ASF GitHub Bot commented on FLINK-10412:


StephanEwen commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-430979442
 
 
   If we are sure that this does not break serialization that I am good with 
changing it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650294#comment-16650294
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-429875321
 
 
   Hi @tillrohrmann and @StephanEwen Any progress about this issue?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641507#comment-16641507
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427753765
 
 
   Oh, @tillrohrmann I understood.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641501#comment-16641501
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427753255
 
 
   I created the resource files with the serialized `AbstractID` before 
applying your changes. That way we still have serialized data of the old 
`AbstractID` available. The files are `abstractID-with-toString-field` and 
`abstractID-with-toString-field-set`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641500#comment-16641500
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427753255
 
 
   I created the resource files with the serialized `AbstractID` before 
applying your changes. That way we still have serialized data of the old 
`AbstractID` available.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641498#comment-16641498
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427753059
 
 
   
   > If existing savepoints have an AbstractID serialized in it, will the 
deserialization fail if that field changes?
   
   In order to verify the scenario described by @StephanEwen  in the test, 
don't we need to define two versions of AbstractID?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641496#comment-16641496
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427752657
 
 
   Since @StephanEwen initially rejected this PR I would still like to hear his 
opinion before merging this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641494#comment-16641494
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann edited a comment on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427752299
 
 
   This is the commit: 
https://github.com/tillrohrmann/flink/commit/5179eac7d19b342e161cb188f78e51458075aaff
 @yanghua 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641493#comment-16641493
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427752299
 
 
   This is the branch: 
https://github.com/tillrohrmann/flink/commit/5179eac7d19b342e161cb188f78e51458075aaff
 @yanghua 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641491#comment-16641491
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427751527
 
 
   Why are the fully qualified names different @yanghua? I actually just tried 
to do the same and it works.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641490#comment-16641490
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427751092
 
 
   @tillrohrmann I have tried to do it the way you said it, but because the 
fully qualified names of the two classes are different, an exception is thrown 
when the type conversion is done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641480#comment-16641480
 ] 

ASF GitHub Bot commented on FLINK-10412:


tillrohrmann commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427747927
 
 
   I took a look at the different sizes of a serialized `AbstractID` with and 
without the transient keyword:
   
   * old `AbstractID`: 126 bytes
   * old `AbstractID` with `toString` being called before serialization: 160 
bytes
   * transient `AbstractID`: 93 bytes
   
   The issue has been created because ZhuZhu observed scalability issues when 
deploying very large jobs. I think he has a point given the different object 
sizes, because the `InputChannelDeploymentDescriptor` consists to a good part 
of `AbstractIDs`. Given this and that serialization is not a problem, I'm 
actually in favour of this improvement.
   
   What we could do @yanghua is to serialize the old `AbstractID` and store the 
serialized data as a file in the testing resource folder. Then we could add a 
test, that the new AbstractID can be deserialized from this file.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-10-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640034#comment-16640034
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-427420831
 
 
   @tillrohrmann and @zentol What's your opinion?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629901#comment-16629901
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-424997312
 
 
   Hi @StephanEwen , I verified the scene you said and did not lead to failure. 
Proceed as follows:
   
   1. Serialize the current version of AbstractID to a local file;
   2. Deserialize from the local file and convert the deserialized Object type 
to a new AbstractID (mark the toString field as transient);
   3. Compare the fields and hashcode to be consistent;
   
   Since this test comes from two separate methods and the AbstractID is 
modified in the middle, it seems impossible to write a directly running test 
case.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629024#comment-16629024
 ] 

ASF GitHub Bot commented on FLINK-10412:


StephanEwen commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-424784058
 
 
   If existing savepoints have an AbstractID serialized in it, will the 
deserialization fail if that field changes?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626897#comment-16626897
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-424231570
 
 
   hi @StephanEwen How can it "break setup" without serializing it? It's just 
an internal variable that improves the performance of the `toString()` method. 
Other than that, it hasn't been used elsewhere. And when `toString()` is called 
again, it will be generated again, and `toString()` sets the behavior of this 
variable to be idempotent.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626885#comment-16626885
 ] 

ASF GitHub Bot commented on FLINK-10412:


StephanEwen commented on issue #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-424229947
 
 
   I would like to not do this change.
   
   I fear that there are some corner cases in which Java serialization is still 
used and where this can break setups. The benefit of this change seems not big 
enough to risk that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626837#comment-16626837
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua commented on issue #6755: [FLINK-10412] toString field in AbstractID 
should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755#issuecomment-424221442
 
 
   cc @zentol and @tillrohrmann 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626767#comment-16626767
 ] 

ASF GitHub Bot commented on FLINK-10412:


yanghua opened a new pull request #6755: [FLINK-10412] toString field in 
AbstractID should be transient to avoid been serialized
URL: https://github.com/apache/flink/pull/6755
 
 
   ## What is the purpose of the change
   
   *This pull request mark toString field in AbstractID as transient to avoid 
been serialized *
   
   
   ## Brief change log
   
 - *Mark toString field in AbstractID as transient to avoid been serialized*
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ **not documented**)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, pull-request-available, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10412) toString field in AbstractID should be transient to avoid been serialized

2018-09-24 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626712#comment-16626712
 ] 

vinoyang commented on FLINK-10412:
--

[~zhuzh] Good catch! Will fix it soon.

> toString field in AbstractID should be transient to avoid been serialized
> -
>
> Key: FLINK-10412
> URL: https://issues.apache.org/jira/browse/FLINK-10412
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.7.0
>Reporter: Zhu Zhu
>Assignee: vinoyang
>Priority: Major
>  Labels: deploy,deployment, serialization
>
> The toString field in AbstractID will be serialized currently, which makes 
> RPC messages body like InputChannelDeploymentDescriptor and PartitionInfo 
> larger (50%+).
> It adds more pressure to JM memory especially in large scale job scheduling 
> (1x1 ALL-to-ALL connection).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)