[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834460#comment-15834460 ] ASF GitHub Bot commented on FLINK-5118: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3106 > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826057#comment-15826057 ] ASF GitHub Bot commented on FLINK-5118: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/3106 @StephanEwen I'll add the default initialization of ```counter``` while merging. Typing the counters to a ```SimpleCounter``` would affect a lot of other classes though (especially since we can extend this to the ```OperatorIOMetricGroup```), so I would like to do that as part of another issue. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823968#comment-15823968 ] ASF GitHub Bot commented on FLINK-5118: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3106 +1 to this approach. Some suggestions for small performance improvements, not specific to this case, but also applicable to other cases: - It may make sense to type the counters at that level to the "SimpleCounter", if we anyways provide them. The TaskIOMetricGroup could type its counters to "SimpleCounter" as well. Since that class it not final, and future mocks/checks can be implemented on that class as well - We can make sure that the field `counter` is always initialized by initially assigning a standalone SimpleCounter. That way we could drop the null checks in the code. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823662#comment-15823662 ] ASF GitHub Bot commented on FLINK-5118: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/3106 The code checks for null since there is no *technical* contract that the returned value is null. They aren't strictly necessary, and are only meant to guard against programming errors in the metrics system. Using a ```NullCounter``` would indeed do the same, would however introduce effectively dead code. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823272#comment-15823272 ] ASF GitHub Bot commented on FLINK-5118: --- Github user xhumanoid commented on the issue: https://github.com/apache/flink/pull/3106 @zentol I asked because you always check on null when you try writing to Counter or is it prevent uninitialized state? > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823265#comment-15823265 ] ASF GitHub Bot commented on FLINK-5118: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/3106 @xhumanoid The metrics returned by the TaskIOMetricGroup can't actually be null, so I wouldn't put too much thought into dealing with that case. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823250#comment-15823250 ] ASF GitHub Bot commented on FLINK-5118: --- Github user xhumanoid commented on the issue: https://github.com/apache/flink/pull/3106 @zentol what do you think about remove if (numBytesOut != null) { and replace numBytesOut = metrics.getNumBytesOutCounter(); with + if (metrics.getNumBytesOutCounter() != null) { +numBytesOut = metrics.getNumBytesOutCounter(); + } else { +numBytesOut = new NullCounter(); + } where NullCounter have empty implementation for every method, prof: we do null check in one place, because sometime we may forget to do it cons: sometimes we broke devirtualization and inlining for counter.inc(..) method > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821625#comment-15821625 ] ASF GitHub Bot commented on FLINK-5118: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/3106 @uce Could you take another look? I moved the bytesOut counter into the RecordWriter. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821239#comment-15821239 ] ASF GitHub Bot commented on FLINK-5118: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/3106 Good catch. I was wondering whether it is better to count in a consistent way, too, e.g. count the size of each buffer or each record on both A and B (now we have per record on the output side and per buffer on the input side). > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821232#comment-15821232 ] ASF GitHub Bot commented on FLINK-5118: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/3106#discussion_r95812448 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/SpanningRecordSerializer.java --- @@ -99,7 +99,7 @@ public SerializationResult addRecord(T record) throws IOException { this.lengthBuffer.putInt(0, len); if (numBytesOut != null) { - numBytesOut.inc(len); + numBytesOut.inc(len + 4); --- End diff -- I think this warrants both a comment and a test. > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5118) Inconsistent records sent/received metrics
[ https://issues.apache.org/jira/browse/FLINK-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821166#comment-15821166 ] Chesnay Schepler commented on FLINK-5118: - PR at https://github.com/apache/flink/pull/3106 > Inconsistent records sent/received metrics > -- > > Key: FLINK-5118 > URL: https://issues.apache.org/jira/browse/FLINK-5118 > Project: Flink > Issue Type: Bug > Components: Metrics, Webfrontend >Affects Versions: 1.2.0, 1.3.0 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler > Fix For: 1.2.0, 1.3.0 > > > In 1.2-SNAPSHOT running a large scale job you see that the counts for > send/received records are inconsistent, e.g. in a simple word count job we > see more received records/bytes than we see sent. This is a regression from > 1.1 where everything works as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)