[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196305#comment-16196305 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4445

> always use HybridMemorySegment
> --
>
> Key: FLINK-7310
> URL: https://issues.apache.org/jira/browse/FLINK-7310
> Project: Flink
> Issue Type: Sub-task
> Components: Core
> Affects Versions: 1.4.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Fix For: 1.4.0
>
> For future changes to the network buffers (sending our own off-heap buffers
> through to netty), we cannot use {{HeapMemorySegment}} anymore and need to
> rely on {{HybridMemorySegment}} instead.
> We should thus drop any code that loads the {{HeapMemorySegment}} (it is
> still available if needed) in favour of the {{HybridMemorySegment}}, which is
> able to work on both heap and off-heap memory.
> FYI: For the performance penalty of this change compared to using
> {{HeapMemorySegment}} alone, see this interesting blog article (from 2015):
> https://flink.apache.org/news/2015/09/16/off-heap-memory.html

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195032#comment-16195032 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4445

Agree with @KurtYoung. Merging this...
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154711#comment-16154711 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user KurtYoung commented on the issue:

https://github.com/apache/flink/pull/4445

I would bet on deserialization. The sorter suffers a larger regression than the hash join because it performs more deserializations while comparing records.

Despite the regression we will face, I think the change is still worthwhile since we can avoid an extra copy from network to runtime. It would be better if the benchmark took that extra copy into account, but it's OK if it doesn't.

+1 to merge this.
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143910#comment-16143910 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4445

Thanks! I am currently trying to pinpoint what part of the code exactly suffers most from the regression. If that is for example specific to the microbenchmark, we can merge this without concern...
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123110#comment-16123110 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/4445

FYI: I just rebased this PR onto current `master` to make it mergeable and to support further extensions.
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116530#comment-16116530 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/4445

I think the implementation of the change is good, but the performance impact seems noticeable, at least in some cases. I think the additional bounds checking in the hybrid case shows. Out of curiosity I deactivated the index bounds checks and this closed all gaps between `HeapMemorySegment` and `HybridMemorySegment` in the benchmarks that @NicoK mentioned.

If @StephanEwen has no concerns about the performance regression, I think this could be merged.
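To make the bounds-checking remark above more concrete, here is a minimal sketch of the two access styles being compared. It assumes that a heap-only segment can rely on the JVM's built-in array bounds check, while a segment that may also point to off-heap memory has to validate the index itself before a raw memory access. The class `BoundsCheckSketch`, its methods, and the `explicitChecks` switch are hypothetical and are not Flink's actual implementation; a plain array read stands in for the raw access.

```java
/** Hypothetical sketch of two ways to guard a byte read; not Flink's actual classes. */
final class BoundsCheckSketch {

    private final byte[] memory;
    private final boolean explicitChecks;

    BoundsCheckSketch(int size, boolean explicitChecks) {
        this.memory = new byte[size];
        this.explicitChecks = explicitChecks;
    }

    void put(int index, byte value) {
        memory[index] = value;
    }

    /** Heap-style read: relies only on the JVM's built-in array bounds check. */
    byte getHeapStyle(int index) {
        return memory[index];
    }

    /** Hybrid-style read: validates the index explicitly before touching memory. */
    byte getHybridStyle(int index) {
        if (explicitChecks && (index < 0 || index >= memory.length)) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + memory.length);
        }
        // Assumption: a segment that also supports off-heap memory would perform a raw,
        // unchecked memory access here. The array read is only a stand-in for it.
        return memory[index];
    }

    public static void main(String[] args) {
        BoundsCheckSketch segment = new BoundsCheckSketch(8, true);
        segment.put(2, (byte) 7);
        System.out.println(segment.getHeapStyle(2) + " " + segment.getHybridStyle(2));
    }
}
```

The extra comparison in `getHybridStyle` runs on every single-byte access, which is the kind of per-record cost the comment above suspects; whether it matters in practice depends on how well the JIT can optimize it away.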
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114322#comment-16114322 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/4445

In a non-exhaustive mini benchmark, I ran `HashVsSortMiniBenchmark` and got the following results:

# Best out of 5 (in ms)

Test | `master` | `FLINK-7310`
-- | -- | --
Hash Build First | 5541 | 5629
Sort-Merge | 6194 | 6816
Hash Build Second | 3587 | 3629

# All results

## `master`

Test | 1 | 2 | 3 | 4 | 5
- | - | - | - | - | -
Hash Build First | 5772.0 | 5541.0 | 5707.0 | 5733.0 | 5751.0
Sort-Merge | 6704.0 | 7146.0 | 6194.0 | 6915.0 | 6445.0
Hash Build Second | 3834.0 | 3805.0 | 3811.0 | 3587.0 | 3563.0

## `FLINK-7310`

Test | 1 | 2 | 3 | 4 | 5
- | - | - | - | - | -
Hash Build First | 5816.0 | 5770.0 | 5629.0 | 5656.0 | 5745.0
Sort-Merge | 7284.0 | 7233.0 | 6816.0 | 6861.0 | 7218.0
Hash Build Second | 3802.0 | 3836.0 | 3629.0 | 3782.0 | 3804.0
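The relative slowdown implied by these numbers can be recomputed from the per-run results. The sketch below takes the best of the five listed runs per test and reports the difference between `master` and `FLINK-7310` as a percentage; the class and method names are hypothetical, and only the run times come from the comment above. With these inputs the slowdown is about 10% for Sort-Merge and under 2% for both hash-build variants. Note that the minimum of the five listed `master` runs for Hash Build Second is 3563 ms, slightly below the 3587 ms given in the summary table.

```java
import java.util.Arrays;

/** Hypothetical helper that summarizes the run times listed in the comment above. */
final class BenchmarkSummary {

    // Run times in ms, copied from the "All results" tables.
    static final double[] MASTER_HASH_BUILD_FIRST = {5772, 5541, 5707, 5733, 5751};
    static final double[] MASTER_SORT_MERGE = {6704, 7146, 6194, 6915, 6445};
    static final double[] MASTER_HASH_BUILD_SECOND = {3834, 3805, 3811, 3587, 3563};

    static final double[] BRANCH_HASH_BUILD_FIRST = {5816, 5770, 5629, 5656, 5745};
    static final double[] BRANCH_SORT_MERGE = {7284, 7233, 6816, 6861, 7218};
    static final double[] BRANCH_HASH_BUILD_SECOND = {3802, 3836, 3629, 3782, 3804};

    /** Returns the fastest (smallest) run time. */
    static double best(double[] runs) {
        return Arrays.stream(runs).min().getAsDouble();
    }

    /** Returns how much slower the candidate is than the baseline, in percent. */
    static double slowdownPercent(double baseline, double candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    static void report(String name, double[] master, double[] branch) {
        double baseline = best(master);
        double candidate = best(branch);
        System.out.printf("%-18s %8.1f %8.1f %+6.2f%%%n",
                name, baseline, candidate, slowdownPercent(baseline, candidate));
    }

    public static void main(String[] args) {
        report("Hash Build First", MASTER_HASH_BUILD_FIRST, BRANCH_HASH_BUILD_FIRST);
        report("Sort-Merge", MASTER_SORT_MERGE, BRANCH_SORT_MERGE);
        report("Hash Build Second", MASTER_HASH_BUILD_SECOND, BRANCH_HASH_BUILD_SECOND);
    }
}
```

Five runs per configuration are a small sample, so these percentages are best read as a rough indication rather than a precise measurement.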
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108870#comment-16108870 ]

ASF GitHub Bot commented on FLINK-7310:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4445

These changes look good to me!

There is in fact a potential performance impact of this change, and it would be cool to get an understanding of what only using the HybridMemorySegment costs us now. We could run something like a Hash Join Performance test with key/value pairs of String keys (which are the most performance sensitive to serialize / deserialize with individual byte operations) and see if this has a measurable impact there.
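The point about String keys can be illustrated with a small sketch. It assumes a variable-length encoding in which the length and every character are written 7 bits at a time, so that each string costs at least one single-byte write (or read) per character. This is only in the spirit of such serializers and is not Flink's actual String serializer; `ByteWiseStringCodec` and its methods are hypothetical, and a `java.nio.ByteBuffer` stands in for a memory segment.

```java
import java.nio.ByteBuffer;

/** Hypothetical byte-at-a-time string codec; not Flink's actual serializer. */
final class ByteWiseStringCodec {

    private ByteWiseStringCodec() {}

    /** Writes the string at the given offset and returns the position after the last byte. */
    static int write(ByteBuffer target, int offset, String value) {
        int pos = writeVarInt(target, offset, value.length());
        for (int i = 0; i < value.length(); i++) {
            pos = writeVarInt(target, pos, value.charAt(i));
        }
        return pos;
    }

    /** Reads a string that was written with {@link #write} at the given offset. */
    static String read(ByteBuffer source, int offset) {
        int[] cursor = {offset};
        int length = readVarInt(source, cursor);
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = (char) readVarInt(source, cursor);
        }
        return new String(chars);
    }

    // Writes a non-negative int using 7 payload bits per byte; the high bit marks "more bytes follow".
    private static int writeVarInt(ByteBuffer target, int pos, int value) {
        while (value >= 0x80) {
            target.put(pos++, (byte) (value | 0x80));
            value >>>= 7;
        }
        target.put(pos++, (byte) value);
        return pos;
    }

    // Reads an int written by writeVarInt and advances cursor[0] past it.
    private static int readVarInt(ByteBuffer source, int[] cursor) {
        int result = 0;
        int shift = 0;
        int current;
        do {
            current = source.get(cursor[0]++) & 0xFF;
            result |= (current & 0x7F) << shift;
            shift += 7;
        } while ((current & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(64);
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64);
        write(heap, 0, "key-42");
        write(offHeap, 0, "key-42");
        System.out.println(read(heap, 0) + " " + read(offHeap, 0));
    }
}
```

Because every character goes through its own `put` or `get` call, any fixed per-access overhead in the underlying segment is multiplied by the key length, which is why such a workload is a reasonable stress test for the change.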
[jira] [Commented] (FLINK-7310) always use HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108578#comment-16108578 ]

ASF GitHub Bot commented on FLINK-7310:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/4445

[FLINK-7310][core] always use the HybridMemorySegment

## What is the purpose of the change

Since we'd like to use our own off-heap buffers for network communication, we cannot use `HeapMemorySegment` anymore and need to rely on `HybridMemorySegment`. We thus drop any code that loads the `HeapMemorySegment` (it is still available if needed) in favour of the `HybridMemorySegment`, which is able to work on both heap and off-heap memory.

For the performance penalty of this change compared to using `HeapMemorySegment` alone, see this interesting blog article (from 2015): https://flink.apache.org/news/2015/09/16/off-heap-memory.html

## Brief change log

- drop any use of the `HeapMemorySegment` (however, for now, keep the class and its factory)
- integrate `HybridMemorySegmentFactory` into `MemorySegmentFactory` (with hard-coded use of `HybridMemorySegment`)

## Verifying this change

This change is already covered by existing tests, such as the memory-backend specific tests under `flink/core/memory`, or in fact all other tests that run programs on Flink. Until now, the `HybridMemorySegment` was not really tested much in integration tests because most tests used on-heap memory and thus `HeapMemorySegment`. Since we now only use `HybridMemorySegment`, we implicitly add a lot of test coverage for it.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (yes)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-7310

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4445.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4445

commit c62e793712effbfb53ea6442b5d714a68081f7ec
Author: Nico Kruber
Date: 2017-07-31T10:06:14Z

[hotfix] fix some typos

commit d3c4e231a96b6ae133576a74646294749ab3809a
Author: Nico Kruber
Date: 2017-07-31T12:18:42Z

[FLINK-7310][core] always use the HybridMemorySegment

Since we'd like to use our own off-heap buffers for network communication, we cannot use HeapMemorySegment anymore and need to rely on HybridMemorySegment. We thus drop any code that loads the HeapMemorySegment (it is still available if needed) in favour of the HybridMemorySegment, which is able to work on both heap and off-heap memory.

For the performance penalty of this change compared to using HeapMemorySegment alone, see this interesting blog article (from 2015): https://flink.apache.org/news/2015/09/16/off-heap-memory.html
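As a rough illustration of the change log above, the sketch below shows one segment type that works on both heap and off-heap memory, together with a factory that is hard-wired to that single type. It is a simplified stand-in built on `java.nio.ByteBuffer` and is not how Flink's `HybridMemorySegment` or `MemorySegmentFactory` are implemented; the names `HybridSegmentSketch`, `SegmentFactorySketch`, `wrap`, and `allocateOffHeap` are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical segment that can sit on heap or off-heap memory; not Flink's class. */
final class HybridSegmentSketch {

    private final ByteBuffer buffer;

    HybridSegmentSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    int size() {
        return buffer.capacity();
    }

    // The same accessors serve both kinds of memory, so callers never branch on the type.
    byte get(int index) {
        return buffer.get(index);
    }

    void put(int index, byte value) {
        buffer.put(index, value);
    }
}

/** Hypothetical factory with a hard-coded segment implementation. */
final class SegmentFactorySketch {

    private SegmentFactorySketch() {}

    /** Wraps existing heap memory; writes through the segment are visible in the array. */
    static HybridSegmentSketch wrap(byte[] heapMemory) {
        return new HybridSegmentSketch(ByteBuffer.wrap(heapMemory));
    }

    /** Allocates a fresh off-heap (direct) region of the given size. */
    static HybridSegmentSketch allocateOffHeap(int size) {
        return new HybridSegmentSketch(ByteBuffer.allocateDirect(size));
    }
}
```

With only one concrete segment class behind the factory, callers no longer need to know which kind of memory backs a segment, which is what the planned handover of off-heap network buffers relies on.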