[jira] [Updated] (FLINK-12529) Release record-deserializer buffers timely to improve the efficiency of heap usage on taskmanager

2019-08-16 Thread Piotr Nowojski (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-12529:
---
Affects Version/s: 1.9.0

> Release record-deserializer buffers timely to improve the efficiency of heap 
> usage on taskmanager
> -
>
> Key: FLINK-12529
> URL: https://issues.apache.org/jira/browse/FLINK-12529
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Haibo Sun
>Assignee: Haibo Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In input processors (`StreamInputProcessor` and `StreamTwoInputProcessor`), 
> each input channel has a corresponding record deserializer. Currently, these 
> record deserializers are cleaned up at the end of the task (look at 
> `StreamInputProcessor#cleanup()` and `StreamTwoInputProcessor#cleanup()`). 
> This is not a problem for unbounded streams, but it may reduce the efficiency 
> of heap memory usage on taskmanger when input is bounded stream.
> For example, in case that all inputs are bounded streams, some of them end 
> very early because of the small amount of data, and the other end very late 
> because of the large amount of data, then the buffers of the record 
> deserializers corresponding to the input channels finished early is idle for 
> a long time and no longer used.
> In another case, when both unbounded and bounded streams exist in the inputs, 
> the buffers of the record deserializers corresponding to the bounded stream 
> are idle for ever (no longer used) after the bounded streams are finished. 
> Especially when the record and the parallelism of upstream are large, the 
> total size of `SpanningWrapper#buffer` are very large. The size of 
> `SpanningWrapper#buffer` is allowed to reach up to 5 MB, and if the 
> parallelism of upstream is 100, the maximum total size will reach 500 MB (in 
> our production, there are jobs with the record size up to hundreds of KB and 
> the parallelism of upstream up to 1000).
> Overall, after receiving `EndOfPartitionEvent` from the input channel, the 
> corresponding record deserializer should be cleared immediately to improve 
> the efficiency of heap memory usage on taskmanager.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-12529) Release record-deserializer buffers timely to improve the efficiency of heap usage on taskmanager

2019-05-17 Thread Robert Metzger (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-12529:
---
Component/s: Runtime / Operators

> Release record-deserializer buffers timely to improve the efficiency of heap 
> usage on taskmanager
> -
>
> Key: FLINK-12529
> URL: https://issues.apache.org/jira/browse/FLINK-12529
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Operators
>Affects Versions: 1.8.0
>Reporter: Haibo Sun
>Assignee: Haibo Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In input processors (`StreamInputProcessor` and `StreamTwoInputProcessor`), 
> each input channel has a corresponding record deserializer. Currently, these 
> record deserializers are cleaned up at the end of the task (look at 
> `StreamInputProcessor#cleanup()` and `StreamTwoInputProcessor#cleanup()`). 
> This is not a problem for unbounded streams, but it may reduce the efficiency 
> of heap memory usage on taskmanger when input is bounded stream.
> For example, in case that all inputs are bounded streams, some of them end 
> very early because of the small amount of data, and the other end very late 
> because of the large amount of data, then the buffers of the record 
> deserializers corresponding to the input channels finished early is idle for 
> a long time and no longer used.
> In another case, when both unbounded and bounded streams exist in the inputs, 
> the buffers of the record deserializers corresponding to the bounded stream 
> are idle for ever (no longer used) after the bounded streams are finished. 
> Especially when the record and the parallelism of upstream are large, the 
> total size of `SpanningWrapper#buffer` are very large. The size of 
> `SpanningWrapper#buffer` is allowed to reach up to 5 MB, and if the 
> parallelism of upstream is 100, the maximum total size will reach 500 MB (in 
> our production, there are jobs with the record size up to hundreds of KB and 
> the parallelism of upstream up to 1000).
> Overall, after receiving `EndOfPartitionEvent` from the input channel, the 
> corresponding record deserializer should be cleared immediately to improve 
> the efficiency of heap memory usage on taskmanager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12529) Release record-deserializer buffers timely to improve the efficiency of heap usage on taskmanager

2019-05-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12529:
---
Labels: pull-request-available  (was: )

> Release record-deserializer buffers timely to improve the efficiency of heap 
> usage on taskmanager
> -
>
> Key: FLINK-12529
> URL: https://issues.apache.org/jira/browse/FLINK-12529
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Haibo Sun
>Assignee: Haibo Sun
>Priority: Major
>  Labels: pull-request-available
>
> In input processors (`StreamInputProcessor` and `StreamTwoInputProcessor`), 
> each input channel has a corresponding record deserializer. Currently, these 
> record deserializers are cleaned up at the end of the task (look at 
> `StreamInputProcessor#cleanup()` and `StreamTwoInputProcessor#cleanup()`). 
> This is not a problem for unbounded streams, but it may reduce the efficiency 
> of heap memory usage on taskmanger when input is bounded stream.
> For example, in case that all inputs are bounded streams, some of them end 
> very early because of the small amount of data, and the other end very late 
> because of the large amount of data, then the buffers of the record 
> deserializers corresponding to the input channels finished early is idle for 
> a long time and no longer used.
> In another case, when both unbounded and bounded streams exist in the inputs, 
> the buffers of the record deserializers corresponding to the bounded stream 
> are idle for ever (no longer used) after the bounded streams are finished. 
> Especially when the record and the parallelism of upstream are large, the 
> total size of `SpanningWrapper#buffer` are very large. The size of 
> `SpanningWrapper#buffer` is allowed to reach up to 5 MB, and if the 
> parallelism of upstream is 100, the maximum total size will reach 500 MB (in 
> our production, there are jobs with the record size up to hundreds of KB and 
> the parallelism of upstream up to 1000).
> Overall, after receiving `EndOfPartitionEvent` from the input channel, the 
> corresponding record deserializer should be cleared immediately to improve 
> the efficiency of heap memory usage on taskmanager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12529) Release record-deserializer buffers timely to improve the efficiency of heap usage on taskmanager

2019-05-16 Thread Haibo Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Sun updated FLINK-12529:
--
Summary: Release record-deserializer buffers timely to improve the 
efficiency of heap usage on taskmanager  (was: Release buffers of the record 
deserializers timely to improve the efficiency of heap memory usage on 
taskmanager)

> Release record-deserializer buffers timely to improve the efficiency of heap 
> usage on taskmanager
> -
>
> Key: FLINK-12529
> URL: https://issues.apache.org/jira/browse/FLINK-12529
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Haibo Sun
>Assignee: Haibo Sun
>Priority: Major
>
> In input processors (`StreamInputProcessor` and `StreamTwoInputProcessor`), 
> each input channel has a corresponding record deserializer. Currently, these 
> record deserializers are cleaned up at the end of the task (look at 
> `StreamInputProcessor#cleanup()` and `StreamTwoInputProcessor#cleanup()`). 
> This is not a problem for unbounded streams, but it may reduce the efficiency 
> of heap memory usage on taskmanger when input is bounded stream.
> For example, in case that all inputs are bounded streams, some of them end 
> very early because of the small amount of data, and the other end very late 
> because of the large amount of data, then the buffers of the record 
> deserializers corresponding to the input channels finished early is idle for 
> a long time and no longer used.
> In another case, when both unbounded and bounded streams exist in the inputs, 
> the buffers of the record deserializers corresponding to the bounded stream 
> are idle for ever (no longer used) after the bounded streams are finished. 
> Especially when the record and the parallelism of upstream are large, the 
> total size of `SpanningWrapper#buffer` are very large. The size of 
> `SpanningWrapper#buffer` is allowed to reach up to 5 MB, and if the 
> parallelism of upstream is 100, the maximum total size will reach 500 MB (in 
> our production, there are jobs with the record size up to hundreds of KB and 
> the parallelism of upstream up to 1000).
> Overall, after receiving `EndOfPartitionEvent` from the input channel, the 
> corresponding record deserializer should be cleared immediately to improve 
> the efficiency of heap memory usage on taskmanager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)