[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014030#comment-16014030 ] ASF GitHub Bot commented on FLINK-6589: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3912 > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013765#comment-16013765 ] ASF GitHub Bot commented on FLINK-6589: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3912 Thank you, I'll merge this then. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013760#comment-16013760 ] ASF GitHub Bot commented on FLINK-6589: --- Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/3912 Yes, there is a tradeoff, and probably the nicest way would be to restore the internal array size as it was when we serialized the original. However, I guess that is too much effort for the effect. Since the sweetspot also depends on the access pattern, I think everything from +1 to some percentage could be justified and as there is no clear "best strategy" I would leave this up to your judgment call. +1 from me then. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013756#comment-16013756 ] ASF GitHub Bot commented on FLINK-6589: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3912 If the only case we currently know of right now is in the Table API with 1 additional element to be added, then I think this is good for now. I think the improvement should be bounded to specific cases anyway, because the problem will always exist no matter how much extra space we add. For example the `ArrayListSerializer` used for list state could have any arbitrary amounts of elements added. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013722#comment-16013722 ] ASF GitHub Bot commented on FLINK-6589: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3912 At least in the Table API / SQL operators, we use a ProcessFunction add elements in the `processElement()` method. So, in the case that I want to improve just adds a single element. My rational was that this is the most common use case (I might be wrong here). I would be fine with adding more, but the question is how often is that actually used. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013717#comment-16013717 ] ASF GitHub Bot commented on FLINK-6589: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3912 Share the same question as Stefan. Otherwise LGTM. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013716#comment-16013716 ] ASF GitHub Bot commented on FLINK-6589: --- Github user StefanRRichter commented on the issue: https://github.com/apache/flink/pull/3912 No concern, just curious if there is any reason for the 1 additional. The problem also still exists now for a second element. So is it, in general, unlikely that new elements are added? Or should we add some more additional space, like 10%? > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013714#comment-16013714 ] ASF GitHub Bot commented on FLINK-6589: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3912 Any concerns regarding this change @tzulitai @StephanEwen @StefanRRichter ? > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011678#comment-16011678 ] ASF GitHub Bot commented on FLINK-6589: --- Github user sunjincheng121 commented on the issue: https://github.com/apache/flink/pull/3912 +1 > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState}} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1
[ https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011428#comment-16011428 ] ASF GitHub Bot commented on FLINK-6589: --- GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/3912 [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth. Several Table API / SQL operators hold records in a `MapState[Long, List[X]]` keyed on a timestamp. When a new record arrives, the corresponding list is fetched and the record is added to the list. Currently, the `ListSerializer` deserializes lists as `ArrayList` with capacity exactly equal to the number of serialized elements. Hence, the `ArrayList` will grow when a new element is added which is an expensive operation. This PR changes the capacity of the deserialized `ArrayList` to #elements + 1 to avoid the growing the list when a single element is added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink listSer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3912.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3912 commit 1fa86b4e282ab0594d8dc768840de4c798e29591 Author: Fabian HueskeDate: 2017-05-15T19:41:51Z [FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to prevent growth. > ListSerializer should deserialize as ArrayList with size + 1 > > > Key: FLINK-6589 > URL: https://issues.apache.org/jira/browse/FLINK-6589 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.3.0, 1.4.0 >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the > required capacity, i.e., number of serialized objects. > Several operators in the Table API have a {{MapState }} to > store received elements in a list per timestamp. Hence, retrieving the list > and adding one element to the list is a very common operation. > Since the list which is deserialized has no room left for adding elements, > the first insertion into the list will result in growing the {{ArrayList}} > which is expensive. > I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} > with numberOfSerializedElements + 1. This will only marginally increase the > size of the list and allow for one insertion without growing the list. -- This message was sent by Atlassian JIRA (v6.3.15#6346)