[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014030#comment-16014030
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3912


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013765#comment-16013765
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3912
  
Thank you, I'll merge this then.


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013760#comment-16013760
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/3912
  
Yes, there is a tradeoff, and probably the nicest way would be to restore 
the internal array size as it was when we serialized the original. However, I 
guess that is too much effort for the effect. Since the sweetspot also depends 
on the access pattern, I think everything from +1 to some percentage could be 
justified and as there is no clear "best strategy" I would leave this up to 
your judgment call.

+1 from me then.


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013756#comment-16013756
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3912
  
If the only case we currently know of right now is in the Table API with 1 
additional element to be added, then I think this is good for now.

I think the improvement should be bounded to specific cases anyway, because 
the problem will always exist no matter how much extra space we add. For 
example the `ArrayListSerializer` used for list state could have any arbitrary 
amounts of elements added.


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013722#comment-16013722
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3912
  
At least in the Table API / SQL operators, we use a ProcessFunction add 
elements in the `processElement()` method. So, in the case that I want to 
improve just adds a single element.

My rational was that this is the most common use case (I might be wrong 
here). I would be fine with adding more, but the question is how often is that 
actually used.


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013717#comment-16013717
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/3912
  
Share the same question as Stefan. Otherwise LGTM.


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013716#comment-16013716
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/3912
  
No concern, just curious if there is any reason for the 1 additional. The 
problem also still exists now for a second element. So is it, in general, 
unlikely that new elements are added? Or should we add some more additional 
space, like 10%?


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013714#comment-16013714
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/3912
  
Any concerns regarding this change @tzulitai @StephanEwen @StefanRRichter ?


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011678#comment-16011678
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

Github user sunjincheng121 commented on the issue:

https://github.com/apache/flink/pull/3912
  
+1


> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6589) ListSerializer should deserialize as ArrayList with size + 1

2017-05-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011428#comment-16011428
 ] 

ASF GitHub Bot commented on FLINK-6589:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/3912

[FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to 
prevent growth.

Several Table API / SQL operators hold records in a `MapState[Long, 
List[X]]` keyed on a timestamp.
When a new record arrives, the corresponding list is fetched and the record 
is added to the list. 

Currently, the `ListSerializer` deserializes lists as `ArrayList` with 
capacity exactly equal to the number of serialized elements. Hence, the 
`ArrayList` will grow when a new element is added which is an expensive 
operation.

This PR changes the capacity of the deserialized `ArrayList` to #elements + 
1 to avoid the growing the list when a single element is added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink listSer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3912.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3912


commit 1fa86b4e282ab0594d8dc768840de4c798e29591
Author: Fabian Hueske 
Date:   2017-05-15T19:41:51Z

[FLINK-6589] [core] Deserialize ArrayList with capacity of size+1 to 
prevent growth.




> ListSerializer should deserialize as ArrayList with size + 1
> 
>
> Key: FLINK-6589
> URL: https://issues.apache.org/jira/browse/FLINK-6589
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> The {{ListSerializer}} deserializes a list as {{ArrayList}} with exactly the 
> required capacity, i.e., number of serialized objects.
> Several operators in the Table API have a {{MapState}} to 
> store received elements in a list per timestamp. Hence, retrieving the list 
> and adding one element to the list is a very common operation.
> Since the list which is deserialized has no room left for adding elements, 
> the first insertion into the list will result in growing the {{ArrayList}} 
> which is expensive.
> I propose to initialize the {{ArrayList}} returned by the {{ListSerializer}} 
> with numberOfSerializedElements + 1. This will only marginally increase the 
> size of the list and allow for one insertion without growing the list.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)