[jira] [Commented] (FLINK-7684) Avoid multiple data copies in MergingWindowSet

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281775#comment-16281775
 ] 

ASF GitHub Bot commented on FLINK-7684:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4723
  
I think this needs a lot more description of what is happening.
For example, this introduces a prominent new configuration setting on the 
`ExecutionConfig` that is nowhere described, making it hard to review this. The 
javadocs on the `OptimizationTarget` are minimal, and no other docs have been 
added.

As a general thought: I am very skeptical about adding such a new setting. I 
would actually like us to go the opposite way and reduce the number of knobs 
further and further over time. Unless there are vast differences in the 
behavior, I find that an opinionated good implementation or choice of technique 
is better for users than offering a knob.


> Avoid multiple data copies in MergingWindowSet
> --
>
> Key: FLINK-7684
> URL: https://issues.apache.org/jira/browse/FLINK-7684
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 1.3.1, 1.3.2
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>
> Currently MergingWindowSet uses a ListState of tuples to persist its mapping. 
> This is inefficient because this ListState of tuples must be converted to a 
> HashMap on each access.
> Furthermore, in some cases it might be inefficient to check whether the mapping 
> has changed before saving it to state.
> Those two issues cause multiple data copies and the construction of multiple 
> Lists/Maps for each processed element, which is a reason for noticeable 
> performance issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7684) Avoid multiple data copies in MergingWindowSet

2017-09-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179162#comment-16179162
 ] 

ASF GitHub Bot commented on FLINK-7684:
---

GitHub user pnowojski opened a pull request:

https://github.com/apache/flink/pull/4723

[FLINK-7684] Avoid data copies in MergingWindowSet

## What is the purpose of the change

Previously MergingWindowSet used a ListState of tuples to persist its 
mapping. This is inefficient because this ListState of tuples must be converted 
to a HashMap on each access.

Furthermore, in some cases it might be inefficient to check whether the 
mapping has changed before saving it to state.
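
A rough sketch of the copies described above, in plain Java without Flink's state APIs. `MappingStateSketch`, `Pair`, and the method names are hypothetical and are not taken from this pull request; windows are simplified to strings, and the `alwaysPersist` parameter only illustrates the idea of skipping the change check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MappingStateSketch {

    /** Hypothetical stand-in for one persisted (window, state window) tuple. */
    public static final class Pair {
        public final String window;
        public final String stateWindow;

        public Pair(String window, String stateWindow) {
            this.window = window;
            this.stateWindow = stateWindow;
        }
    }

    /** List-based layout: every access copies the persisted tuples into a new HashMap. */
    public static Map<String, String> loadFromList(List<Pair> persisted) {
        Map<String, String> mapping = new HashMap<>();
        for (Pair p : persisted) {
            mapping.put(p.window, p.stateWindow);
        }
        return mapping;
    }

    /** List-based layout: every write copies the map back into a new list of tuples. */
    public static List<Pair> storeToList(Map<String, String> mapping) {
        List<Pair> persisted = new ArrayList<>(mapping.size());
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            persisted.add(new Pair(e.getKey(), e.getValue()));
        }
        return persisted;
    }

    /** Write either unconditionally, or only if the mapping differs from its initial snapshot. */
    public static boolean shouldPersist(
            Map<String, String> initial, Map<String, String> current, boolean alwaysPersist) {
        return alwaysPersist || !initial.equals(current);
    }

    public static void main(String[] args) {
        List<Pair> persisted = new ArrayList<>();
        persisted.add(new Pair("w1", "s1"));

        Map<String, String> mapping = loadFromList(persisted);  // copy: list -> map
        Map<String, String> initial = new HashMap<>(mapping);   // copy: snapshot for the change check
        mapping.put("w2", "s2");

        if (shouldPersist(initial, mapping, false)) {
            persisted = storeToList(mapping);                   // copy: map -> list
        }
        System.out.println("tuples persisted: " + persisted.size());
    }
}
```

If the state held the map directly, the `loadFromList` and `storeToList` conversions would presumably not be needed, and passing `alwaysPersist = true` avoids both the snapshot and the `equals` comparison at the cost of writing on every element.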

Fixing those two issues improves session window 
[benchmark](https://github.com/dataArtisans/flink-benchmarks) results by 10 - 
20%.

The first commit comes from a different PR, #4722.

## Verifying this change

This change is already covered by existing tests, such as *(please describe 
tests)*.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
  - The serializers: **yes** (it changes how WindowOperator is being 
serialized)
  - The runtime per-record code paths (performance sensitive): **yes**
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no

## Documentation

  - Does this pull request introduce a new feature? yes
  - If yes, how is the feature documented? JavaDocs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pnowojski/flink window

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4723


commit ac41b26cdb5c2341bb25e520c4d449ec4e956a8f
Author: Piotr Nowojski 
Date:   2017-09-14T10:39:30Z

[FLINK-7683] Iterate over keys in KeyedStateBackend

commit 20ed7f51399752d620696d9502b61233d47f6bcb
Author: Piotr Nowojski 
Date:   2017-09-25T12:18:52Z

[FLINK-7684] Add OptimizationTarget to the ExecutionConfig

commit c5505e1252ea5b70cff9cf76ff89d7dc52f45057
Author: Piotr Nowojski 
Date:   2017-07-06T11:56:13Z

[FLINK-7684] Serialize MergingWindowSet to ValueState>

This avoids an unnecessary data copy

commit 00c044c30e95087b65abe65da67a77841a3c7740
Author: Piotr Nowojski 
Date:   2017-09-25T13:28:01Z

[FLINK-7684] Add always persist flag to MergingWindowSet



