[jira] [Commented] (FLINK-4557) Table API Stream Aggregations

2016-09-17 Thread shijinkui (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500099#comment-15500099
 ] 

shijinkui commented on FLINK-4557:
--

Has this plan been started? 
What's the progress now?

thanks

> Table API Stream Aggregations
> -
>
> Key: FLINK-4557
> URL: https://issues.apache.org/jira/browse/FLINK-4557
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>
> The Table API is a declarative API to define queries on static and streaming 
> tables. So far, only projection, selection, and union are supported 
> operations on streaming tables.
> This issue and the corresponding FLIP propose to add support for different 
> types of aggregations on top of streaming tables. In particular, we seek to 
> support:
> *Group-window aggregates*, i.e., aggregates which are computed for a group of 
> elements. A (time or row-count) window is required to bound the infinite 
> input stream into a finite group.
> *Row-window aggregates*, i.e., aggregates which are computed for each row, 
> based on a window (range) of preceding and succeeding rows.
> Each type of aggregate shall be supported on keyed/grouped or 
> non-keyed/grouped data streams for streaming tables as well as batch tables.
> Since time-windowed aggregates will be the first operation that requires the 
> definition of time, we also need to discuss how the Table API handles time 
> characteristics, timestamps, and watermarks.
> The FLIP can be found here: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
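
To make the proposed operations concrete, here is a rough sketch of a 
group-window aggregate in the Table API, loosely following the syntax 
discussed in FLIP-11 (illustrative only: the table and field names are made 
up, and the final API may well differ):

{code}
// Tumbling 10-minute event-time window per user: the window bounds the
// unbounded stream so that a sum can be emitted once per window.
Table payments = tableEnv.ingest("Payments");  // hypothetical streaming table
Table result = payments
    .window(Tumble.over("10.minutes").on("rowtime").as("w"))
    .groupBy("user, w")                        // group-window aggregate
    .select("user, amount.sum");
{code}

A row-window aggregate would instead compute, for every row, an aggregate 
over a range of neighbouring rows (e.g. the 10 preceding rows), without 
reducing the number of output rows.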



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-4557) Table API Stream Aggregations

2016-09-17 Thread shijinkui (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500099#comment-15500099
 ] 

shijinkui edited comment on FLINK-4557 at 9/18/16 2:52 AM:
---

Has this plan been started? 
What's the progress now?
Can someone create the nine subtasks mentioned in FLIP-11?

thanks


was (Author: shijinkui):
Has this plan been started? 
What's the progress now?

thanks

> Table API Stream Aggregations
> -
>
> Key: FLINK-4557
> URL: https://issues.apache.org/jira/browse/FLINK-4557
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>
> The Table API is a declarative API to define queries on static and streaming 
> tables. So far, only projection, selection, and union are supported 
> operations on streaming tables.
> This issue and the corresponding FLIP propose to add support for different 
> types of aggregations on top of streaming tables. In particular, we seek to 
> support:
> *Group-window aggregates*, i.e., aggregates which are computed for a group of 
> elements. A (time or row-count) window is required to bound the infinite 
> input stream into a finite group.
> *Row-window aggregates*, i.e., aggregates which are computed for each row, 
> based on a window (range) of preceding and succeeding rows.
> Each type of aggregate shall be supported on keyed/grouped or 
> non-keyed/grouped data streams for streaming tables as well as batch tables.
> Since time-windowed aggregates will be the first operation that requires the 
> definition of time, we also need to discuss how the Table API handles time 
> characteristics, timestamps, and watermarks.
> The FLIP can be found here: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-4598) Support NULLIF in Table API

2016-09-17 Thread Jark Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-4598.
--
Resolution: Won't Fix

> Support NULLIF in Table API
> 
>
> Key: FLINK-4598
> URL: https://issues.apache.org/jira/browse/FLINK-4598
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
>
> This could be a subtask of [FLINK-4549]. Flink SQL already supports 
> {{NULLIF}} implicitly ({{NULLIF(a, b)}} returns NULL if a equals b, and a 
> otherwise), so we should support it in the Table API as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4589) Fix Merging of Covering Window in MergingWindowSet

2016-09-17 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498512#comment-15498512
 ] 

Aljoscha Krettek commented on FLINK-4589:
-

Ah good, I saw that you also put this on the 1.1 branch. Closing this issue now.

> Fix Merging of Covering Window in MergingWindowSet
> --
>
> Key: FLINK-4589
> URL: https://issues.apache.org/jira/browse/FLINK-4589
> Project: Flink
>  Issue Type: Bug
>  Components: Windowing Operators
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.0.4, 1.2.0, 1.1.3
>
>
> Right now, when a new window gets merged that covers all of the existing 
> windows, {{MergingWindowSet}} does not correctly set the state window.
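
To make the failure mode concrete: with session windows and a merge gap of 
2, elements at t=1 and t=5 first create the windows [1,3) and [5,7); an 
element at t=3 then creates [3,5), and merging produces [1,7), a single 
window covering all pre-existing windows. A tiny sketch of the interval 
arithmetic (plain Java, illustrative only; this is not the 
{{MergingWindowSet}} code):

{code}
/** Illustrative only -- shows how a merge can yield a covering window. */
public class CoveringWindowExample {

    // windows represented as {start, end}, i.e. the interval [start, end)
    static long[] merge(long[] a, long[] b) {
        return new long[] { Math.min(a[0], b[0]), Math.max(a[1], b[1]) };
    }

    public static void main(String[] args) {
        long[] w1 = {1, 3}; // session opened by an element at t=1 (gap = 2)
        long[] w2 = {5, 7}; // session opened by an element at t=5
        long[] w3 = {3, 5}; // late element at t=3 bridges both sessions
        long[] covering = merge(merge(w1, w3), w2);
        // [1,7) covers both pre-existing windows; this is the case in which
        // the state window was not set correctly.
        System.out.println("covering window: ["
                + covering[0] + ", " + covering[1] + ")");
    }
}
{code}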



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-4589) Fix Merging of Covering Window in MergingWindowSet

2016-09-17 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-4589.
-
Resolution: Fixed

Fixed on master and release 1.1 branch.

> Fix Merging of Covering Window in MergingWindowSet
> --
>
> Key: FLINK-4589
> URL: https://issues.apache.org/jira/browse/FLINK-4589
> Project: Flink
>  Issue Type: Bug
>  Components: Windowing Operators
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.0.4, 1.2.0, 1.1.3
>
>
> Right now, when a new window gets merged that covers all of the existing 
> windows, {{MergingWindowSet}} does not correctly set the state window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs

2016-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498937#comment-15498937
 ] 

ASF GitHub Bot commented on FLINK-3322:
---

Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/2495
  
Btw. before submitting a pull request, it is good practice to run `mvn 
clean verify` locally (or do a Travis build), and make sure that there are no 
test failures (or, at least no test failures related to the current change).


> MemoryManager creates too much GC pressure with iterative jobs
> --
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: FLINK-3322.docx, FLINK-3322_reusingmemoryfordrivers.docx
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not added to a pool, but the GC is expected to take care of 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. 
> (It will generate some lookup tables to /tmp on first run for a few minutes.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in rarer GCs.)
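
To illustrate why pooling helps here, a minimal sketch of the reuse idea 
(plain Java, illustrative only; this is not Flink's actual MemoryManager 
code):

{code}
import java.util.ArrayDeque;

/**
 * Illustrative segment pool: released segments are reused instead of
 * becoming garbage that the GC must reclaim at every superstep.
 */
final class SegmentPool {

    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
    private final int segmentSize;

    SegmentPool(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    byte[] acquire() {
        byte[] segment = pool.poll();
        return segment != null ? segment : new byte[segmentSize]; // allocate only on a miss
    }

    void release(byte[] segment) {
        pool.push(segment); // back into the pool, invisible to the GC
    }
}
{code}

With taskmanager.memory.preallocate set to true, the MemoryManager pools 
segments along these lines, which is why the GC pressure disappears in that 
mode.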



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4629) Kafka v 0.10 Support

2016-09-17 Thread Simone Robutti (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498948#comment-15498948
 ] 

Simone Robutti commented on FLINK-4629:
---

There's already an issue that covers updating Flink's Kafka source and Kafka 
sink to Kafka 0.10: 
https://issues.apache.org/jira/browse/FLINK-4035

From what I've heard in the past weeks, this will probably be included in 
Flink 1.2.

> Kafka v 0.10 Support
> 
>
> Key: FLINK-4629
> URL: https://issues.apache.org/jira/browse/FLINK-4629
> Project: Flink
>  Issue Type: Wish
>Reporter: Mariano Gonzalez
>Priority: Minor
>
> I couldn't find any repo or documentation about when Flink will start 
> supporting Kafka v 0.10.
> Is there any document that you can point me to where I can see Flink's 
> roadmap?
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs

2016-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498928#comment-15498928
 ] 

ASF GitHub Bot commented on FLINK-3322:
---

Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/2495
  
The tests that are failing are in `flink-gelly-examples`, for example 
`SingleSourceShortestPathsITCase`. They have error msgs that indicate memory 
management errors, e.g.
`Caused by: java.lang.RuntimeException: Error obtaining the sorted input: 
Thread 'SortMerger Reading Thread' terminated due to an exception: segment has 
been freed`.

But I would say that before fixing this pull request, let's concentrate on 
https://github.com/apache/flink/pull/2496.


> MemoryManager creates too much GC pressure with iterative jobs
> --
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: FLINK-3322.docx, FLINK-3322_reusingmemoryfordrivers.docx
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not added to a pool, but the GC is expected to take care of 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. 
> (It will generate some lookup tables to /tmp on first run for a few minutes.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in rarer GCs.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2495: FLINK-3322 - Make sorters to reuse the memory pages alloc...

2016-09-17 Thread ggevay
Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/2495
  
The tests that are failing are in `flink-gelly-examples`, for example 
`SingleSourceShortestPathsITCase`. They have error msgs that indicate memory 
management errors, e.g.
`Caused by: java.lang.RuntimeException: Error obtaining the sorted input: 
Thread 'SortMerger Reading Thread' terminated due to an exception: segment has 
been freed`.

But I would say that before fixing this pull request, let's concentrate on 
https://github.com/apache/flink/pull/2496.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2495: FLINK-3322 - Make sorters to reuse the memory pages alloc...

2016-09-17 Thread ggevay
Github user ggevay commented on the issue:

https://github.com/apache/flink/pull/2495
  
Btw. before submitting a pull request, it is good practice to run `mvn 
clean verify` locally (or do a Travis build), and make sure that there are no 
test failures (or, at least no test failures related to the current change).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4280) New Flink-specific option to set starting position of Kafka consumer without respecting external offsets in ZK / Broker

2016-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499135#comment-15499135
 ] 

ASF GitHub Bot commented on FLINK-4280:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/2509

[FLINK-4280][kafka-connector] Explicit start position configuration for 
Kafka Consumer

This PR adds the following new explicit setter methods to configure the 
starting position for the Kafka Consumer connector:

```
FlinkKafkaConsumer08 kafka = new FlinkKafkaConsumer08(...); // or 09
kafka.setStartFromEarliest();     // start from earliest, without respecting any committed offsets
kafka.setStartFromLatest();       // start from latest, without respecting any committed offsets
kafka.setStartFromGroupOffsets(); // respect committed offsets in ZK / Kafka as starting points
```

The default is to start from group offsets, so we won't be breaking 
existing user code.
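
For context, a minimal sketch of how such a configured consumer could be 
wired into a job (illustrative; the topic name, schema, and properties are 
placeholders):

```
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address

FlinkKafkaConsumer08<String> kafka =
    new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props);
kafka.setStartFromEarliest(); // ignore committed offsets, read from the beginning

env.addSource(kafka).print();
```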

One thing to note is that this PR also includes some refactoring to 
consolidate all start-offset assignment logic for partitions within the 
fetcher. For example, in the 0.8 version, with this change the 
`SimpleConsumerThread` no longer decides where a partition starts from; all 
partitions are already assigned starting offsets by the fetcher, and the 
thread simply starts consuming them. This is preparation for transparent 
partition discovery for the Kafka consumers in 
[FLINK-4022](https://issues.apache.org/jira/browse/FLINK-4022).

I suggest reviewing this PR after #2369, to reduce the effort of getting the 
0.10 Kafka consumer in first. Tests for the new functionality will be added 
in follow-up commits.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-4280

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2509.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2509


commit f1d24806d902a45f66fc9b42a19a303a031b81b1
Author: Tzu-Li (Gordon) Tai 
Date:   2016-09-17T13:41:50Z

[FLINK-4280][kafka-connector] Explicit start position configuration for 
Kafka Consumer




> New Flink-specific option to set starting position of Kafka consumer without 
> respecting external offsets in ZK / Broker
> ---
>
> Key: FLINK-4280
> URL: https://issues.apache.org/jira/browse/FLINK-4280
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.2.0
>
>
> Currently, to start reading from the "earliest" or "latest" position in 
> topics for the Flink Kafka consumer, users set the Kafka config 
> {{auto.offset.reset}} in the provided properties configuration.
> However, the way this config actually works might be a bit misleading if 
> users were trying to find a way to "read topics from a starting position". 
> The way the {{auto.offset.reset}} config works in the Flink Kafka consumer 
> resembles Kafka's original intent for the setting: first, existing external 
> offsets committed to the ZK / brokers are checked; only if none exist will 
> {{auto.offset.reset}} be respected.
> I propose to add Flink-specific ways to define the starting position, without 
> taking the external offsets into account. The original behaviour (reference 
> external offsets first) can become a user option, so that it can be retained 
> for Kafka users who need to interoperate with existing non-Flink Kafka 
> consumer applications.
> How users will interact with the Flink Kafka consumer after this is added, 
> with a newly introduced {{flink.starting-position}} config:
> {code}
> Properties props = new Properties();
> props.setProperty("flink.starting-position", "earliest/latest");
> props.setProperty("auto.offset.reset", "..."); // this will be ignored (log a 
> warning)
> props.setProperty("group.id", "...") // this won't have effect on the 
> starting position anymore (may still be used in external offset committing)
> ...
> {code}
> Or, reference external offsets in ZK / broker:
> {code}
> Properties props = new Properties();
> props.setProperty("flink.starting-position", "external-offsets");
> props.setProperty("auto.offset.reset", "earliest/latest"); // default will be 
> latest
> props.setProperty("group.id", "..."); // will be used to lookup external 
> offsets in ZK / broker on startup
> ...
> {code}
> A thing we would need to decide on is what would the default 

[GitHub] flink pull request #2509: [FLINK-4280][kafka-connector] Explicit start posit...

2016-09-17 Thread tzulitai
GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/2509

[FLINK-4280][kafka-connector] Explicit start position configuration for 
Kafka Consumer

This PR adds the following new explicit setter methods to configure the 
starting position for the Kafka Consumer connector:

```
FlinkKafkaConsumer08 kafka = new FlinkKafkaConsumer08(...); // or 09
kafka.setStartFromEarliest();     // start from earliest, without respecting any committed offsets
kafka.setStartFromLatest();       // start from latest, without respecting any committed offsets
kafka.setStartFromGroupOffsets(); // respect committed offsets in ZK / Kafka as starting points
```

The default is to start from group offsets, so we won't be breaking 
existing user code.

One thing to note is that this PR also includes some refactoring to 
consolidate all start-offset assignment logic for partitions within the 
fetcher. For example, in the 0.8 version, with this change the 
`SimpleConsumerThread` no longer decides where a partition starts from; all 
partitions are already assigned starting offsets by the fetcher, and the 
thread simply starts consuming them. This is preparation for transparent 
partition discovery for the Kafka consumers in 
[FLINK-4022](https://issues.apache.org/jira/browse/FLINK-4022).

I suggest reviewing this PR after #2369, to reduce the effort of getting the 
0.10 Kafka consumer in first. Tests for the new functionality will be added 
in follow-up commits.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-4280

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2509.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2509


commit f1d24806d902a45f66fc9b42a19a303a031b81b1
Author: Tzu-Li (Gordon) Tai 
Date:   2016-09-17T13:41:50Z

[FLINK-4280][kafka-connector] Explicit start position configuration for 
Kafka Consumer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-4499) Introduce findbugs maven plugin

2016-09-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-4499:
--
Description: 
As suggested by Stephan in FLINK-4482, this issue is to add 
findbugs-maven-plugin into the build process so that we can detect lack of 
proper locking and other defects automatically.

We can begin with a small set of rules.

  was:As suggested by Stephan in FLINK-4482, this issue is to add 
findbugs-maven-plugin into the build process so that we can detect lack of 
proper locking and other defects automatically.


> Introduce findbugs maven plugin
> ---
>
> Key: FLINK-4499
> URL: https://issues.apache.org/jira/browse/FLINK-4499
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> As suggested by Stephan in FLINK-4482, this issue is to add 
> findbugs-maven-plugin into the build process so that we can detect lack of 
> proper locking and other defects automatically.
> We can begin with a small set of rules.
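
For illustration, a minimal plugin configuration along these lines could be 
a starting point (the version and the exclude-file path are assumptions, not 
part of this issue):

{code}
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>findbugs-maven-plugin</artifactId>
  <version>3.0.4</version>
  <configuration>
    <!-- begin with a small rule set: only high-confidence findings -->
    <threshold>High</threshold>
    <failOnError>true</failOnError>
    <!-- hypothetical path for project-specific exclusions -->
    <excludeFilterFile>tools/maven/findbugs-exclude.xml</excludeFilterFile>
  </configuration>
  <executions>
    <execution>
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
{code}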



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4482) numUnsuccessfulCheckpointsTriggers is accessed without holding triggerLock

2016-09-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-4482:
--
Description: 
In CheckpointCoordinator#stopCheckpointScheduler() :
{code}
synchronized (lock) {
...
  numUnsuccessfulCheckpointsTriggers = 0;
{code}
triggerLock is not held in the above operation.
See comment for triggerLock earlier in triggerCheckpoint():
{code}
// we lock with a special lock to make sure that trigger requests do not overtake each other.
// this is not done with the coordinator-wide lock, because the 'checkpointIdCounter'
// may issue blocking operations. Using a different lock than the coordinator-wide lock,
// we avoid blocking the processing of 'acknowledge/decline' messages during that time.
synchronized (triggerLock) {
{code}

  was:
In CheckpointCoordinator#stopCheckpointScheduler() :
{code}
synchronized (lock) {
...
  numUnsuccessfulCheckpointsTriggers = 0;
{code}

triggerLock is not held in the above operation.
See comment for triggerLock earlier in triggerCheckpoint():
{code}
// we lock with a special lock to make sure that trigger requests do not overtake each other.
// this is not done with the coordinator-wide lock, because the 'checkpointIdCounter'
// may issue blocking operations. Using a different lock than the coordinator-wide lock,
// we avoid blocking the processing of 'acknowledge/decline' messages during that time.
synchronized (triggerLock) {
{code}


> numUnsuccessfulCheckpointsTriggers is accessed without holding triggerLock
> --
>
> Key: FLINK-4482
> URL: https://issues.apache.org/jira/browse/FLINK-4482
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> In CheckpointCoordinator#stopCheckpointScheduler() :
> {code}
> synchronized (lock) {
> ...
>   numUnsuccessfulCheckpointsTriggers = 0;
> {code}
> triggerLock is not held in the above operation.
> See comment for triggerLock earlier in triggerCheckpoint():
> {code}
> // we lock with a special lock to make sure that trigger requests do not overtake each other.
> // this is not done with the coordinator-wide lock, because the 'checkpointIdCounter'
> // may issue blocking operations. Using a different lock than the coordinator-wide lock,
> // we avoid blocking the processing of 'acknowledge/decline' messages during that time.
> synchronized (triggerLock) {
> {code}
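
A minimal sketch of the kind of fix this suggests (illustrative only; the 
actual change may structure the locking differently):

{code}
// in CheckpointCoordinator#stopCheckpointScheduler():
// reset the counter under the same lock that triggerCheckpoint() uses,
// so that a concurrent trigger cannot interleave with the reset.
synchronized (triggerLock) {
    numUnsuccessfulCheckpointsTriggers = 0;
}
{code}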



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)