[jira] [Created] (FLINK-3961) Possibility to write output to multiple locations based on partitions

2016-05-23 Thread Kirsti Laurila (JIRA)
Kirsti Laurila created FLINK-3961:
-

 Summary: Possibility to write output to multiple locations based 
on  partitions
 Key: FLINK-3961
 URL: https://issues.apache.org/jira/browse/FLINK-3961
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.0.2
Reporter: Kirsti Laurila
Priority: Minor


At the moment, one cannot write output to different folders based on data-driven 
partitions. For example, if data is partitioned by day, there is no direct way 
to write data to 

path/date=20160520/data
path/date=20160521/data
...

Add support for this, preferably for all write methods (write, 
writeAsCsv, etc.).
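
Until then, one possible workaround is a filter-and-write pass per known 
partition value. A minimal sketch in the Java DataSet API, assuming a fixed 
list of days and a (date, payload) tuple layout (both assumptions of this 
sketch):

{code}
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

// data is a DataSet<Tuple2<String, String>> of (date, payload) records (assumed).
for (final String day : new String[] {"20160520", "20160521"}) {
    data.filter(new FilterFunction<Tuple2<String, String>>() {
        @Override
        public boolean filter(Tuple2<String, String> record) {
            return record.f0.equals(day);
        }
    }).writeAsCsv("path/date=" + day + "/data");
}
{code}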





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3941) Add support for UNION (with duplicate elimination)

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297679#comment-15297679
 ] 

ASF GitHub Bot commented on FLINK-3941:
---

GitHub user yjshen opened a pull request:

https://github.com/apache/flink/pull/2025

[FLINK-3941][TableAPI]Add support for UNION (with duplicate elimination)

This PR aims at adding `UNION` support in the Table API and SQL by:
- Extending the Table API with a new `union()` method
- Relaxing `DataSetUnionRule` to enable union conversion
- Adding a `distinct` after `union` in the Flink execution plan to eliminate 
duplicate rows

Note: Currently, I don't think `Union` has a counterpart in DataStream, so it 
is left unsupported there. If that's not the case, I'd be happy to adapt this PR.
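
At the DataSet level, the intended semantics can be sketched as a plain union 
followed by `distinct()` on all fields (an illustrative equivalent, not the 
PR's actual translation code):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

// left and right are DataSet<Tuple2<String, Integer>> inputs (assumed).
DataSet<Tuple2<String, Integer>> unionAll = left.union(right);          // UNION ALL: keeps duplicates
DataSet<Tuple2<String, Integer>> union = left.union(right).distinct();  // UNION: eliminates duplicates
{code}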

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yjshen/flink union

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2025


commit fb8b61b5638f0b52f5857341c7acc95e8985b2d4
Author: Yijie Shen 
Date:   2016-05-24T04:46:21Z

add union support




> Add support for UNION (with duplicate elimination)
> --
>
> Key: FLINK-3941
> URL: https://issues.apache.org/jira/browse/FLINK-3941
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Yijie Shen
>Priority: Minor
>
> Currently, only UNION ALL is supported by Table API and SQL.
> UNION (with duplicate elimination) can be supported by applying a 
> {{DataSet.distinct()}} after the union on all fields. This issue includes:
> - Extending {{DataSetUnion}}
> - Relaxing {{DataSetUnionRule}} to translate non-all unions.
> - Extending the Table API with a union() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3941][TableAPI]Add support for UNION (w...

2016-05-23 Thread yjshen
GitHub user yjshen opened a pull request:

https://github.com/apache/flink/pull/2025

[FLINK-3941][TableAPI]Add support for UNION (with duplicate elimination)

This PR aims at adding `UNION` support in the Table API and SQL by:
- Extending the Table API with a new `union()` method
- Relaxing `DataSetUnionRule` to enable union conversion
- Adding a `distinct` after `union` in the Flink execution plan to eliminate 
duplicate rows

Note: Currently, I don't think `Union` has a counterpart in DataStream, so it 
is left unsupported there. If that's not the case, I'd be happy to adapt this PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yjshen/flink union

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2025


commit fb8b61b5638f0b52f5857341c7acc95e8985b2d4
Author: Yijie Shen 
Date:   2016-05-24T04:46:21Z

add union support




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3753) KillerWatchDog should not use kill on toKill thread

2016-05-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3753:
--
Description: 
{code}
// this is harsh, but this watchdog is a last resort
if (toKill.isAlive()) {
  toKill.stop();
}
{code}
stop() is deprecated.

See:
https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads

  was:
{code}
// this is harsh, but this watchdog is a last resort
if (toKill.isAlive()) {
  toKill.stop();
}
{code}
stop() is deprecated.


See:
https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads


> KillerWatchDog should not use kill on toKill thread
> ---
>
> Key: FLINK-3753
> URL: https://issues.apache.org/jira/browse/FLINK-3753
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> // this is harsh, but this watchdog is a last resort
> if (toKill.isAlive()) {
>   toKill.stop();
> }
> {code}
> stop() is deprecated.
> See:
> https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop()+to+terminate+threads
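
The CERT guideline referenced above recommends cooperative cancellation 
instead. A minimal sketch of what an interrupt-based last resort could look 
like (the 10-second timeout and the logging are assumptions of this sketch; 
this is not the actual KillerWatchDog code):

{code}
// Cooperative alternative to the deprecated Thread.stop().
static void shutDownSoftly(Thread toKill) throws InterruptedException {
    toKill.interrupt();   // request termination; blocking calls observe InterruptedException
    toKill.join(10000);   // give the thread up to 10s to exit on its own
    if (toKill.isAlive()) {
        // Still alive: report it rather than resorting to the unsafe stop().
        System.err.println("Thread " + toKill.getName() + " did not terminate.");
    }
}
{code}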



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3960) EventTimeWindowCheckpointingITCase fails with a segmentation fault

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296758#comment-15296758
 ] 

ASF GitHub Bot commented on FLINK-3960:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/2022#issuecomment-221046822
  
+1 to skip the test until the RocksDB config fix is in.
Will greatly help other builds...


> EventTimeWindowCheckpointingITCase fails with a segmentation fault
> --
>
> Key: FLINK-3960
> URL: https://issues.apache.org/jira/browse/FLINK-3960
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> As a follow-up issue of FLINK-3909, our tests fail with the following. I 
> believe [~aljoscha] is working on a fix.
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7fae0c62a264, pid=72720, tid=140385528268544
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 
> 1.7.0_76-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [librocksdbjni78704726610339516..so+0x13c264]  
> rocksdb_iterator_helper(rocksdb::DB*, rocksdb::ReadOptions, 
> rocksdb::ColumnFamilyHandle*)+0x4
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/travis/build/mxm/flink/flink-tests/target/hs_err_pid72720.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> Aborted (core dumped)
> {noformat}
> I propose to disable the test case in the meantime because it is blocking our 
> test execution which we need for pull requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3960] ignore EventTimeWindowCheckpointi...

2016-05-23 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/2022#issuecomment-221046822
  
+1 to skip the test until the RocksDB config fix is in.
Will greatly help other builds...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296733#comment-15296733
 ] 

ramkrishna.s.vasudevan commented on FLINK-3650:
---

[~till.rohrmann] - Any comments on the latest pull request?

> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.
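
For reference, the Java DataSet API calls that the Scala API should mirror 
look like this (a usage sketch; the tuple type is an assumption):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

// input is a DataSet<Tuple2<String, Integer>> (assumed).
DataSet<Tuple2<String, Integer>> max = input.maxBy(1);                   // element with the maximum field 1
DataSet<Tuple2<String, Integer>> maxPerKey = input.groupBy(0).maxBy(1);  // per group on field 0
{code}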



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3586) Risk of data overflow while use sum/count to calculate AVG value

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296731#comment-15296731
 ] 

ASF GitHub Bot commented on FLINK-3586:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2024

[FLINK-3586] Fix potential overflow of Long AVG aggregation.

Fixes a potential overflow of Long `AVG` aggregates in the Table API 
(the intermediate sum is computed using `BigInteger` instead of `Long`).

Aggregates are refactored to specify their intermediate types as 
`TypeInformation` instead of SQL types. Intermediate results are not exposed to 
Calcite and are Flink-internal, so SQL types are not required and would need to 
be converted into `TypeInformation` in any case.

Adds unit tests for `MIN`, `MAX`, `COUNT`, `SUM`, and `AVG` aggregates.

- [X] General
- [X] Documentation
  - No functionality added
  - Some ScalaDocs extended

- [X] Tests & Build
  - Unit tests for existing Aggregates added

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableLongAvgOverflow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2024


commit a887d1d7edb2b1b96652ca5021beec123011e03a
Author: Fabian Hueske 
Date:   2016-05-22T14:46:43Z

[FLINK-3586] Fix potential overflow of Long AVG aggregation.

- Add unit tests for Aggregates.




> Risk of data overflow while use sum/count to calculate AVG value
> 
>
> Key: FLINK-3586
> URL: https://issues.apache.org/jira/browse/FLINK-3586
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Chengxiang Li
>Assignee: Fabian Hueske
>Priority: Minor
>
> Now, we use {{(sum: Long, count: Long)}} to store AVG partial aggregate data, 
> which carries a risk of overflow. We should use an unbounded data type (such 
> as BigInteger) to store them for the affected data types.
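
The fix's core idea, sketched as a standalone accumulator (illustrative only, 
not the Table API's actual aggregate classes):

{code}
import java.math.BigInteger;

// Accumulates a Long AVG without overflowing the intermediate sum.
class LongAvgAccumulator {
    private BigInteger sum = BigInteger.ZERO;
    private long count = 0L;

    void add(long value) {
        sum = sum.add(BigInteger.valueOf(value));
        count++;
    }

    long result() {
        // caller must ensure count > 0; SQL AVG on BIGINT yields BIGINT
        return sum.divide(BigInteger.valueOf(count)).longValue();
    }
}
{code}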



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3586] Fix potential overflow of Long AV...

2016-05-23 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2024

[FLINK-3586] Fix potential overflow of Long AVG aggregation.

Fixes a potential overflow of Long `AVG` aggregates in the Table API 
(the intermediate sum is computed using `BigInteger` instead of `Long`).

Aggregates are refactored to specify their intermediate types as 
`TypeInformation` instead of SQL types. Intermediate results are not exposed to 
Calcite and are Flink-internal, so SQL types are not required and would need to 
be converted into `TypeInformation` in any case.

Adds unit tests for `MIN`, `MAX`, `COUNT`, `SUM`, and `AVG` aggregates.

- [X] General
- [X] Documentation
  - No functionality added
  - Some ScalaDocs extended

- [X] Tests & Build
  - Unit tests for existing Aggregates added

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableLongAvgOverflow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2024


commit a887d1d7edb2b1b96652ca5021beec123011e03a
Author: Fabian Hueske 
Date:   2016-05-22T14:46:43Z

[FLINK-3586] Fix potential overflow of Long AVG aggregation.

- Add unit tests for Aggregates.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3955) Change Table.toSink() to Table.writeToSink()

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296709#comment-15296709
 ] 

ASF GitHub Bot commented on FLINK-3955:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2023

[FLINK-3955] [tableAPI] Rename Table.toSink() to Table.writeToSink().

Renames `Table.toSink()` to `Table.writeToSink()`.

`Table.toSink()` indicates that the `Table` is converted into a sink, which 
is not the case.

- [X] General
- [X] Documentation
- [X] Tests & Build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableToSink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2023






> Change Table.toSink() to Table.writeToSink()
> 
>
> Key: FLINK-3955
> URL: https://issues.apache.org/jira/browse/FLINK-3955
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Currently, a {{Table}} can be emitted to a {{TableSink}} using the 
> {{Table.toSink()}} method.
> However, the name of the method indicates that the {{Table}} is converted 
> into a {{Sink}}.
> Therefore, I propose to change the method to {{Table.writeToSink()}}.
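
Usage after the rename would read as follows (a sketch; {{result}} and the 
concrete {{TableSink}} implementation are assumed):

{code}
// result is a Table, sink a TableSink implementation (both assumed).
// Before this change: result.toSink(sink);
result.writeToSink(sink);
{code}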



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3955] [tableAPI] Rename Table.toSink() ...

2016-05-23 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2023

[FLINK-3955] [tableAPI] Rename Table.toSink() to Table.writeToSink().

Renames `Table.toSink()` to `Table.writeToSink()`.

`Table.toSink()` indicates that the `Table` is converted into a sink, which 
is not the case.

- [X] General
- [X] Documentation
- [X] Tests & Build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableToSink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2023






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296706#comment-15296706
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64257599
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void 
setCustomPartitioner(KinesisPartitioner partitioner) {
public void open(Configuration parameters) throws Exception {
super.open(parameters);
 
-   KinesisProducerConfiguration config = new 
KinesisProducerConfiguration();
-   config.setRegion(this.region);
-   config.setCredentialsProvider(new StaticCredentialsProvider(new 
BasicAWSCredentials(this.accessKey, this.secretKey)));
+   KinesisProducerConfiguration producerConfig = new 
KinesisProducerConfiguration();
+
+   
producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+   
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
//config.setCollectionMaxCount(1);
//config.setAggregationMaxCount(1);
--- End diff --

Okay, nice.
One thing that would be really helpful would be additional testing. One big 
issue is that the Kinesis connector doesn't handle "flow control" nicely.
If I have a Flink job that is producing data at a higher rate than the 
number of shards permits, I'm getting a lot of failures. Ideally, the producer 
should only accept as much data as it can handle and block otherwise.
Do you have any ideas how to achieve that?
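
One way to get that blocking behavior is a bounded permit count around the 
asynchronous sends; a rough sketch under the assumption that the underlying 
producer exposes a completion callback (the cap value is arbitrary):

{code}
import java.util.concurrent.Semaphore;

// Caps the number of in-flight records; send() blocks once the cap is reached.
final Semaphore inFlight = new Semaphore(10000);

void send(byte[] record) throws InterruptedException {
    inFlight.acquire();   // blocks while too many records are outstanding
    // Hand the record to the async producer (call omitted); the producer's
    // completion callback must invoke inFlight.release().
}
{code}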


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> and stream. The consumer accepts properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it also uses a 
> properties-based configuration (including an input validation step).
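
With the unified convention, producer setup would roughly follow the consumer's 
pattern, as in this sketch (only {{CONFIG_AWS_REGION}} appears in the diff 
above; the credential keys and constructor shape are assumptions):

{code}
import java.util.Properties;

Properties config = new Properties();
config.setProperty(KinesisConfigConstants.CONFIG_AWS_REGION, "eu-west-1");
// Hypothetical credential keys -- the exact constant names may differ:
config.setProperty("aws.credentials.provider.basic.accesskeyid", "...");
config.setProperty("aws.credentials.provider.basic.secretkey", "...");

FlinkKinesisProducer<String> producer =
        new FlinkKinesisProducer<>(new SimpleStringSchema(), config);
{code}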



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64257599
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void 
setCustomPartitioner(KinesisPartitioner partitioner) {
public void open(Configuration parameters) throws Exception {
super.open(parameters);
 
-   KinesisProducerConfiguration config = new 
KinesisProducerConfiguration();
-   config.setRegion(this.region);
-   config.setCredentialsProvider(new StaticCredentialsProvider(new 
BasicAWSCredentials(this.accessKey, this.secretKey)));
+   KinesisProducerConfiguration producerConfig = new 
KinesisProducerConfiguration();
+
+   
producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+   
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
//config.setCollectionMaxCount(1);
//config.setAggregationMaxCount(1);
--- End diff --

Okay, nice.
One thing that would be really helpful would be additional testing. One big 
issue is that the Kinesis connector doesn't handle "flow control" nicely.
If I have a Flink job that is producing data at a higher rate than the 
number of shards permits, I'm getting a lot of failures. Ideally, the producer 
should only accept as much data as it can handle and block otherwise.
Do you have any ideas how to achieve that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3960) EventTimeWindowCheckpointingITCase fails with a segmentation fault

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296668#comment-15296668
 ] 

ASF GitHub Bot commented on FLINK-3960:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/2022

[FLINK-3960] ignore EventTimeWindowCheckpointingITCase for now

Until FLINK-3960 is fixed, we need to disable this test to allow other
tests to execute properly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-3960

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2022.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2022


commit 02692a12e90c0c7441878f5c1f79b5f6ab0d259b
Author: Maximilian Michels 
Date:   2016-05-23T16:29:52Z

[FLINK-3960] ignore EventTimeWindowCheckpointingITCase for now

Until FLINK-3960 is fixed, we need to disable this test to allow other
tests to execute properly.




> EventTimeWindowCheckpointingITCase fails with a segmentation fault
> --
>
> Key: FLINK-3960
> URL: https://issues.apache.org/jira/browse/FLINK-3960
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> As a follow-up issue of FLINK-3909, our tests fail with the following. I 
> believe [~aljoscha] is working on a fix.
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7fae0c62a264, pid=72720, tid=140385528268544
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 
> 1.7.0_76-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [librocksdbjni78704726610339516..so+0x13c264]  
> rocksdb_iterator_helper(rocksdb::DB*, rocksdb::ReadOptions, 
> rocksdb::ColumnFamilyHandle*)+0x4
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/travis/build/mxm/flink/flink-tests/target/hs_err_pid72720.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> Aborted (core dumped)
> {noformat}
> I propose to disable the test case in the meantime because it is blocking our 
> test execution which we need for pull requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3945] [gelly] Degree annotation for dir...

2016-05-23 Thread greghogan
GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/2021

[FLINK-3945] [gelly] Degree annotation for directed graphs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 
3945_degree_annotation_for_directed_graphs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2021


commit 2f79f88cbf32b84fe544232390ab2b6deee1ed0f
Author: Greg Hogan 
Date:   2016-05-20T16:54:16Z

[FLINK-3945] [gelly] Degree annotation for directed graphs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3945) Degree annotation for directed graphs

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296634#comment-15296634
 ] 

ASF GitHub Bot commented on FLINK-3945:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/2021

[FLINK-3945] [gelly] Degree annotation for directed graphs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 
3945_degree_annotation_for_directed_graphs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2021.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2021


commit 2f79f88cbf32b84fe544232390ab2b6deee1ed0f
Author: Greg Hogan 
Date:   2016-05-20T16:54:16Z

[FLINK-3945] [gelly] Degree annotation for directed graphs




> Degree annotation for directed graphs
> -
>
> Key: FLINK-3945
> URL: https://issues.apache.org/jira/browse/FLINK-3945
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.1.0
>
>
> There is a third degree count for vertices in directed graphs: the distinct 
> count of out- and in-neighbors. This issue also adds edge annotation of the 
> vertex degrees for directed graphs.
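
For context, Gelly already offers per-vertex degree computations that this 
extends (a usage sketch; {{graph}} is an assumed {{org.apache.flink.graph.Graph}} 
instance):

{code}
// Existing degree computations in Gelly; results are (vertexId, degree) tuples.
graph.inDegrees();    // per-vertex in-degree
graph.outDegrees();   // per-vertex out-degree
graph.getDegrees();   // in-degree plus out-degree
{code}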



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3960) EventTimeWindowCheckpointingITCase fails with a segmentation fault

2016-05-23 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-3960:
--
Summary: EventTimeWindowCheckpointingITCase fails with a segmentation fault 
 (was: EventTimeCheckpointingITCase fails with a segmentation fault)

> EventTimeWindowCheckpointingITCase fails with a segmentation fault
> --
>
> Key: FLINK-3960
> URL: https://issues.apache.org/jira/browse/FLINK-3960
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> As a follow-up issue of FLINK-3909, our tests fail with the following. I 
> believe [~aljoscha] is working on a fix.
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7fae0c62a264, pid=72720, tid=140385528268544
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 
> 1.7.0_76-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [librocksdbjni78704726610339516..so+0x13c264]  
> rocksdb_iterator_helper(rocksdb::DB*, rocksdb::ReadOptions, 
> rocksdb::ColumnFamilyHandle*)+0x4
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/travis/build/mxm/flink/flink-tests/target/hs_err_pid72720.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> Aborted (core dumped)
> {noformat}
> I propose to disable the test case in the meantime because it is blocking our 
> test execution which we need for pull requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3960) EventTimeCheckpointingITCase fails with a segmentation fault

2016-05-23 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3960:
-

 Summary: EventTimeCheckpointingITCase fails with a segmentation 
fault
 Key: FLINK-3960
 URL: https://issues.apache.org/jira/browse/FLINK-3960
 Project: Flink
  Issue Type: Bug
  Components: Streaming, Tests
Affects Versions: 1.1.0
Reporter: Maximilian Michels
Assignee: Aljoscha Krettek
 Fix For: 1.1.0


As a follow-up issue of FLINK-3909, our tests fail with the following. I 
believe [~aljoscha] is working on a fix.

{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fae0c62a264, pid=72720, tid=140385528268544
#
# JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 1.7.0_76-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [librocksdbjni78704726610339516..so+0x13c264]  
rocksdb_iterator_helper(rocksdb::DB*, rocksdb::ReadOptions, 
rocksdb::ColumnFamilyHandle*)+0x4
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/travis/build/mxm/flink/flink-tests/target/hs_err_pid72720.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)
{noformat}

I propose to disable the test case in the meantime because it is blocking our 
test execution which we need for pull requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3941) Add support for UNION (with duplicate elimination)

2016-05-23 Thread Yijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yijie Shen reassigned FLINK-3941:
-

Assignee: Yijie Shen

> Add support for UNION (with duplicate elimination)
> --
>
> Key: FLINK-3941
> URL: https://issues.apache.org/jira/browse/FLINK-3941
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Yijie Shen
>Priority: Minor
>
> Currently, only UNION ALL is supported by Table API and SQL.
> UNION (with duplicate elimination) can be supported by applying a 
> {{DataSet.distinct()}} after the union on all fields. This issue includes:
> - Extending {{DataSetUnion}}
> - Relaxing {{DataSetUnionRule}} to translate non-all unions.
> - Extending the Table API with a union() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3741) Travis Compile Error: MissingRequirementError: object scala.runtime in compiler mirror not found.

2016-05-23 Thread Todd Lisonbee (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296549#comment-15296549
 ] 

Todd Lisonbee commented on FLINK-3741:
--

This issue has possibly been fixed in the Scala compiler, 
https://issues.scala-lang.org/browse/SI-5463

Version: Scala 2.12.0-M5

https://github.com/scala/scala/pull/5153

> Travis Compile Error: MissingRequirementError: object scala.runtime in 
> compiler mirror not found.
> -
>
> Key: FLINK-3741
> URL: https://issues.apache.org/jira/browse/FLINK-3741
> Project: Flink
>  Issue Type: Bug
>Reporter: Todd Lisonbee
>  Labels: CI, build
>
> Build failed on one of my pull requests and at least 3 others from other 
> people.
> Seems like the problem is in the latest master as of 4/11/2016 (my pull request 
> only had a Javadoc comment added).
> OpenJDK 7, hadoop.profile=1 and other profiles
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/122460456/log.txt
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/122381837/log.txt
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/122589293/log.txt
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/122589296/log.txt
> Error:
> [INFO] 
> /home/travis/build/apache/flink/flink-libraries/flink-ml/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 43 source files to 
> /home/travis/build/apache/flink/flink-libraries/flink-ml/target/classes at 
> 1460421450188
> [ERROR] error: error while loading , error in opening zip file
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> scala.runtime in compiler mirror not found.
> [ERROR]   at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR]   at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO]at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO]at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO]at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO]at 
> scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:172)
> [INFO]at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredPackage(Mirrors.scala:175)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage$lzycompute(Definitions.scala:183)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackage(Definitions.scala:183)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass$lzycompute(Definitions.scala:184)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.RuntimePackageClass(Definitions.scala:184)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr$lzycompute(Definitions.scala:1024)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnnotationDefaultAttr(Definitions.scala:1023)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1153)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
> [INFO]at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
> [INFO]at scala.tools.nsc.Global$Run.(Global.scala:1290)
> [INFO]at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
> [INFO]at scala.tools.nsc.Main$.doCompile(Main.scala:79)
> [INFO]at scala.tools.nsc.Driver.process(Driver.scala:54)
> [INFO]at scala.tools.nsc.Driver.main(Driver.scala:67)
> [INFO]at scala.tools.nsc.Main.main(Main.scala)
> [INFO]at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [INFO]at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [INFO]at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [INFO]at java.lang.reflect.Method.invoke(Method.java:606)
> [INFO]at 
> org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
> [INFO]at 
> org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3956) Make FileInputFormats independent from Configuration

2016-05-23 Thread Kostas Kloudas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296491#comment-15296491
 ] 

Kostas Kloudas commented on FLINK-3956:
---

Thanks [~aljoscha]

> Make FileInputFormats independent from Configuration
> 
>
> Key: FLINK-3956
> URL: https://issues.apache.org/jira/browse/FLINK-3956
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> Currently, many file input formats rely on a user-specified Configuration 
> object for their correct configuration. As an example, the 
> DelimitedInputFormat can take its delimiter character from this object. 
> This is happening for legacy reasons and does not work in the streaming 
> codebase. In streaming, the user-specified configuration object is 
> overwritten at the TaskManagers by a new Configuration object.
> This issue aims at solving this by refactoring the current file input 
> formats to be able to set their parameters through setters.
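
The setter-based configuration this issue argues for already exists in part; a 
sketch of the intended style (the path is an example):

{code}
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

TextInputFormat format = new TextInputFormat(new Path("hdfs:///input"));
format.setDelimiter("\n");   // configured directly on the format, not via a Configuration object
{code}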



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3959) Remove implicit sinks

2016-05-23 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296475#comment-15296475
 ] 

Aljoscha Krettek commented on FLINK-3959:
-

We should think about this. Most other streaming API frameworks have the 
"implicit sink" functionality, so Flink would behave unexpectedly for most 
people. 

> Remove implicit sinks
> -
>
> Key: FLINK-3959
> URL: https://issues.apache.org/jira/browse/FLINK-3959
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Stephan Ewen
> Fix For: 2.0.0
>
>
> Right now, streaming programs need not specify a sink. All transformations 
> that are not consumed are implicitly sinks.
> That behavior makes it hard to lazily construct the stream graph on 
> execution, and prevents interactive sessions on streams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3956) Make FileInputFormats independent from Configuration

2016-05-23 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-3956:

Summary: Make FileInputFormats independent from Configuration  (was: Make 
FileInputFormats in Streaming independent from the Configuration object)

> Make FileInputFormats independent from Configuration
> 
>
> Key: FLINK-3956
> URL: https://issues.apache.org/jira/browse/FLINK-3956
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> Currently, many file input formats rely on a user-specified Configuration 
> object for their correct configuration. As an example, the 
> DelimitedInputFormat can take its delimiter character from this object. 
> This is happening for legacy reasons and does not work in the streaming 
> codebase. In streaming, the user-specified configuration object is 
> overwritten at the TaskManagers by a new Configuration object.
> This issue aims at solving this by refactoring the current file input 
> formats to be able to set their parameters through setters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3956) Make FileInputFormats independent from Configuration

2016-05-23 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296471#comment-15296471
 ] 

Aljoscha Krettek commented on FLINK-3956:
-

The title specified very exactly what we want to accomplish; I generalized it a 
bit to make it shorter. :-)

> Make FileInputFormats independent from Configuration
> 
>
> Key: FLINK-3956
> URL: https://issues.apache.org/jira/browse/FLINK-3956
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> Currently, many file input formats rely on a user-specified Configuration 
> object for their correct configuration. As an example, the 
> DelimitedInputFormat can take its delimiter character from this object. 
> This is happening for legacy reasons and does not work in the streaming 
> codebase. In streaming, the user-specified configuration object is 
> overwritten at the TaskManagers by a new Configuration object.
> This issue aims at solving this by refactoring the current file input 
> formats to be able to set their parameters through setters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296462#comment-15296462
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user aozturk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64235578
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void 
setCustomPartitioner(KinesisPartitioner partitioner) {
public void open(Configuration parameters) throws Exception {
super.open(parameters);
 
-   KinesisProducerConfiguration config = new 
KinesisProducerConfiguration();
-   config.setRegion(this.region);
-   config.setCredentialsProvider(new StaticCredentialsProvider(new 
BasicAWSCredentials(this.accessKey, this.secretKey)));
+   KinesisProducerConfiguration producerConfig = new 
KinesisProducerConfiguration();
+
+   
producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+   
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
//config.setCollectionMaxCount(1);
//config.setAggregationMaxCount(1);
--- End diff --

Sure, no need for a new JIRA; I can add these keys while addressing your 
review. I am actually willing to do more work on the Kinesis connector. You can 
freely name things here, or create an issue and assign it to me.


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> and stream. The consumer accepts properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it also uses a 
> properties-based configuration (including an input validation step).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread aozturk
Github user aozturk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64235578
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void 
setCustomPartitioner(KinesisPartitioner partitioner) {
public void open(Configuration parameters) throws Exception {
super.open(parameters);
 
-   KinesisProducerConfiguration config = new 
KinesisProducerConfiguration();
-   config.setRegion(this.region);
-   config.setCredentialsProvider(new StaticCredentialsProvider(new 
BasicAWSCredentials(this.accessKey, this.secretKey)));
+   KinesisProducerConfiguration producerConfig = new 
KinesisProducerConfiguration();
+
+   
producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+   
producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
//config.setCollectionMaxCount(1);
//config.setAggregationMaxCount(1);
--- End diff --

Sure, no need for a new JIRA; I can add these keys while addressing your 
review. I am actually willing to do more work on the Kinesis connector. You can 
freely name things here, or create an issue and assign it to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3952) Bump Netty to 4.1

2016-05-23 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296460#comment-15296460
 ] 

Stephan Ewen commented on FLINK-3952:
-

Good idea in general. Last time we bumped Netty, however, we had a bad 
surprise, because the memory allocation pattern of the pools changed, and 
people were getting OOM exceptions.

We need to carefully validate the change.

> Bump Netty to 4.1
> -
>
> Key: FLINK-3952
> URL: https://issues.apache.org/jira/browse/FLINK-3952
> Project: Flink
>  Issue Type: Improvement
>  Components: Core, Hadoop Compatibility
>Reporter: rektide de la fey
>  Labels: netty
>
> Netty 4.1 is about to release its final version. This release has [a number of 
> significant 
> enhancements|http://netty.io/wiki/new-and-noteworthy-in-4.1.html], and in 
> particular I find the HTTP/2 codecs incredibly desirable to have. 
> Additionally, hopefully the [Hadoop patches for Netty 
> 4.1|https://issues.apache.org/jira/browse/HADOOP-11716] get some tests and 
> get merged, and I believe if/when that happens it'll be important for Flink to 
> also be using the new Netty minor version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3959) Remove implicit sinks

2016-05-23 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3959:
---

 Summary: Remove implicit sinks
 Key: FLINK-3959
 URL: https://issues.apache.org/jira/browse/FLINK-3959
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Reporter: Stephan Ewen
 Fix For: 2.0.0


Right now, streaming programs need not specify a sink. All transformations that 
are not consumed are implicitly sinks.

That behavior makes it hard to lazily construct the stream graph on execution, 
and prevents interactive sessions on streams.
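
The behavior in question, illustrated (per the description above, the 
unconsumed map below counts as an implicit sink today; the program itself is a 
made-up example):

{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// No explicit sink: this unconsumed transformation is implicitly a sink today,
// which is exactly what this issue proposes to remove.
env.fromElements(1, 2, 3).map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) {
        return value * 2;
    }
});

env.execute("implicit-sink example");
{code}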



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296443#comment-15296443
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220997800
  
The change looks good overall. I think we can merge it soon.

Please let me know once you've addressed my comments.


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> and stream. The consumer accepts properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it also uses a 
> properties-based configuration (including an input validation step).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3953] rename unit-tests execution to de...

2016-05-23 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/2019#issuecomment-220997798
  
Good fix, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220997800
  
The change looks good overall. I think we can merge it soon.

Please let me know once you've addressed my comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296442#comment-15296442
 ] 

ASF GitHub Bot commented on FLINK-3953:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/2019#issuecomment-220997798
  
Good fix, thanks!


> Surefire plugin executes unit tests twice
> -
>
> Key: FLINK-3953
> URL: https://issues.apache.org/jira/browse/FLINK-3953
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> After FLINK-3909, the unit tests are executed twice: two executions are now 
> defined for the Surefire plugin, {{unit-tests}} and {{integration-tests}}, in 
> addition to a default execution called {{default-test}}. Either renaming 
> {{unit-tests}} to {{default-test}} or skipping {{default-test}} would fix the 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3632) Clean up Table API exceptions

2016-05-23 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3632.

Resolution: Done

Done with 9cc629662a34bd9cc6310556a321dd5144a60439

Thanks for the contribution!

> Clean up Table API exceptions
> -
>
> Key: FLINK-3632
> URL: https://issues.apache.org/jira/browse/FLINK-3632
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Yijie Shen
> Fix For: 1.1.0
>
>
> The Table API throws many different exception types including:
> - {{IllegalArgumentException}}
> - {{TableException}}
> - {{CodeGenException}}
> - {{PlanGenException}}
> - {{ExpressionParserException}}
> from various places of the query translation code. 
> This needs to be cleaned up. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated

2016-05-23 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-3939.

Resolution: Fixed

Fixed with 173d24dfdd79c497dc31c6f04dcda47a83ede494

> Prevent distinct aggregates and grouping sets from being translated
> ---
>
> Key: FLINK-3939
> URL: https://issues.apache.org/jira/browse/FLINK-3939
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Flink's SQL interface is currently not capable of executing distinct 
> aggregates and grouping sets.
> We need to prevent queries with these operations from being translated, by 
> adapting the DataSetAggregateRule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3632][TableAPI]Clean up Table API excep...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2015


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3632) Clean up Table API exceptions

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296429#comment-15296429
 ] 

ASF GitHub Bot commented on FLINK-3632:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2015


> Clean up Table API exceptions
> -
>
> Key: FLINK-3632
> URL: https://issues.apache.org/jira/browse/FLINK-3632
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Yijie Shen
> Fix For: 1.1.0
>
>
> The Table API throws many different exception types including:
> - {{IllegalArgumentException}}
> - {{TableException}}
> - {{CodeGenException}}
> - {{PlanGenException}}
> - {{ExpressionParserException}}
> from various places of the query translation code. 
> This needs to be cleaned up. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296430#comment-15296430
 ] 

ASF GitHub Bot commented on FLINK-3939:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2014


> Prevent distinct aggregates and grouping sets from being translated
> ---
>
> Key: FLINK-3939
> URL: https://issues.apache.org/jira/browse/FLINK-3939
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Flink's SQL interface is currently not capable of executing distinct 
> aggregates and grouping sets.
> We need to prevent queries with these operations from being translated, by 
> adapting the DataSetAggregateRule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3939] [tableAPI] Prevent translation of...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2014


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296411#comment-15296411
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64230633
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -30,14 +28,20 @@
 import org.apache.flink.api.java.ClosureCleaner;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import 
org.apache.flink.streaming.connectors.kinesis.config.KinesisConfigConstants;
 import 
org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema;
+import org.apache.flink.streaming.connectors.kinesis.util.AWSUtil;
+import 
org.apache.flink.streaming.connectors.kinesis.util.KinesisConfigUtil;
 import org.apache.flink.streaming.util.serialization.SerializationSchema;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.nio.ByteBuffer;
 import java.util.List;
 import java.util.Objects;
+import java.util.Properties;
+
+import static com.google.common.base.Preconditions.checkNotNull;
--- End diff --

Actually, just learned that Flink has `org.apache.flink.util.Preconditions` 
as well


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> stream. The consumer is accepting properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it is also 
> using a properties-based configuration (including an input validation step)
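
A hedged sketch of what the unified, properties-based construction might look like; the constructor signature is the goal of this issue, not existing API, and only {{KinesisConfigConstants.CONFIG_AWS_REGION}} is confirmed by the diffs below:

{code}
import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.KinesisConfigConstants;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KinesisProducerConfigSketch {

	public static FlinkKinesisProducer<String> createProducer() {
		Properties producerConfig = new Properties();
		// the region key is taken from the PR diff; credential keys would
		// follow the same KinesisConfigConstants pattern as the consumer
		producerConfig.setProperty(KinesisConfigConstants.CONFIG_AWS_REGION, "eu-west-1");
		return new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
	}
}
{code}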



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296421#comment-15296421
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64231495
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void setCustomPartitioner(KinesisPartitioner partitioner) {
	public void open(Configuration parameters) throws Exception {
		super.open(parameters);
 
-		KinesisProducerConfiguration config = new KinesisProducerConfiguration();
-		config.setRegion(this.region);
-		config.setCredentialsProvider(new StaticCredentialsProvider(new BasicAWSCredentials(this.accessKey, this.secretKey)));
+		KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
+
+		producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
		//config.setCollectionMaxCount(1);
		//config.setAggregationMaxCount(1);
--- End diff --

I know this is out of scope for the JIRA, but maybe it would make sense to 
introduce some producer specific configuration keys for setting relevant 
producer configuration settings here.

Would you like to add this as part of the PR or do you think we should file 
a follow up JIRA?


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> stream. The consumer is accepting properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it is also 
> using a properties-based configuration (including an input validation step)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64231495
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -160,12 +168,13 @@ public void setCustomPartitioner(KinesisPartitioner partitioner) {
	public void open(Configuration parameters) throws Exception {
		super.open(parameters);
 
-		KinesisProducerConfiguration config = new KinesisProducerConfiguration();
-		config.setRegion(this.region);
-		config.setCredentialsProvider(new StaticCredentialsProvider(new BasicAWSCredentials(this.accessKey, this.secretKey)));
+		KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
+
+		producerConfig.setRegion(configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION));
+		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
		//config.setCollectionMaxCount(1);
		//config.setAggregationMaxCount(1);
--- End diff --

I know this is out of scope for the JIRA, but maybe it would make sense to 
introduce some producer specific configuration keys for setting relevant 
producer configuration settings here.

Would you like to add this as part of the PR or do you think we should file 
a follow up JIRA?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64230633
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -30,14 +28,20 @@
 import org.apache.flink.api.java.ClosureCleaner;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import 
org.apache.flink.streaming.connectors.kinesis.config.KinesisConfigConstants;
 import 
org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema;
+import org.apache.flink.streaming.connectors.kinesis.util.AWSUtil;
+import 
org.apache.flink.streaming.connectors.kinesis.util.KinesisConfigUtil;
 import org.apache.flink.streaming.util.serialization.SerializationSchema;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.nio.ByteBuffer;
 import java.util.List;
 import java.util.Objects;
+import java.util.Properties;
+
+import static com.google.common.base.Preconditions.checkNotNull;
--- End diff --

Actually, just learned that Flink has `org.apache.flink.util.Preconditions` 
as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3958) Access to MetricRegistry doesn't have proper synchronization in some classes

2016-05-23 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3958:
-

 Summary: Access to MetricRegistry doesn't have proper 
synchronization in some classes
 Key: FLINK-3958
 URL: https://issues.apache.org/jira/browse/FLINK-3958
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu


In GraphiteReporter#getReporter():
{code}
com.codahale.metrics.graphite.GraphiteReporter.Builder builder =
  com.codahale.metrics.graphite.GraphiteReporter.forRegistry(registry);
{code}
Access to registry should be protected by lock on 
ScheduledDropwizardReporter.this

Similar issue exists in GangliaReporter#getReporter()
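
A minimal sketch of the locking pattern the report asks for, assuming a wrapper that owns its own registry; ScheduledDropwizardReporter's actual fields and lock object may differ:

{code}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

import java.net.InetSocketAddress;

public class SynchronizedGraphiteFactory {
	private final MetricRegistry registry = new MetricRegistry();

	public GraphiteReporter getReporter(String host, int port) {
		// read the shared registry only under the lock that also guards
		// metric (de)registration elsewhere
		synchronized (this) {
			GraphiteReporter.Builder builder = GraphiteReporter.forRegistry(registry);
			return builder.build(new Graphite(new InetSocketAddress(host, port)));
		}
	}
}
{code}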



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3957) Breaking API changes for Flink 2.0

2016-05-23 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-3957:
-

 Summary: Breaking API changes for Flink 2.0
 Key: FLINK-3957
 URL: https://issues.apache.org/jira/browse/FLINK-3957
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Robert Metzger
 Fix For: 2.0.0


From time to time, we find APIs in Flink (1.x.y) marked as stable, even though we would like to change them at some point.

This JIRA is to track all planned breaking API changes.

I would suggest adding subtasks to this one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296391#comment-15296391
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64227104
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -30,14 +28,20 @@
 import org.apache.flink.api.java.ClosureCleaner;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import 
org.apache.flink.streaming.connectors.kinesis.config.KinesisConfigConstants;
 import 
org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema;
+import org.apache.flink.streaming.connectors.kinesis.util.AWSUtil;
+import 
org.apache.flink.streaming.connectors.kinesis.util.KinesisConfigUtil;
 import org.apache.flink.streaming.util.serialization.SerializationSchema;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.nio.ByteBuffer;
 import java.util.List;
 import java.util.Objects;
+import java.util.Properties;
+
+import static com.google.common.base.Preconditions.checkNotNull;
--- End diff --

Can you use `Objects.requireNonNull()` instead?

We are trying to get rid of Guava in the long term


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> stream. The consumer is accepting properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it is also 
> using a properties-based configuration (including an input validation step)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220988362
  
Thanks, the changes look good. 

R: @StephanEwen for taking a look at the API; you would only need to look at 
`StreamExecutionEnvironment` for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296388#comment-15296388
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user aozturk commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220988206
  
Sure. Thanks for the review @rmetzger


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> stream. The consumer is accepting properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it is also 
> using a properties-based configuration (including an input validation step)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296390#comment-15296390
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220988362
  
Thanks, the changes look good. 

R: @StephanEwen for taking a look at the API; you would only need to look at 
`StreamExecutionEnvironment` for this.


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2016#discussion_r64227104
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java
 ---
@@ -30,14 +28,20 @@
 import org.apache.flink.api.java.ClosureCleaner;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import 
org.apache.flink.streaming.connectors.kinesis.config.KinesisConfigConstants;
 import 
org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema;
+import org.apache.flink.streaming.connectors.kinesis.util.AWSUtil;
+import 
org.apache.flink.streaming.connectors.kinesis.util.KinesisConfigUtil;
 import org.apache.flink.streaming.util.serialization.SerializationSchema;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.nio.ByteBuffer;
 import java.util.List;
 import java.util.Objects;
+import java.util.Properties;
+
+import static com.google.common.base.Preconditions.checkNotNull;
--- End diff --

Can you use `Objects.requireNonNull()` instead?

We are trying to get rid of Guava in the long term


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread aozturk
Github user aozturk commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220988206
  
Sure. Thanks for the review @rmetzger


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296383#comment-15296383
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220986678
  
@aljoscha Thanks a lot for the comments! 
I integrated them already.


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread kl0u
Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220986678
  
@aljoscha Thanks a lot for the comments! 
I integrated them already.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3923] [connector-kinesis] Unify configu...

2016-05-23 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220986366
  
I noticed that you didn't update the documentation of the Kinesis producer.
Can you update the page to reflect the changed usage?
The file is located here: docs/apis/streaming/connectors/kinesis.md


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3923) Unify configuration conventions of the Kinesis producer to the same as the consumer

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296382#comment-15296382
 ] 

ASF GitHub Bot commented on FLINK-3923:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2016#issuecomment-220986366
  
I noticed that you didn't update the documentation of the Kinesis producer.
Can you update the page to reflect the changed usage?
The file is located here: docs/apis/streaming/connectors/kinesis.md


> Unify configuration conventions of the Kinesis producer to the same as the 
> consumer
> ---
>
> Key: FLINK-3923
> URL: https://issues.apache.org/jira/browse/FLINK-3923
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>Assignee: Abdullah Ozturk
>
> Currently, the Kinesis consumer and producer are configured differently.
> The producer expects a list of arguments for the access key, secret, region, 
> stream. The consumer is accepting properties (similar to the Kafka connector).
> The objective of this issue is to change the producer so that it is also 
> using a properties-based configuration (including an input validation step)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296362#comment-15296362
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220981894
  
All in all, very good work!

One thing I'd like to change is the order of parameters in the `readFile` 
methods. For these telescoping methods it is usual to append new parameters at the 
end. For example, in the two methods below, where you add an additional filter 
parameter, the filter would go at the end of the second method because it is the 
additional parameter. This way, users can just append additional parameters to 
existing ones without changing the order.

```
public <OUT> DataStreamSource<OUT> readFile(FileInputFormat<OUT> inputFormat,
                                            String filePath,
                                            WatchType watchType,
                                            long interval)
```

```
public <OUT> DataStreamSource<OUT> readFile(FileInputFormat<OUT> inputFormat,
                                            String filePath,
                                            WatchType watchType,
                                            FilePathFilter filter,
                                            long interval)
```


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/2020#issuecomment-220981894
  
All in all, very good work!

One thing I'd like to change is the order of parameters in the `readFile` 
methods. For these telescoping methods it is usual to append new parameters at the 
end. For example, in the two methods below, where you add an additional filter 
parameter, the filter would go at the end of the second method because it is the 
additional parameter. This way, users can just append additional parameters to 
existing ones without changing the order.

```
public <OUT> DataStreamSource<OUT> readFile(FileInputFormat<OUT> inputFormat,
                                            String filePath,
                                            WatchType watchType,
                                            long interval)
```

```
public <OUT> DataStreamSource<OUT> readFile(FileInputFormat<OUT> inputFormat,
                                            String filePath,
                                            WatchType watchType,
                                            FilePathFilter filter,
                                            long interval)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296357#comment-15296357
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2020#discussion_r64221544
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSource.java
 ---
@@ -44,6 +44,11 @@ public DataStreamSource(StreamExecutionEnvironment environment,
}
}
 
+   public DataStreamSource(SingleOutputStreamOperator operator) {
--- End diff --

Here, we should always set `isParallel` to `true`. It is not quite obvious 
but the field is used to disallow changing the parallelism for a 
`SourceFunction` that cannot be parallelized. Our new operator can always run 
in parallel.


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2020#discussion_r64221544
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamSource.java
 ---
@@ -44,6 +44,11 @@ public DataStreamSource(StreamExecutionEnvironment environment,
}
}
 
+   public DataStreamSource(SingleOutputStreamOperator operator) {
--- End diff --

Here, we should always set `isParallel` to `true`. It is not quite obvious 
but the field is used to disallow changing the parallelism for a 
`SourceFunction` that cannot be parallelized. Our new operator can always run 
in parallel.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Issue Comment Deleted] (FLINK-3955) Change Table.toSink() to Table.writeToSink()

2016-05-23 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated FLINK-3955:
---
Comment: was deleted

(was: I am not sure this name solves the issue. What about {{Table.toDataSet()}}?)

> Change Table.toSink() to Table.writeToSink()
> 
>
> Key: FLINK-3955
> URL: https://issues.apache.org/jira/browse/FLINK-3955
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Currently, a {{Table}} can be emitted to a {{TableSink}} using the 
> {{Table.toSink()}} method.
> However, the name of the method indicates that the {{Table}} is converted 
> into a {{Sink}}.
> Therefore, I propose to change the method to {{Table.writeToSink()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3956) Make FileInputFormats in Streaming independent from the Configuration object

2016-05-23 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-3956:
-

 Summary: Make FileInputFormats in Streaming independent from the 
Configuration object
 Key: FLINK-3956
 URL: https://issues.apache.org/jira/browse/FLINK-3956
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas


Currently, many file input formats rely on a user-specified Configuration object 
for their correct configuration. As an example, the DelimitedInputFormat can take 
its delimiter character from this object.

This happens for legacy reasons and does not work in the streaming codebase: in 
streaming, the user-specified configuration object is overwritten at the 
TaskManagers by a new Configuration object.

This issue aims at solving this by refactoring the current file input formats so 
that their parameters can be set through setters.
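
As an illustration, a hedged sketch of setter-based configuration (the input path is hypothetical; setDelimiter() and setCharsetName() exist on the current formats):

{code}
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

public class FormatSetterSketch {

	public static TextInputFormat createFormat() {
		// parameters are set directly on the format instead of being read
		// from a Configuration object that streaming overwrites
		TextInputFormat format = new TextInputFormat(new Path("hdfs:///input"));
		format.setDelimiter("\n");
		format.setCharsetName("UTF-8");
		return format;
	}
}
{code}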



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3955) Change Table.toSink() to Table.writeToSink()

2016-05-23 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296356#comment-15296356
 ] 

Matthias J. Sax commented on FLINK-3955:


I am not sure this name solves the issue. What about {{Table.toDataSet()}}?

> Change Table.toSink() to Table.writeToSink()
> 
>
> Key: FLINK-3955
> URL: https://issues.apache.org/jira/browse/FLINK-3955
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Currently, a {{Table}} can be emitted to a {{TableSink}} using the 
> {{Table.toSink()}} method.
> However, the name of the method indicates that the {{Table}} is converted 
> into a {{Sink}}.
> Therefore, I propose to change the method to {{Table.writeToSink()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296355#comment-15296355
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2020#discussion_r64220867
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FilePathFilter.java
 ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.api.functions.source;
+
+import org.apache.flink.core.fs.Path;
+
+import java.io.Serializable;
+
+/**
+ * An interface to be implemented by the user when using the {@link FileSplitMonitoringFunction}.
+ * The {@link #filterPath(Path)} method is responsible for deciding if a path is eligible for further
+ * processing or not. This can serve to exclude temporary or partial files that
+ * are still being written.
+ *
+ * A default implementation is the {@link DefaultFilter}, which excludes files starting with ".", "_", or
+ * containing "_COPYING_" in their names. It can be retrieved via {@link DefaultFilter#getInstance()}.
+ */
+public interface FilePathFilter extends Serializable {
--- End diff --

We should probably make this @PublicEvolving, just to be on the safe side. 
I can fix it up when merging.


> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2020#discussion_r64220867
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/FilePathFilter.java
 ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.api.functions.source;
+
+import org.apache.flink.core.fs.Path;
+
+import java.io.Serializable;
+
+/**
+ * An interface to be implemented by the user when using the {@link FileSplitMonitoringFunction}.
+ * The {@link #filterPath(Path)} method is responsible for deciding if a path is eligible for further
+ * processing or not. This can serve to exclude temporary or partial files that
+ * are still being written.
+ *
+ * A default implementation is the {@link DefaultFilter}, which excludes files starting with ".", "_", or
+ * containing "_COPYING_" in their names. It can be retrieved via {@link DefaultFilter#getInstance()}.
+ */
+public interface FilePathFilter extends Serializable {
--- End diff --

We should probably make this @PublicEvolving, just to be on the safe side. 
I can fix it up when merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3896) Allow a StreamTask to be Externally Cancelled.

2016-05-23 Thread Kostas Kloudas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296350#comment-15296350
 ] 

Kostas Kloudas commented on FLINK-3896:
---

This issue came up while implementing FLINK-2314 and the PR for the latter also 
contains the solution for this one: 

https://github.com/apache/flink/pull/2020

> Allow a StreamTask to be Externally Cancelled.
> --
>
> Key: FLINK-3896
> URL: https://issues.apache.org/jira/browse/FLINK-3896
> Project: Flink
>  Issue Type: Bug
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> When implementing a custom operator, it may be useful to be able to cancel 
> the task it runs in due to an external event. As an example, imagine an 
> operator that spawns a thread to do some work and forwards elements to the 
> next one. 
> If an exception is thrown by that thread, it will not be automatically 
> propagated to the main thread in order to cancel the task. In that case, it 
> would be useful for that thread to be able to cancel the task. This issue 
> aims at adding exactly this functionality to the StreamTask. Currently a 
> Task can do so, but this is not accessible from the StreamTask.
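
An illustrative pattern only (not the actual StreamTask code) of how a worker thread could fail its containing task through such a method:

{code}
public class ExternallyFailableTask {
	private volatile Throwable asyncError;

	public void failExternally(Throwable cause) {
		// recorded here; the task's main loop would check this field
		// and cancel itself
		asyncError = cause;
	}

	public void startWorker(Runnable work) {
		Thread worker = new Thread(() -> {
			try {
				work.run();
			} catch (Throwable t) {
				// without failExternally(), an exception in this thread
				// would never reach the main thread
				failExternally(t);
			}
		});
		worker.setDaemon(true);
		worker.start();
	}
}
{code}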



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3955) Change Table.toSink() to Table.writeToSink()

2016-05-23 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3955:


 Summary: Change Table.toSink() to Table.writeToSink()
 Key: FLINK-3955
 URL: https://issues.apache.org/jira/browse/FLINK-3955
 Project: Flink
  Issue Type: Improvement
  Components: Table API
Affects Versions: 1.1.0
Reporter: Fabian Hueske
Assignee: Fabian Hueske


Currently, a {{Table}} can be emitted to a {{TableSink}} using the 
{{Table.toSink()}} method.
However, the name of the method indicates that the {{Table}} is converted into 
a {{Sink}}.

Therefore, I propose to change the method to {{Table.writeToSink()}}.
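
A before/after sketch of the proposed rename (the sink variable is illustrative):

{code}
// before: reads as if the Table becomes a sink
table.toSink(sink);

// after: makes clear that the Table is written to the sink
table.writeToSink(sink);
{code}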



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296335#comment-15296335
 ] 

ASF GitHub Bot commented on FLINK-1502:
---

Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1947


> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 1.1.0
>
>
> The metrics library allows to expose collected metrics easily to other 
> systems such as graphite, ganglia or Java's JVM (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296342#comment-15296342
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1771#issuecomment-220975120
  
I finally was able to fix the restart issue. There were two massive bugs in 
the CassandraCommitter:
- within open(), the checkpoint entry was always overwritten
- within close(), the checkpoint entry was always deleted

In addition, I have made the following changes:
- renamed GenericAtLeastOnceSink to GenericWriteAheadSink
- implemented caching of the last committed checkpointID in the 
CassandraCommitter

It's rather obvious that more tests are required.
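
An illustrative-only sketch of the checkpoint-ID caching idea; the real CassandraCommitter API and storage calls differ:

{code}
public class CachingCommitter {
	private long lastCommittedCheckpointId = -1;

	public void commitCheckpoint(long checkpointId) {
		if (checkpointId <= lastCommittedCheckpointId) {
			// already committed: avoids rewriting or re-creating the
			// checkpoint entry, the root cause of the open()/close() bugs
			return;
		}
		writeCheckpointIdToBackend(checkpointId);
		lastCommittedCheckpointId = checkpointId;
	}

	private void writeCheckpointIdToBackend(long checkpointId) {
		// e.g. an UPDATE of the committer's checkpoint row in Cassandra
	}
}
{code}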


> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting into 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local cassandra instance 
> (for the tests): 
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is a object mapper from datastax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3311/FLINK-3332] Add Cassandra connecto...

2016-05-23 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1771#issuecomment-220975120
  
I finally was able to fix the restart issue. There were two massive bugs in 
the CassandraCommitter:
- within open(), the checkpoint entry was always overwritten
- within close(), the checkpoint entry was always deleted

In addition, I have made the following changes:
- renamed GenericAtLeastOnceSink to GenericWriteAheadSink
- implemented caching of the last committed checkpointID in the 
CassandraCommitter

It's rather obvious that more tests are required.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296341#comment-15296341
 ] 

ASF GitHub Bot commented on FLINK-2314:
---

GitHub user kl0u opened a pull request:

https://github.com/apache/flink/pull/2020

[FLINK-2314] Make Streaming File Sources Persistent

This PR solves FLINK-2314 and combines a number of sub-tasks. In addition, 
it solves FLINK-3896 which was introduced as part of this task.

The way file input sources are now processed is the following:
 * One task (parallelism 1) monitors a user-specified path for new files/data
 * The above task assigns FileInputSplits to downstream (parallel) readers that actually read the data

The monitoring entity scans the path, splits the files to be processed into 
splits, and assigns them downstream. For now, two modes are supported: 
PROCESS_ONCE, which just processes the current contents of the path and exits, 
and REPROCESS_WITH_APPENDED, which periodically monitors the path and 
reprocesses new files and (the entire contents of) files with new data.

In addition, these sources are checkpointed, i.e. in the case of a task 
failure the job will resume from where it left off.

Finally, some changes were introduced in the way we are handling 
FileInputFormats after discussions with @aljoscha .

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kl0u/flink api_ft_files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2020


commit 0d378f85ef7beec598701d78e0537f7479be99d9
Author: kl0u 
Date:   2016-05-10T16:56:58Z

[FLINK-3896] Allow a StreamTask to be Externally Cancelled

It adds a method failExternally() to the StreamTask, so that custom 
Operators
can make their containing task fail when needed.

commit 1a06e70d4cc72593663ed5065e9c68c5b9fadac1
Author: kl0u 
Date:   2016-04-10T14:56:42Z

[FLINK-3717] Make FileInputFormat checkpointable

This adds a new interface called CheckpointableInputFormat
which describes input formats whose state is queryable,
i.e. getCurrentChannelState() returns where the reader is
in the underlying source, and they can resume reading from
a user-specified position.

This functionality is not yet leveraged by current readers.

commit 13db59ff214c6c1790b73e8b06984a7170924c5a
Author: kl0u 
Date:   2016-04-18T14:37:54Z

[FLINK-3889] Refactor File Monitoring Source

This is meant to replace the different file
reading sources in Flink streaming. Now there is
one monitoring source with DOP 1 monitoring a
directory and assigning input split to downstream
readers.

In addition, it makes the new features added by
FLINK-3717 work together with the aforementioned entities
(the monitor and the readers) in order to have
fault tolerant file sources and exactly once guarantees.

This does not replace the old API calls. This
will be done in a future commit.

commit 0c8e852b96752a716c36452f7ced11c79cca5560
Author: kl0u 
Date:   2016-05-18T14:44:45Z

[FLINK-2314] Make Streaming File Sources Persistent




> Make Streaming File Sources Persistent
> --
>
> Key: FLINK-2314
> URL: https://issues.apache.org/jira/browse/FLINK-2314
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>
> Streaming File sources should participate in the checkpointing. They should 
> track the bytes they read from the file and checkpoint it.
> One can look at the sequence generating source function for an example of a 
> checkpointed source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2314] Make Streaming File Sources Persi...

2016-05-23 Thread kl0u
GitHub user kl0u opened a pull request:

https://github.com/apache/flink/pull/2020

[FLINK-2314] Make Streaming File Sources Persistent

This PR solves FLINK-2314 and combines a number of sub-tasks. In addition, 
it solves FLINK-3896 which was introduced as part of this task.

The way file input sources are now processed is the following:
 * One task (parallelism 1) monitors a user-specified path for new files/data
 * The above task assigns FileInputSplits to downstream (parallel) readers that actually read the data

The monitoring entity scans the path, splits the files to be processed into 
splits, and assigns them downstream. For now, two modes are supported: 
PROCESS_ONCE, which just processes the current contents of the path and exits, 
and REPROCESS_WITH_APPENDED, which periodically monitors the path and 
reprocesses new files and (the entire contents of) files with new data.

In addition, these sources are checkpointed, i.e. in the case of a task 
failure the job will resume from where it left off.

Finally, some changes were introduced in the way we are handling 
FileInputFormats after discussions with @aljoscha .

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kl0u/flink api_ft_files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2020


commit 0d378f85ef7beec598701d78e0537f7479be99d9
Author: kl0u 
Date:   2016-05-10T16:56:58Z

[FLINK-3896] Allow a StreamTask to be Externally Cancelled

It adds a method failExternally() to the StreamTask, so that custom 
Operators
can make their containing task fail when needed.

commit 1a06e70d4cc72593663ed5065e9c68c5b9fadac1
Author: kl0u 
Date:   2016-04-10T14:56:42Z

[FLINK-3717] Make FileInputFormat checkpointable

This adds a new interface called CheckpointableInputFormat
which describes input formats whose state is queryable,
i.e. getCurrentChannelState() returns where the reader is
in the underlying source, and they can resume reading from
a user-specified position.

This functionality is not yet leveraged by current readers.

commit 13db59ff214c6c1790b73e8b06984a7170924c5a
Author: kl0u 
Date:   2016-04-18T14:37:54Z

[FLINK-3889] Refactor File Monitoring Source

This is meant to replace the different file
reading sources in Flink streaming. Now there is
one monitoring source with DOP 1 monitoring a
directory and assigning input split to downstream
readers.

In addition, it makes the new features added by
FLINK-3717 work together with the aforementioned entities
(the monitor and the readers) in order to have
fault tolerant file sources and exactly once guarantees.

This does not replace the old API calls. This
will be done in a future commit.

commit 0c8e852b96752a716c36452f7ced11c79cca5560
Author: kl0u 
Date:   2016-05-18T14:44:45Z

[FLINK-2314] Make Streaming File Sources Persistent




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296337#comment-15296337
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/1771#discussion_r64217845
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/GenericAtLeastOnceSink.java
 ---
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.runtime.operators;
+
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.runtime.io.disk.InputViewIterator;
+import org.apache.flink.runtime.state.AbstractStateBackend;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.util.ReusingMutableToRegularIteratorWrapper;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.TreeMap;
+import java.util.UUID;
+
+/**
+ * Generic Sink that emits its input elements into an arbitrary backend. This sink is integrated with the checkpointing
+ * mechanism and can provide exactly-once guarantees; depending on the storage backend and sink/committer implementation.
+ * 
+ * Incoming records are stored within a {@link org.apache.flink.runtime.state.AbstractStateBackend}, and only committed if a
+ * checkpoint is completed.
+ *
+ * @param <IN> Type of the elements emitted by this sink
+ */
+public abstract class GenericAtLeastOnceSink<IN> extends AbstractStreamOperator<IN> implements OneInputStreamOperator<IN, IN> {
+   protected static final Logger LOG = LoggerFactory.getLogger(GenericAtLeastOnceSink.class);
+   private final CheckpointCommitter committer;
+   private transient AbstractStateBackend.CheckpointStateOutputView out;
+   protected final TypeSerializer<IN> serializer;
+   private final String id;
+
+   private ExactlyOnceState state = new ExactlyOnceState();
+
+   public GenericAtLeastOnceSink(CheckpointCommitter committer, TypeSerializer<IN> serializer, String jobID) throws Exception {
+   this.committer = committer;
+   this.serializer = serializer;
+   this.id = UUID.randomUUID().toString();
+   this.committer.setJobId(jobID);
+   this.committer.createResource();
+   }
+
+   @Override
+   public void open() throws Exception {
+   committer.setOperatorId(id);
+   committer.setOperatorSubtaskId(getRuntimeContext().getIndexOfThisSubtask());
+   committer.open();
+   }
+
+   public void close() throws Exception {
+   committer.close();
+   }
+
+   /**
+* Saves a handle in the state.
+*
+* @param checkpointId
+* @throws IOException
+*/
+   private void saveHandleInState(final long checkpointId, final long timestamp) throws Exception {
+   //only add handle if a new OperatorState was created since the last snapshot
+   if (out != null) {
+   StateHandle<DataInputView> handle = out.closeAndGetHandle();
+   if (state.pendingHandles.containsKey(checkpointId)) {
+   //we already have a checkpoint stored for that ID that may have been partially written,
+   //so we discard this "alternate version" and use the stored checkpoint

[GitHub] flink pull request: [FLINK-3311/FLINK-3332] Add Cassandra connecto...

2016-05-23 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/1771#discussion_r64217845
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/GenericAtLeastOnceSink.java
 ---
@@ -0,0 +1,192 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.runtime.operators;
+
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.runtime.io.disk.InputViewIterator;
+import org.apache.flink.runtime.state.AbstractStateBackend;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.util.ReusingMutableToRegularIteratorWrapper;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.TreeMap;
+import java.util.UUID;
+
+/**
+ * Generic Sink that emits its input elements into an arbitrary backend. This sink is integrated with the checkpointing
+ * mechanism and can provide exactly-once guarantees; depending on the storage backend and sink/committer implementation.
+ * 
+ * Incoming records are stored within a {@link org.apache.flink.runtime.state.AbstractStateBackend}, and only committed if a
+ * checkpoint is completed.
+ *
+ * @param <IN> Type of the elements emitted by this sink
+ */
+public abstract class GenericAtLeastOnceSink<IN> extends AbstractStreamOperator<IN> implements OneInputStreamOperator<IN, IN> {
+   protected static final Logger LOG = LoggerFactory.getLogger(GenericAtLeastOnceSink.class);
+   private final CheckpointCommitter committer;
+   private transient AbstractStateBackend.CheckpointStateOutputView out;
+   protected final TypeSerializer<IN> serializer;
+   private final String id;
+
+   private ExactlyOnceState state = new ExactlyOnceState();
+
+   public GenericAtLeastOnceSink(CheckpointCommitter committer, TypeSerializer<IN> serializer, String jobID) throws Exception {
+   this.committer = committer;
+   this.serializer = serializer;
+   this.id = UUID.randomUUID().toString();
+   this.committer.setJobId(jobID);
+   this.committer.createResource();
+   }
+
+   @Override
+   public void open() throws Exception {
+   committer.setOperatorId(id);
+   committer.setOperatorSubtaskId(getRuntimeContext().getIndexOfThisSubtask());
+   committer.open();
+   }
+
+   public void close() throws Exception {
+   committer.close();
+   }
+
+   /**
+* Saves a handle in the state.
+*
+* @param checkpointId
+* @throws IOException
+*/
+   private void saveHandleInState(final long checkpointId, final long timestamp) throws Exception {
+   //only add handle if a new OperatorState was created since the last snapshot
+   if (out != null) {
+   StateHandle<DataInputView> handle = out.closeAndGetHandle();
+   if (state.pendingHandles.containsKey(checkpointId)) {
+   //we already have a checkpoint stored for that ID that may have been partially written,
+   //so we discard this "alternate version" and use the stored checkpoint
+   handle.discardState();
+   } else {
+   state.pendingHandles.put(checkpointId, new 

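The archived diff breaks off above. As a rough, simplified sketch of the bookkeeping it implements (one state handle buffered per checkpoint id, committed only once that checkpoint completes), with a String standing in for the real state handle type rather than the actual operator code:

    import java.util.Map;
    import java.util.TreeMap;

    // Simplified, standalone illustration of the write-ahead pattern above;
    // "String" is a stand-in for the real state handle type.
    public class PendingHandleBuffer {

        // checkpointId -> handle for the data written during that checkpoint
        private final TreeMap<Long, String> pendingHandles = new TreeMap<>();

        void saveHandle(long checkpointId, String handle) {
            // keep the first handle per id; a later handle for the same id may
            // be a partially written "alternate version" and is dropped
            pendingHandles.putIfAbsent(checkpointId, handle);
        }

        void notifyCheckpointComplete(long completedCheckpointId) {
            // commit everything up to and including the completed checkpoint,
            // then forget the handles so nothing is committed twice
            Map<Long, String> ready = pendingHandles.headMap(completedCheckpointId, true);
            ready.forEach((id, handle) -> System.out.println("committing " + handle));
            ready.clear();
        }
    }
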
[GitHub] flink pull request: [FLINK-1502] [WIP] Basic Metric System

2016-05-23 Thread zentol
Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/1947




[jira] [Commented] (FLINK-3632) Clean up Table API exceptions

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296254#comment-15296254
 ] 

ASF GitHub Bot commented on FLINK-3632:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/2015#issuecomment-220956222
  
Will merge this PR.


> Clean up Table API exceptions
> -
>
> Key: FLINK-3632
> URL: https://issues.apache.org/jira/browse/FLINK-3632
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Yijie Shen
> Fix For: 1.1.0
>
>
> The Table API throws many different exception types including:
> - {{IllegalArgumentException}}
> - {{TableException}}
> - {{CodeGenException}}
> - {{PlanGenException}}
> - {{ExpressionParserException}}
> from various places of the query translation code. 
> This needs to be cleaned up. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3632][TableAPI]Clean up Table API excep...

2016-05-23 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/2015#issuecomment-220956222
  
Will merge this PR.




[GitHub] flink pull request: [FLINK-3939] [tableAPI] Prevent translation of...

2016-05-23 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/2014#issuecomment-220956186
  
Thanks for the review @yjshen!
Will address the comment and merge this PR.




[jira] [Commented] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296253#comment-15296253
 ] 

ASF GitHub Bot commented on FLINK-3939:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/2014#issuecomment-220956186
  
Thanks for the review @yjshen!
Will address the comment and merge this PR.


> Prevent distinct aggregates and grouping sets from being translated
> ---
>
> Key: FLINK-3939
> URL: https://issues.apache.org/jira/browse/FLINK-3939
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Flink's SQL interface is currently not capable of executing distinct 
> aggregates and grouping sets.
> We need to prevent that queries with these operations are translated by 
> adapting the DataSetAggregateRule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3939) Prevent distinct aggregates and grouping sets from being translated

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296251#comment-15296251
 ] 

ASF GitHub Bot commented on FLINK-3939:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2014#discussion_r64207510
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/rules/dataSet/DataSetAggregateRule.scala ---
@@ -33,20 +33,32 @@ class DataSetAggregateRule
   "DataSetAggregateRule")
   {
 
-def convert(rel: RelNode): RelNode = {
-  val agg: LogicalAggregate = rel.asInstanceOf[LogicalAggregate]
-  val traitSet: RelTraitSet = rel.getTraitSet.replace(DataSetConvention.INSTANCE)
-  val convInput: RelNode = RelOptRule.convert(agg.getInput, DataSetConvention.INSTANCE)
-
-  new DataSetAggregate(
-rel.getCluster,
-traitSet,
-convInput,
-agg.getNamedAggCalls,
-rel.getRowType,
-agg.getInput.getRowType,
-agg.getGroupSet.toArray)
-  }
+  override def matches(call: RelOptRuleCall): Boolean = {
--- End diff --

That's a good idea. It won't work for all unsupported operators (inner 
equi-joins are initially Cartesian products plus filters, before translation rules 
merge join and filter), but some operations, such as distinct aggregates, are not 
rewritten by rules, so we can throw an exception when they are observed.
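
For illustration, a guard of this kind could look roughly as follows against Calcite's Java aggregate API (a sketch with made-up names, not the Scala code of this PR):

    import org.apache.calcite.plan.RelOptRuleCall;
    import org.apache.calcite.rel.core.AggregateCall;
    import org.apache.calcite.rel.logical.LogicalAggregate;

    // Sketch: refuse to match aggregates the runtime cannot execute, so that
    // planning fails fast instead of emitting an untranslatable plan.
    class AggregateGuard {

        static boolean isTranslatable(RelOptRuleCall call) {
            LogicalAggregate agg = call.rel(0);
            // grouping sets show up as more than one group set on the node
            if (agg.getGroupSets().size() > 1) {
                return false;
            }
            // distinct aggregates are not rewritten away by other rules
            for (AggregateCall aggCall : agg.getAggCallList()) {
                if (aggCall.isDistinct()) {
                    return false;
                }
            }
            return true;
        }
    }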


> Prevent distinct aggregates and grouping sets from being translated
> ---
>
> Key: FLINK-3939
> URL: https://issues.apache.org/jira/browse/FLINK-3939
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
> Fix For: 1.1.0
>
>
> Flink's SQL interface is currently not capable of executing distinct 
> aggregates and grouping sets.
> We need to prevent that queries with these operations are translated by 
> adapting the DataSetAggregateRule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3939] [tableAPI] Prevent translation of...

2016-05-23 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2014#discussion_r64207510
  
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/rules/dataSet/DataSetAggregateRule.scala ---
@@ -33,20 +33,32 @@ class DataSetAggregateRule
   "DataSetAggregateRule")
   {
 
-def convert(rel: RelNode): RelNode = {
-  val agg: LogicalAggregate = rel.asInstanceOf[LogicalAggregate]
-  val traitSet: RelTraitSet = rel.getTraitSet.replace(DataSetConvention.INSTANCE)
-  val convInput: RelNode = RelOptRule.convert(agg.getInput, DataSetConvention.INSTANCE)
-
-  new DataSetAggregate(
-rel.getCluster,
-traitSet,
-convInput,
-agg.getNamedAggCalls,
-rel.getRowType,
-agg.getInput.getRowType,
-agg.getGroupSet.toArray)
-  }
+  override def matches(call: RelOptRuleCall): Boolean = {
--- End diff --

That's a good idea. It won't work for all unsupported operators (inner 
equi-joins are initially Cartesian products plus filters, before translation rules 
merge join and filter), but some operations, such as distinct aggregates, are not 
rewritten by rules, so we can throw an exception when they are observed.




[jira] [Created] (FLINK-3954) Installing Flink on EMR using EMR bootstrap action

2016-05-23 Thread Akshay Shingote (JIRA)
Akshay Shingote created FLINK-3954:
--

 Summary: Installing Flink on EMR using EMR bootstrap action
 Key: FLINK-3954
 URL: https://issues.apache.org/jira/browse/FLINK-3954
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.2
 Environment: AWS EMR 4.6
Reporter: Akshay Shingote


Hello, I want to know whether Flink can be installed on AWS EMR through EMR's 
bootstrap actions. This would help with getting Flink installed on EMR's master 
and core nodes. Is there any way to install Flink on AWS EMR through a 
bootstrap action?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3927) TaskManager registration may fail if Yarn versions don't match

2016-05-23 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3927.
-
Resolution: Fixed

Fixed via 017106e140f3c17ebaaa0507e1dcbbc445c8f0ac

> TaskManager registration may fail if Yarn versions don't match
> --
>
> Key: FLINK-3927
> URL: https://issues.apache.org/jira/browse/FLINK-3927
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Flink's ResourceManager uses the Yarn container ids to identify connecting 
> task managers. Yarn's stringified container id may not be consistent across 
> different Hadoop versions, e.g. Hadoop 2.3.0 and Hadoop 2.7.1. The 
> ResourceManager gets it from the Yarn reports while the TaskManager infers it 
> from the Yarn environment variables. The ResourceManager may use the Hadoop 2.3.0 
> version while the cluster runs Hadoop 2.7.1. 
> The solution is to pass the ID through a custom environment variable which is 
> set by the ResourceManager before launching the TaskManager in the container. 
> That way we will always use the Hadoop client's id generation method.
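
Roughly, that handoff could look like the following sketch (the environment variable name here is hypothetical, not the one used by the actual fix):

    import java.util.Map;

    // Sketch of the described handoff between ResourceManager and TaskManager.
    public class ContainerIdHandoff {

        // ResourceManager side: write the container id it knows into the
        // environment of the TaskManager container before launching it.
        static void setTaskManagerEnv(Map<String, String> containerEnv, String containerId) {
            containerEnv.put("_FLINK_YARN_CONTAINER_ID", containerId);
        }

        // TaskManager side: read the id back instead of re-deriving it from
        // Yarn's own variables, so both sides agree even when their Hadoop
        // client versions generate different stringified ids.
        static String readContainerId() {
            String id = System.getenv("_FLINK_YARN_CONTAINER_ID");
            if (id == null) {
                throw new IllegalStateException("container id was not set by the ResourceManager");
            }
            return id;
        }
    }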



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296211#comment-15296211
 ] 

ASF GitHub Bot commented on FLINK-3953:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2019


> Surefire plugin executes unit tests twice
> -
>
> Key: FLINK-3953
> URL: https://issues.apache.org/jira/browse/FLINK-3953
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> After FLINK-3909 the unit tests are executed twice. There are now two 
> executions defined for the Surefire plugin: {{unit-tests}} and 
> {{integration-tests}}. In addition, there is a default execution called 
> {{default-test}}. This leads to the unit tests being executed twice. Either 
> renaming unit-tests to default-test or skipping default-test would fix the 
> problem.
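
In pom.xml terms, the proposed fix amounts to reusing the implicit execution id, along the lines of the sketch below (the execution layout and the ITCase include/exclude patterns are illustrative assumptions, not copied from Flink's build):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <executions>
        <!-- Reusing the implicit id "default-test" overrides the default run
             instead of adding a second unit-test execution on top of it. -->
        <execution>
          <id>default-test</id>
          <phase>test</phase>
          <goals>
            <goal>test</goal>
          </goals>
          <configuration>
            <excludes>
              <exclude>**/*ITCase.*</exclude>
            </excludes>
          </configuration>
        </execution>
        <!-- Integration tests keep their own, separately named execution. -->
        <execution>
          <id>integration-tests</id>
          <phase>integration-test</phase>
          <goals>
            <goal>test</goal>
          </goals>
          <configuration>
            <includes>
              <include>**/*ITCase.*</include>
            </includes>
          </configuration>
        </execution>
      </executions>
    </plugin>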



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed FLINK-3953.
-
Resolution: Fixed

Fixed via 5fdf39b1fec032f5816cb188334c129ff9186415

> Surefire plugin executes unit tests twice
> -
>
> Key: FLINK-3953
> URL: https://issues.apache.org/jira/browse/FLINK-3953
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> After FLINK-3909 the unit tests are executed twice. There are now two 
> executions defined for the Surefire plugin: {{unit-tests}} and 
> {{integration-tests}}. In addition, there is a default execution called 
> {{default-test}}. This leads to the unit tests being executed twice. Either 
> renaming unit-tests to default-test or skipping default-test would fix the 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3927) TaskManager registration may fail if Yarn versions don't match

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296212#comment-15296212
 ] 

ASF GitHub Bot commented on FLINK-3927:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2013


> TaskManager registration may fail if Yarn versions don't match
> --
>
> Key: FLINK-3927
> URL: https://issues.apache.org/jira/browse/FLINK-3927
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Flink's ResourceManager uses the Yarn container ids to identify connecting 
> task managers. Yarn's stringified container id may not be consistent across 
> different Hadoop versions, e.g. Hadoop 2.3.0 and Hadoop 2.7.1. The 
> ResourceManager gets it from the Yarn reports while the TaskManager infers it 
> from the Yarn environment variables. The ResourceManager may use the Hadoop 2.3.0 
> version while the cluster runs Hadoop 2.7.1. 
> The solution is to pass the ID through a custom environment variable which is 
> set by the ResourceManager before launching the TaskManager in the container. 
> That way we will always use the Hadoop client's id generation method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3927][yarn] make container id consisten...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2013




[GitHub] flink pull request: [FLINK-3953] rename unit-tests execution to de...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2019




[jira] [Commented] (FLINK-3927) TaskManager registration may fail if Yarn versions don't match

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296207#comment-15296207
 ] 

ASF GitHub Bot commented on FLINK-3927:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2013#issuecomment-220944377
  
+1 to merge


> TaskManager registration may fail if Yarn versions don't match
> --
>
> Key: FLINK-3927
> URL: https://issues.apache.org/jira/browse/FLINK-3927
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Flink's ResourceManager uses the Yarn container ids to identify connecting 
> task managers. Yarn's stringified container id may not be consistent across 
> different Hadoop versions, e.g. Hadoop 2.3.0 and Hadoop 2.7.1. The 
> ResourceManager gets it from the Yarn reports while the TaskManager infers it 
> from the Yarn environment variables. The ResourceManager may use the Hadoop 2.3.0 
> version while the cluster runs Hadoop 2.7.1. 
> The solution is to pass the ID through a custom environment variable which is 
> set by the ResourceManager before launching the TaskManager in the container. 
> That way we will always use the Hadoop client's id generation method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3927][yarn] make container id consisten...

2016-05-23 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/2013#issuecomment-220944377
  
+1 to merge




[jira] [Commented] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296191#comment-15296191
 ] 

ASF GitHub Bot commented on FLINK-3953:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/2019

[FLINK-3953] rename unit-tests execution to default-test

After 38698c0b101cbb48f8c10adf4060983ac07e2f4b, there are now two
executions defined for the Surefire plugin: unit-tests and
integration-tests. In addition, there is an implicit default execution
called default-test. This leads to the unit tests being executed
twice. This renames unit-tests to default-test to prevent duplicate
execution.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-3953

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2019


commit c1d099431a48583075fd51267914df946f14aa4f
Author: Maximilian Michels 
Date:   2016-05-23T10:06:25Z

[FLINK-3953] rename unit-tests execution to default-test

After 38698c0b101cbb48f8c10adf4060983ac07e2f4b, there are now two
executions defined for the Surefire plugin: unit-tests and
integration-tests. In addition, there is an implicit default execution
called default-test. This leads to the unit tests being executed
twice. This renames unit-tests to default-test to prevent duplicate
execution.




> Surefire plugin executes unit tests twice
> -
>
> Key: FLINK-3953
> URL: https://issues.apache.org/jira/browse/FLINK-3953
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> After FLINK-3909 the unit tests are executed twice. There are now two 
> executions defined for the Surefire plugin: {{unit-tests}} and 
> {{integration-tests}}. In addition, there is a default execution called 
> {{default-test}}. This leads to the unit tests being executed twice. Either 
> renaming unit-tests to default-test or skipping default-test would fix the 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3953] rename unit-tests execution to de...

2016-05-23 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/2019

[FLINK-3953] rename unit-tests execution to default-test

After 38698c0b101cbb48f8c10adf4060983ac07e2f4b, there are now two
executions defined for the Surefire plugin: unit-tests and
integration-tests. In addition, there is an implicit default execution
called default-test. This leads to the unit tests being executed
twice. This renames unit-tests to default-test to prevent duplicate
execution.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-3953

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2019


commit c1d099431a48583075fd51267914df946f14aa4f
Author: Maximilian Michels 
Date:   2016-05-23T10:06:25Z

[FLINK-3953] rename unit-tests execution to default-test

After 38698c0b101cbb48f8c10adf4060983ac07e2f4b, there are now two
executions defined for the Surefire plugin: unit-tests and
integration-tests. In addition, there is an implicit default execution
called default-test. This leads to the unit tests being executed
twice. This renames unit-tests to default-test to prevent duplicate
execution.






[jira] [Commented] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296185#comment-15296185
 ] 

Maximilian Michels commented on FLINK-3953:
---

Follow-up of FLINK-3909.

> Surefire plugin executes unit tests twice
> -
>
> Key: FLINK-3953
> URL: https://issues.apache.org/jira/browse/FLINK-3953
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Tests
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> After FLINK-3909 the unit tests are executed twice. There are now two 
> executions defined for the Surefire plugin: {{unit-tests}} and 
> {{integration-tests}}. In addition, there is a default execution called 
> {{default-test}}. This leads to the unit tests being executed twice. Either 
> renaming unit-tests to default-test or skipping default-test would fix the 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3953) Surefire plugin executes unit tests twice

2016-05-23 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3953:
-

 Summary: Surefire plugin executes unit tests twice
 Key: FLINK-3953
 URL: https://issues.apache.org/jira/browse/FLINK-3953
 Project: Flink
  Issue Type: Bug
  Components: Build System, Tests
Affects Versions: 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.1.0


After FLINK-3909 the unit tests are executed twice. There are now two 
executions defined for the Surefire plugin: {{unit-tests}} and 
{{integration-tests}}. In addition, there is a default execution called 
{{default-test}}. This leads to the unit tests being executed twice. Either 
renaming unit-tests to default-test or skipping default-test would fix the 
problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)