[GitHub] flink pull request #3003: [FLINK-5188] Create analog of RowCsvInputFormat in...

2016-12-13 Thread tonycox
GitHub user tonycox opened a pull request:

https://github.com/apache/flink/pull/3003

[FLINK-5188] Create analog of RowCsvInputFormat in Java and adjust all the imports of Row and RowTypeInfo

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tonycox/flink FLINK-5188

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3003


commit 6b3c98c43965d2adbed20019cf0e22b512ee7744
Author: Jark Wu 
Date:   2016-12-08T14:44:29Z

[FLINK-5187] [core] Create analog of Row and RowTypeInfo and RowComparator 
in core

commit 8da708eaac266b221b6146b4e32600c0fc118a75
Author: Jark Wu 
Date:   2016-12-09T03:43:09Z

address review comments and add RowTypeInfoTest

commit 83bbf7682b4577a320b3fd6227f17e47b73e35fe
Author: Jark Wu 
Date:   2016-12-09T11:29:58Z

fix tabs

commit 739af6821fe796f2d2dd17d2fcacc47ad1fa034b
Author: Jark Wu 
Date:   2016-12-12T14:48:38Z

address review comments

commit 64270234172fecdf0e7fe18c1d80463a6bb0572c
Author: Jark Wu 
Date:   2016-12-13T05:36:06Z

remove unused ArrayList import

commit 9187914e3a34ccfe43a963a4bff8080225457eb9
Author: tonycox 
Date:   2016-12-09T17:41:36Z

[FLINK-5188] Adjust Row and RowTypeInfo dependencies

commit 55dffb6c3c882dbf720a13b0f7ba850d5aa51cf7
Author: tonycox 
Date:   2016-12-13T12:05:58Z

Move RowCsvInputFormat to java.io






[GitHub] flink pull request #3002: Fix typo in variable name

2016-12-13 Thread rboorgapally
GitHub user rboorgapally opened a pull request:

https://github.com/apache/flink/pull/3002

Fix typo in variable name

- Fixed a typo in the variable name
- `mvn clean verify` has been executed successfully locally 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rboorgapally/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3002


commit 0c17471521fadb3a34784ee850ff43324e1e2a31
Author: Raghav 
Date:   2016-12-14T05:31:25Z

Fix typo in variable name






[jira] [Commented] (FLINK-4391) Provide support for asynchronous operations over streams

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747053#comment-15747053 ]

ASF GitHub Bot commented on FLINK-4391:
---

Github user bjlovegithub commented on a diff in the pull request:

https://github.com/apache/flink/pull/2629#discussion_r92315209
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions.async;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.functions.AbstractRichFunction;
+import org.apache.flink.api.common.functions.IterationRuntimeContext;
+import org.apache.flink.api.common.functions.RichFunction;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;
+
+/**
+ * Rich variant of the {@link AsyncFunction}. As a {@link RichFunction}, it gives access to the
+ * {@link org.apache.flink.api.common.functions.RuntimeContext} and provides setup and teardown methods:
+ * {@link RichFunction#open(org.apache.flink.configuration.Configuration)} and
+ * {@link RichFunction#close()}.
+ *
+ * <p>
+ * {@link RichAsyncFunction#getRuntimeContext()} and {@link RichAsyncFunction#getIterationRuntimeContext()} are
+ * not supported because the key may get changed while accessing states in the working thread.
+ *
+ * @param <IN> The type of the input elements.
+ * @param <OUT> The type of the returned elements.
+ */
+
+@PublicEvolving
+public abstract class RichAsyncFunction<IN, OUT> extends AbstractRichFunction
+   implements AsyncFunction<IN, OUT> {
+
+   @Override
+   public abstract void asyncInvoke(IN input, AsyncCollector<OUT> collector) throws Exception;
+
+   @Override
+   public RuntimeContext getRuntimeContext() {
--- End diff --

Personally, I prefer the second option... It is handy for user code, even 
though the limitations are present.


> Provide support for asynchronous operations over streams
> 
>
> Key: FLINK-4391
> URL: https://issues.apache.org/jira/browse/FLINK-4391
> Project: Flink
>  Issue Type: New Feature
>  Components: DataStream API
>Reporter: Jamie Grier
>Assignee: david.wang
>
> Many Flink users need to do asynchronous processing driven by data from a 
> DataStream.  The classic example would be joining against an external 
> database in order to enrich a stream with extra information.
> It would be nice to add general support for this type of operation in the 
> Flink API.  Ideally this could simply take the form of a new operator that 
> manages async operations, keeps so many of them in flight, and then emits 
> results to downstream operators as the async operations complete.
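
For illustration, a hedged sketch of how such an operator could be used, assuming the `AsyncDataStream`/`AsyncFunction` API proposed in PR #2629; the names, timeout, and capacity values are illustrative only:

    import java.util.concurrent.TimeUnit;

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // Hedged sketch: enrich a stream against an external database, keeping a
    // bounded number of async requests in flight and emitting results downstream
    // as they complete.
    DataStream<String> ids = ...;                    // illustrative input stream
    DataStream<String> enriched = AsyncDataStream.unorderedWait(
        ids,
        new EnrichFunction(),                        // an AsyncFunction<String, String>
        1000, TimeUnit.MILLISECONDS,                 // timeout per async request
        100);                                        // at most 100 requests in flight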





[GitHub] flink pull request #2629: [FLINK-4391] Provide support for asynchronous oper...

2016-12-13 Thread bjlovegithub
Github user bjlovegithub commented on a diff in the pull request:

https://github.com/apache/flink/pull/2629#discussion_r92315209
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions.async;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.functions.AbstractRichFunction;
+import org.apache.flink.api.common.functions.IterationRuntimeContext;
+import org.apache.flink.api.common.functions.RichFunction;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;
+
+/**
+ * Rich variant of the {@link AsyncFunction}. As a {@link RichFunction}, it gives access to the
+ * {@link org.apache.flink.api.common.functions.RuntimeContext} and provides setup and teardown methods:
+ * {@link RichFunction#open(org.apache.flink.configuration.Configuration)} and
+ * {@link RichFunction#close()}.
+ *
+ * <p>
+ * {@link RichAsyncFunction#getRuntimeContext()} and {@link RichAsyncFunction#getIterationRuntimeContext()} are
+ * not supported because the key may get changed while accessing states in the working thread.
+ *
+ * @param <IN> The type of the input elements.
+ * @param <OUT> The type of the returned elements.
+ */
+
+@PublicEvolving
+public abstract class RichAsyncFunction<IN, OUT> extends AbstractRichFunction
+   implements AsyncFunction<IN, OUT> {
+
+   @Override
+   public abstract void asyncInvoke(IN input, AsyncCollector<OUT> collector) throws Exception;
+
+   @Override
+   public RuntimeContext getRuntimeContext() {
--- End diff --

Personally, I prefer the second option... It is handy for user code, even 
though the limitations are present.




[GitHub] flink pull request #2629: [FLINK-4391] Provide support for asynchronous oper...

2016-12-13 Thread bjlovegithub
Github user bjlovegithub commented on a diff in the pull request:

https://github.com/apache/flink/pull/2629#discussion_r92314677
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions.async;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.functions.AbstractRichFunction;
+import org.apache.flink.api.common.functions.IterationRuntimeContext;
+import org.apache.flink.api.common.functions.RichFunction;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;
+
+/**
+ * Rich variant of the {@link AsyncFunction}. As a {@link RichFunction}, it gives access to the
+ * {@link org.apache.flink.api.common.functions.RuntimeContext} and provides setup and teardown methods:
+ * {@link RichFunction#open(org.apache.flink.configuration.Configuration)} and
+ * {@link RichFunction#close()}.
+ *
+ * <p>
+ * {@link RichAsyncFunction#getRuntimeContext()} and {@link RichAsyncFunction#getIterationRuntimeContext()} are
+ * not supported because the key may get changed while accessing states in the working thread.
+ *
+ * @param <IN> The type of the input elements.
+ * @param <OUT> The type of the returned elements.
+ */
+
+@PublicEvolving
+public abstract class RichAsyncFunction<IN, OUT> extends AbstractRichFunction
+   implements AsyncFunction<IN, OUT> {
+
+   @Override
+   public abstract void asyncInvoke(IN input, AsyncCollector<OUT> collector) throws Exception;
+
+   @Override
+   public RuntimeContext getRuntimeContext() {
--- End diff --

Hi @mproch, in our use case in the production environment, we provide some 
metrics in the `AsyncWaitOperator`, like the number of elements in the 
`AsyncCollectorBuffer`, etc. Maybe the `AsyncWaitOperator` could provide those 
basic statistics metrics itself.
The `RichAsyncFunction` is useful for its `open()` and `close()` methods, 
where async clients can be initialized and freed.
I think there are two options:
1. Remove `RichAsyncFunction`, since `AsyncWaitOperator` currently only 
supports non-keyed streams.
2. Keep `RichAsyncFunction`, but then it is difficult to use 
`RichAsyncFunction.getRuntimeContext()`...
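
For illustration, a minimal sketch of the `open()`/`close()` pattern described above, assuming the `RichAsyncFunction`/`AsyncCollector` API from this PR; the `DatabaseClient` type and its methods are hypothetical placeholders:

    import java.util.Collections;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
    import org.apache.flink.streaming.api.functions.async.collector.AsyncCollector;

    // Hedged sketch: initialize the async client once in open() and free it in
    // close(), so asyncInvoke() needs no per-record setup.
    public class EnrichFunction extends RichAsyncFunction<String, String> {

        private transient DatabaseClient client; // hypothetical async client

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            client = new DatabaseClient(); // created once per task
        }

        @Override
        public void asyncInvoke(String key, AsyncCollector<String> collector) throws Exception {
            // hypothetical callback-style lookup; hand the result to the collector
            client.lookup(key, result -> collector.collect(Collections.singletonList(result)));
        }

        @Override
        public void close() throws Exception {
            client.shutdown(); // free the client when the task ends
            super.close();
        }
    }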




[jira] [Commented] (FLINK-4821) Implement rescalable non-partitioned state for Kinesis Connector

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746999#comment-15746999 ]

ASF GitHub Bot commented on FLINK-4821:
---

GitHub user tony810430 opened a pull request:

https://github.com/apache/flink/pull/3001

[FLINK-4821] [kinesis] Implement rescalable non-partitioned state for 
Kinesis Connector

Implement ListCheckpointed interface on Kinesis Consumer

As a reminder, I returned an empty list instead of null in some cases in 
snapshotState, because the snapshotState method in AbstractUdfStreamOperator 
doesn't handle a null list and would hit a NullPointerException in the foreach 
statement (see lines 106 to 119 in AbstractUdfStreamOperator).
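
For reference, a minimal sketch of that pattern, assuming the ListCheckpointed interface (`snapshotState`/`restoreState`); the state type and names are illustrative only:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;

    // Hedged sketch: return an empty list instead of null so the foreach over
    // the snapshot in AbstractUdfStreamOperator cannot hit a NullPointerException.
    public class SketchedSource implements ListCheckpointed<Long> {

        private final List<Long> sequenceNumbers = new ArrayList<>(); // illustrative state

        @Override
        public List<Long> snapshotState(long checkpointId, long timestamp) throws Exception {
            return sequenceNumbers.isEmpty()
                ? Collections.<Long>emptyList() // never null
                : new ArrayList<>(sequenceNumbers);
        }

        @Override
        public void restoreState(List<Long> state) throws Exception {
            sequenceNumbers.clear();
            sequenceNumbers.addAll(state);
        }
    }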

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tony810430/flink FLINK-4821

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3001.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3001


commit fa0829738fd075fb4e65e31698acbc8fd0f16af2
Author: 魏偉哲 
Date:   2016-12-14T02:18:25Z

[FLINK-4821] Implement rescalable non-partitioned state for Kinesis 
Connector




> Implement rescalable non-partitioned state for Kinesis Connector
> 
>
> Key: FLINK-4821
> URL: https://issues.apache.org/jira/browse/FLINK-4821
> Project: Flink
>  Issue Type: New Feature
>  Components: Kinesis Connector
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Wei-Che Wei
> Fix For: 1.2.0
>
>
> FLINK-4379 added the rescalable non-partitioned state feature, along with the 
> implementation for the Kafka connector.
> The AWS Kinesis connector will benefit from the feature and should implement 
> it too. This ticket tracks progress for this.





[GitHub] flink pull request #3001: [FLINK-4821] [kinesis] Implement rescalable non-pa...

2016-12-13 Thread tony810430
GitHub user tony810430 opened a pull request:

https://github.com/apache/flink/pull/3001

[FLINK-4821] [kinesis] Implement rescalable non-partitioned state for 
Kinesis Connector

Implement ListCheckpointed interface on Kinesis Consumer

As a reminder, I returned an empty list instead of null in some cases in 
snapshotState, because the snapshotState method in AbstractUdfStreamOperator 
doesn't handle a null list and would hit a NullPointerException in the foreach 
statement (see lines 106 to 119 in AbstractUdfStreamOperator).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tony810430/flink FLINK-4821

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3001.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3001


commit fa0829738fd075fb4e65e31698acbc8fd0f16af2
Author: 魏偉哲 
Date:   2016-12-14T02:18:25Z

[FLINK-4821] Implement rescalable non-partitioned state for Kinesis 
Connector






[jira] [Commented] (FLINK-5108) Remove ClientShutdownHook during job execution

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746832#comment-15746832 ]

ASF GitHub Bot commented on FLINK-5108:
---

Github user Renkai commented on the issue:

https://github.com/apache/flink/pull/2928
  
It is OK to close this issue.

Max wrote on Wednesday, December 14, 2016 at 02:31:

> There is one problem we overlooked. In detached mode we ensure cluster
> shutdown through a message sent by the client during job submission to tell
> the JobManager that this is going to be the last job it has to execute. In
> interactive execution mode, the user jar can contain multiple jobs; this is
> mostly useful for interactive batch jobs. Since we just execute the main
> method of the user jar, we don't know how many jobs are submitted and when
> to shut down the cluster. That's why we chose to delegate the shutdown to
> the client for interactive jobs. Thus, I'm hesitant to remove the shutdown
> hook because it ensures that the cluster shuts down during interactive job
> executions. It prevents clusters from lingering around when the client
> shuts down.
>
> A couple of solutions to this problem:
>
> 1. The JobManager watches the client and shuts down a) if it loses
>    connection to the client and the job it executes has completed, or b) the
>    client tells the JobManager to shut down.
> 2. The JobManager drives the execution, which is now part of the client.
> 3. We don't allow multiple jobs to execute. Then we always have a clear
>    shutdown point. This is perhaps the easiest and most elegant solution. Most
>    users only execute a single job at a time anyway. We can still allow
>    interactive job executions if the user chooses to. Perhaps we can make this
>    more explicit in the API to give a hint to the client.
>
> I'm afraid we will have to close this PR until we realize one of the above
> solutions (or another one).



> Remove ClientShutdownHook during job execution
> --
>
> Key: FLINK-5108
> URL: https://issues.apache.org/jira/browse/FLINK-5108
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Maximilian Michels
>Assignee: Renkai Ge
> Fix For: 1.2.0
>
>
> The behavior of the Standalone mode is to not react to client interrupts once 
> a job has been deployed. We should change the Yarn client implementation to 
> behave the same. This avoids accidental shutdown of the job, e.g. when the 
> user sends an interrupt via CTRL-C or when the client machine shuts down.





[GitHub] flink issue #2928: [FLINK-5108] Remove ClientShutdownHook during job executi...

2016-12-13 Thread Renkai
Github user Renkai commented on the issue:

https://github.com/apache/flink/pull/2928
  
It is OK to close this issue.

Max wrote on Wednesday, December 14, 2016 at 02:31:

> There is one problem we overlooked. In detached mode we ensure cluster
> shutdown through a message sent by the client during job submission to tell
> the JobManager that this is going to be the last job it has to execute. In
> interactive execution mode, the user jar can contain multiple jobs; this is
> mostly useful for interactive batch jobs. Since we just execute the main
> method of the user jar, we don't know how many jobs are submitted and when
> to shut down the cluster. That's why we chose to delegate the shutdown to
> the client for interactive jobs. Thus, I'm hesitant to remove the shutdown
> hook because it ensures that the cluster shuts down during interactive job
> executions. It prevents clusters from lingering around when the client
> shuts down.
>
> A couple of solutions to this problem:
>
> 1. The JobManager watches the client and shuts down a) if it loses
>    connection to the client and the job it executes has completed, or b) the
>    client tells the JobManager to shut down.
> 2. The JobManager drives the execution, which is now part of the client.
> 3. We don't allow multiple jobs to execute. Then we always have a clear
>    shutdown point. This is perhaps the easiest and most elegant solution. Most
>    users only execute a single job at a time anyway. We can still allow
>    interactive job executions if the user chooses to. Perhaps we can make this
>    more explicit in the API to give a hint to the client.
>
> I'm afraid we will have to close this PR until we realize one of the above
> solutions (or another one).





[GitHub] flink issue #2982: [FLINK-4460] Side Outputs in Flink

2016-12-13 Thread chenqin
Github user chenqin commented on the issue:

https://github.com/apache/flink/pull/2982
  
@aljoscha Thanks for your time. We can chat more after the 1.2 release!

I think it makes sense to extend Collector, even though we may not be able to 
remove collect(T obj) due to API compatibility issues in 1.x, per @fhueske's 
comments in the [FLIP-13 email 
thread](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-13-Side-Outputs-in-Flink-td14204.html).

The only point I would like to add: there seems to be a decent amount of 
refactoring needed to replace the underlying output chains with 
collect(tag, element), yet it looks like a reasonable investment moving 
forward (multiple inputs / multiple outputs).

The `tooLateEvents()` method is something added for the user's convenience; it 
should be fine to remove it if it doesn't gain much benefit. `LateArrivingTag` 
shares the same type as the input (which is already fixed once the input type 
is defined), so adding a late-arriving tag within the apply method seems 
redundant. In fact, without any changes to this diff, users are also able to 
access late-arriving events in the following way:

    OutputTag lateElementsOutput = new LateArrivingOutputTag();
    DataStream input = ...
    SingleOutputStreamOperator windowed = input
      .keyBy(...)
      .window(...)
      .apply(Function);

    DataStream lateElements = windowed.getSideOutput(lateElementsOutput);

Thanks,
Chen





[jira] [Commented] (FLINK-4460) Side Outputs in Flink

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746680#comment-15746680 ]

ASF GitHub Bot commented on FLINK-4460:
---

Github user chenqin commented on the issue:

https://github.com/apache/flink/pull/2982
  
@aljoscha Thanks for your time. We can chat more after the 1.2 release!

I think it makes sense to extend Collector, even though we may not be able to 
remove collect(T obj) due to API compatibility issues in 1.x, per @fhueske's 
comments in the [FLIP-13 email 
thread](http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-13-Side-Outputs-in-Flink-td14204.html).

The only point I would like to add: there seems to be a decent amount of 
refactoring needed to replace the underlying output chains with 
collect(tag, element), yet it looks like a reasonable investment moving 
forward (multiple inputs / multiple outputs).

The `tooLateEvents()` method is something added for the user's convenience; it 
should be fine to remove it if it doesn't gain much benefit. `LateArrivingTag` 
shares the same type as the input (which is already fixed once the input type 
is defined), so adding a late-arriving tag within the apply method seems 
redundant. In fact, without any changes to this diff, users are also able to 
access late-arriving events in the following way:

    OutputTag lateElementsOutput = new LateArrivingOutputTag();
    DataStream input = ...
    SingleOutputStreamOperator windowed = input
      .keyBy(...)
      .window(...)
      .apply(Function);

    DataStream lateElements = windowed.getSideOutput(lateElementsOutput);

Thanks,
Chen



> Side Outputs in Flink
> -
>
> Key: FLINK-4460
> URL: https://issues.apache.org/jira/browse/FLINK-4460
> Project: Flink
>  Issue Type: New Feature
>  Components: Core, DataStream API
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Chen Qin
>  Labels: latearrivingevents, sideoutput
>
> https://docs.google.com/document/d/1vg1gpR8JL4dM07Yu4NyerQhhVvBlde5qdqnuJv4LcV4/edit?usp=sharing





[GitHub] flink issue #2985: [FLINK-5104] Bipartite graph validation

2016-12-13 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2985
  
Hi @greghogan ,

Thank you for your feedback. I've updated the PR accordingly. The only 
thing that I did differently is that I've replaced projections and two 
`count()` executions with joins and one `count()` call. What do you think about 
this?




[jira] [Commented] (FLINK-5104) Implement BipartiteGraph validator

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746410#comment-15746410 ]

ASF GitHub Bot commented on FLINK-5104:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2985
  
Hi @greghogan ,

Thank you for your feedback. I've updated the PR accordingly. The only 
thing that I did differently is that I've replaced projections and two 
`count()` executions with joins and one `count()` call. What do you think about 
this?


> Implement BipartiteGraph validator
> --
>
> Key: FLINK-5104
> URL: https://issues.apache.org/jira/browse/FLINK-5104
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> BipartiteGraph should have a validator similar to GraphValidator for Graph 
> class.





[jira] [Commented] (FLINK-5104) Implement BipartiteGraph validator

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746405#comment-15746405 ]

ASF GitHub Bot commented on FLINK-5104:
---

Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2985#discussion_r92276412
  
--- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/validation/InvalidBipartiteVertexIdsValidator.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.validation;
+
+import org.apache.flink.api.common.functions.CoGroupFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.bipartite.BipartiteEdge;
+import org.apache.flink.graph.bipartite.BipartiteGraph;
+import org.apache.flink.util.Collector;
+import static org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
+
+/**
+ * Checks that the edge set input contains valid vertex Ids, i.e. that they
+ * also exist in top and bottom vertex sets.
+ */
+public class InvalidBipartiteVertexIdsValidator<KT, KB, VVT, VVB, EV> extends BipartiteGraphValidator<KT, KB, VVT, VVB, EV> {
+
+   @Override
+   public boolean validate(BipartiteGraph<KT, KB, VVT, VVB, EV> bipartiteGraph) throws Exception {
+       DataSet<Tuple1<KT>> edgesTopIds = bipartiteGraph.getEdges().map(new GetTopIdsMap<>());
+       DataSet<Tuple1<KB>> edgesBottomIds = bipartiteGraph.getEdges().map(new GetBottomIdsMap<>());
+
+       DataSet<Tuple1<KT>> invalidTopIds = invalidIds(bipartiteGraph.getTopVertices(), edgesTopIds);
+       DataSet<Tuple1<KB>> invalidBottomIds = invalidIds(bipartiteGraph.getBottomVertices(), edgesBottomIds);
+
+       return invalidTopIds.count() == 0 && invalidBottomIds.count() == 0;
--- End diff --

Since these two datasets have different types, I couldn't figure out how to 
use accumulators and `RichOutputFormat` to run `collect()` only once, but I've 
replaced it with two outer joins and one `count()` call.
Is this a feasible approach?
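
For illustration, a hedged sketch of the "two outer joins, one `count()`" idea; the dataset names and the `Tuple1<Integer>` marker type are assumptions, not the PR's actual code:

    import org.apache.flink.api.common.functions.FlatJoinFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple1;
    import org.apache.flink.util.Collector;

    public class ValidationSketch {

        // Left-outer-join edge ids against vertex ids and emit one marker per edge
        // id without a matching vertex. Both sides share the marker type, so the
        // two differently-typed id sets can be unioned and counted once.
        static <K> DataSet<Tuple1<Integer>> invalidMarkers(
                DataSet<Tuple1<K>> edgeIds, DataSet<Tuple1<K>> vertexIds) {
            return edgeIds
                .leftOuterJoin(vertexIds)
                .where(0)
                .equalTo(0)
                .with(new FlatJoinFunction<Tuple1<K>, Tuple1<K>, Tuple1<Integer>>() {
                    @Override
                    public void join(Tuple1<K> edgeId, Tuple1<K> vertexId, Collector<Tuple1<Integer>> out) {
                        if (vertexId == null) { // outer join: null means no matching vertex
                            out.collect(new Tuple1<>(1));
                        }
                    }
                });
        }

        static <KT, KB> boolean allIdsValid(
                DataSet<Tuple1<KT>> edgeTopIds, DataSet<Tuple1<KT>> topVertexIds,
                DataSet<Tuple1<KB>> edgeBottomIds, DataSet<Tuple1<KB>> bottomVertexIds) throws Exception {
            return invalidMarkers(edgeTopIds, topVertexIds)
                .union(invalidMarkers(edgeBottomIds, bottomVertexIds))
                .count() == 0; // a single count() over the unioned markers
        }
    }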


> Implement BipartiteGraph validator
> --
>
> Key: FLINK-5104
> URL: https://issues.apache.org/jira/browse/FLINK-5104
> Project: Flink
>  Issue Type: Sub-task
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> BipartiteGraph should have a validator similar to GraphValidator for Graph 
> class.





[GitHub] flink pull request #2985: [FLINK-5104] Bipartite graph validation

2016-12-13 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2985#discussion_r92276412
  
--- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/validation/InvalidBipartiteVertexIdsValidator.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.validation;
+
+import org.apache.flink.api.common.functions.CoGroupFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.bipartite.BipartiteEdge;
+import org.apache.flink.graph.bipartite.BipartiteGraph;
+import org.apache.flink.util.Collector;
+import static org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
+
+/**
+ * Checks that the edge set input contains valid vertex Ids, i.e. that they
+ * also exist in top and bottom vertex sets.
+ */
+public class InvalidBipartiteVertexIdsValidator<KT, KB, VVT, VVB, EV> extends BipartiteGraphValidator<KT, KB, VVT, VVB, EV> {
+
+   @Override
+   public boolean validate(BipartiteGraph<KT, KB, VVT, VVB, EV> bipartiteGraph) throws Exception {
+       DataSet<Tuple1<KT>> edgesTopIds = bipartiteGraph.getEdges().map(new GetTopIdsMap<>());
+       DataSet<Tuple1<KB>> edgesBottomIds = bipartiteGraph.getEdges().map(new GetBottomIdsMap<>());
+
+       DataSet<Tuple1<KT>> invalidTopIds = invalidIds(bipartiteGraph.getTopVertices(), edgesTopIds);
+       DataSet<Tuple1<KB>> invalidBottomIds = invalidIds(bipartiteGraph.getBottomVertices(), edgesBottomIds);
+
+       return invalidTopIds.count() == 0 && invalidBottomIds.count() == 0;
--- End diff --

Since these two datasets have different types, I couldn't figure out how to 
use accumulators and `RichOutputFormat` to run `collect()` only once, but I've 
replaced it with two outer joins and one `count()` call.
Is this a feasible approach?




[jira] [Commented] (FLINK-4861) Package optional project artifacts

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746359#comment-15746359 ]

ASF GitHub Bot commented on FLINK-4861:
---

Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2664
  
New implementation using maven-shade-plugin in #3000.


> Package optional project artifacts
> --
>
> Key: FLINK-4861
> URL: https://issues.apache.org/jira/browse/FLINK-4861
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.2.0
>
>
> Per the mailing list 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Additional-project-downloads-td13223.html],
>  package the Flink libraries and connectors into subdirectories of a new 
> {{opt}} directory in the release/snapshot tarballs.





[GitHub] flink issue #2664: [FLINK-4861] [build] Package optional project artifacts

2016-12-13 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2664
  
New implementation using maven-shade-plugin in #3000.




[jira] [Commented] (FLINK-4861) Package optional project artifacts

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746356#comment-15746356 ]

ASF GitHub Bot commented on FLINK-4861:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/3000

[FLINK-4861] [build] Package optional project artifacts

Package the Flink connectors, metrics, and libraries into subdirectories of 
a new opt directory in the release/snapshot tarballs.

I'm not aware of a way to place the maven-shade-plugin configuration in the 
parent pom and conditionally enable it for a subset of modules.

$ ls build-target/opt
flink-avro_2.10-1.2-SNAPSHOT.jar
flink-cep_2.10-1.2-SNAPSHOT.jar
flink-cep-scala_2.10-1.2-SNAPSHOT.jar
flink-connector-cassandra_2.10-1.2-SNAPSHOT.jar
flink-connector-elasticsearch_2.10-1.2-SNAPSHOT.jar
flink-connector-elasticsearch2_2.10-1.2-SNAPSHOT.jar
flink-connector-filesystem_2.10-1.2-SNAPSHOT.jar
flink-connector-flume_2.10-1.2-SNAPSHOT.jar
flink-connector-hadoop-compatibility_2.10-1.2-SNAPSHOT.jar
flink-connector-hbase_2.10-1.2-SNAPSHOT.jar
flink-connector-hcatalog-1.2-SNAPSHOT.jar
flink-connector-jdbc-1.2-SNAPSHOT.jar
flink-connector-kafka-0.10_2.10-1.2-SNAPSHOT.jar
flink-connector-kafka-0.8_2.10-1.2-SNAPSHOT.jar
flink-connector-kafka-0.9_2.10-1.2-SNAPSHOT.jar
flink-connector-nifi_2.10-1.2-SNAPSHOT.jar
flink-connector-rabbitmq_2.10-1.2-SNAPSHOT.jar
flink-connector-redis_2.10-1.2-SNAPSHOT.jar
flink-connector-twitter_2.10-1.2-SNAPSHOT.jar
flink-gelly_2.10-1.2-SNAPSHOT.jar
flink-gelly-examples_2.10-1.2-SNAPSHOT.jar
flink-gelly-scala_2.10-1.2-SNAPSHOT.jar
flink-metrics-dropwizard-1.2-SNAPSHOT.jar
flink-metrics-ganglia-1.2-SNAPSHOT.jar
flink-metrics-graphite-1.2-SNAPSHOT.jar
flink-metrics-statsd-1.2-SNAPSHOT.jar
flink-ml_2.10-1.2-SNAPSHOT.jar
flink-storm_2.10-1.2-SNAPSHOT.jar
flink-storm-examples_2.10-1.2-SNAPSHOT.jar

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 4861b_package_optional_project_artifacts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3000.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3000


commit 972d3d01faf5375bff976a90f651c1bb2d8776bf
Author: Greg Hogan 
Date:   2016-12-13T20:14:19Z

[FLINK-4861] [build] Package optional project artifacts

Package the Flink connectors, metrics, and libraries into subdirectories
of a new opt directory in the release/snapshot tarballs.




> Package optional project artifacts
> --
>
> Key: FLINK-4861
> URL: https://issues.apache.org/jira/browse/FLINK-4861
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.2.0
>
>
> Per the mailing list 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Additional-project-downloads-td13223.html],
>  package the Flink libraries and connectors into subdirectories of a new 
> {{opt}} directory in the release/snapshot tarballs.





[GitHub] flink pull request #3000: [FLINK-4861] [build] Package optional project arti...

2016-12-13 Thread greghogan
GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/3000

[FLINK-4861] [build] Package optional project artifacts

Package the Flink connectors, metrics, and libraries into subdirectories of 
a new opt directory in the release/snapshot tarballs.

I'm not aware of a way to place the maven-shade-plugin configuration in the 
parent pom and conditionally enable it for a subset of modules.

$ ls build-target/opt
flink-avro_2.10-1.2-SNAPSHOT.jar
flink-cep_2.10-1.2-SNAPSHOT.jar
flink-cep-scala_2.10-1.2-SNAPSHOT.jar
flink-connector-cassandra_2.10-1.2-SNAPSHOT.jar
flink-connector-elasticsearch_2.10-1.2-SNAPSHOT.jar
flink-connector-elasticsearch2_2.10-1.2-SNAPSHOT.jar
flink-connector-filesystem_2.10-1.2-SNAPSHOT.jar
flink-connector-flume_2.10-1.2-SNAPSHOT.jar
flink-connector-hadoop-compatibility_2.10-1.2-SNAPSHOT.jar
flink-connector-hbase_2.10-1.2-SNAPSHOT.jar
flink-connector-hcatalog-1.2-SNAPSHOT.jar
flink-connector-jdbc-1.2-SNAPSHOT.jar
flink-connector-kafka-0.10_2.10-1.2-SNAPSHOT.jar
flink-connector-kafka-0.8_2.10-1.2-SNAPSHOT.jar
flink-connector-kafka-0.9_2.10-1.2-SNAPSHOT.jar
flink-connector-nifi_2.10-1.2-SNAPSHOT.jar
flink-connector-rabbitmq_2.10-1.2-SNAPSHOT.jar
flink-connector-redis_2.10-1.2-SNAPSHOT.jar
flink-connector-twitter_2.10-1.2-SNAPSHOT.jar
flink-gelly_2.10-1.2-SNAPSHOT.jar
flink-gelly-examples_2.10-1.2-SNAPSHOT.jar
flink-gelly-scala_2.10-1.2-SNAPSHOT.jar
flink-metrics-dropwizard-1.2-SNAPSHOT.jar
flink-metrics-ganglia-1.2-SNAPSHOT.jar
flink-metrics-graphite-1.2-SNAPSHOT.jar
flink-metrics-statsd-1.2-SNAPSHOT.jar
flink-ml_2.10-1.2-SNAPSHOT.jar
flink-storm_2.10-1.2-SNAPSHOT.jar
flink-storm-examples_2.10-1.2-SNAPSHOT.jar

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 4861b_package_optional_project_artifacts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3000.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3000


commit 972d3d01faf5375bff976a90f651c1bb2d8776bf
Author: Greg Hogan 
Date:   2016-12-13T20:14:19Z

[FLINK-4861] [build] Package optional project artifacts

Package the Flink connectors, metrics, and libraries into subdirectories
of a new opt directory in the release/snapshot tarballs.






[jira] [Commented] (FLINK-5311) Write user documentation for BipartiteGraph

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746285#comment-15746285 ]

ASF GitHub Bot commented on FLINK-5311:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2984
  
Hi @vasia. Thank you for your comments. I've added the link to the doc and 
added the image (I tried to make it in the same style as the other Gelly images).


> Write user documentation for BipartiteGraph
> ---
>
> Key: FLINK-5311
> URL: https://issues.apache.org/jira/browse/FLINK-5311
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> We need to add user documentation. The progress on BipartiteGraph can be 
> tracked in the following JIRA:
> https://issues.apache.org/jira/browse/FLINK-2254





[GitHub] flink issue #2984: [FLINK-5311] Add user documentation for bipartite graph

2016-12-13 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2984
  
Hi @vasia. Thank you for your comments. I've added the link to the doc and 
added the image (I tried to make it in the same style as the other Gelly images).




[jira] [Comment Edited] (FLINK-5245) Add support for BipartiteGraph mutations

2016-12-13 Thread Ivan Mushketyk (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746163#comment-15746163 ]

Ivan Mushketyk edited comment on FLINK-5245 at 12/13/16 8:23 PM:
-

I got it now, sorry for the misunderstanding.

In any case, it seems that there is a consensus between you and Greg that this 
particular feature is not particularly useful for BipartiteGraph. If you have 
no objections, I'll remove this issue.


was (Author: ivan.mushketyk):
I got it now, sorry for the misunderstanding.

In any case, it seems that there is a consensus between you and Greg that this 
particular feature is not particularly useful for BipartiteGraph. If you have 
no objections, I'll remove this particular issue.

> Add support for BipartiteGraph mutations
> 
>
> Key: FLINK-5245
> URL: https://issues.apache.org/jira/browse/FLINK-5245
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> Implement methods for adding and removing vertices and edges similarly to 
> Graph class.
> Depends on https://issues.apache.org/jira/browse/FLINK-2254





[jira] [Commented] (FLINK-5245) Add support for BipartiteGraph mutations

2016-12-13 Thread Ivan Mushketyk (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746163#comment-15746163 ]

Ivan Mushketyk commented on FLINK-5245:
---

I got it now, sorry for the misunderstanding.

In any case, it seems that there is a consensus between you and Greg that this 
particular feature is not particularly useful for BipartiteGraph. If you have 
no objections, I'll remove this particular issue.

> Add support for BipartiteGraph mutations
> 
>
> Key: FLINK-5245
> URL: https://issues.apache.org/jira/browse/FLINK-5245
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>
> Implement methods for adding and removing vertices and edges similarly to 
> Graph class.
> Depends on https://issues.apache.org/jira/browse/FLINK-2254





[jira] [Commented] (FLINK-5332) Non-thread safe FileSystem::initOutPathLocalFS() can cause lost files/directories in local execution

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746126#comment-15746126 ]

ASF GitHub Bot commented on FLINK-5332:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2999
  
I think this should go into 1.2 - it is quite a bug for local testing.


> Non-thread safe FileSystem::initOutPathLocalFS() can cause lost 
> files/directories in local execution
> 
>
> Key: FLINK-5332
> URL: https://issues.apache.org/jira/browse/FLINK-5332
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.2.0
>
>
> This is mainly relevant to tests and Local Mini Cluster executions.
> The {{FileOutputFormat}} and its subclasses rely on 
> {{FileSystem::initOutPathLocalFS()}} to prepare the output directory. When 
> multiple parallel output writers call that method, there is a slim chance 
> that one parallel thread deletes another's directory. The checks that the 
> method has are not bulletproof.
> I believe that this is the cause of many Travis test instabilities that we 
> observed over time.
> Simply synchronizing that method per process should do the trick. Since it is 
> a rare initialization method, and only relevant in tests & local mini cluster 
> executions, it should be a price that is okay to pay. I see no other way, as 
> we do not have simple access to an atomic "check and delete and recreate" 
> file operation.
> The synchronization also makes many "re-try" code paths obsolete (there 
> should be no re-tries needed on proper file systems).





[GitHub] flink issue #2999: [FLINK-5332] [core] Synchronize FileSystem::initOutPathLo...

2016-12-13 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2999
  
I think this should go into 1.2 - it is quite a bug for local testing.




[jira] [Commented] (FLINK-5332) Non-thread safe FileSystem::initOutPathLocalFS() can cause lost files/directories in local execution

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746125#comment-15746125 ]

ASF GitHub Bot commented on FLINK-5332:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/2999

[FLINK-5332] [core] Synchronize FileSystem::initOutPathLocalFS() to prevent 
lost files/directories when called concurrently

This is mainly relevant to tests and Local Mini Cluster executions.

The `FileOutputFormat` and its subclasses rely on 
`FileSystem::initOutPathLocalFS()` to prepare the output directory. When 
multiple parallel output writers call that method, there is a slim chance that 
one parallel thread deletes another's directory. The checks that the method 
has are not bulletproof.

I believe that this is the cause of many Travis test instabilities that we 
observed over time.

Simply synchronizing that method per process should do the trick. Since it 
is a rare initialization method, and only relevant in tests & local mini 
cluster executions, it should be a price that is okay to pay. I see no other 
way, as we do not have simple access to an atomic "check and delete and 
recreate" file operation.
The synchronization also makes many "re-try" code paths obsolete (there 
should be no re-tries needed on proper file systems).
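
A reduced sketch of the idea (not the actual patch): serialize the check-delete-recreate sequence behind a single process-wide lock.

    import java.io.File;
    import java.util.concurrent.locks.ReentrantLock;

    public class OutputPathSketch {

        // one lock per process, shared by all parallel output writers
        private static final ReentrantLock OUTPUT_DIR_LOCK = new ReentrantLock(true);

        // Hedged reduction of initOutPathLocalFS(): only one thread at a time may
        // check, delete, and recreate the local output directory.
        public static boolean initOutPathLocalFS(File outDir, boolean overwrite) {
            OUTPUT_DIR_LOCK.lock();
            try {
                if (outDir.exists() && overwrite) {
                    deleteRecursively(outDir);
                }
                return outDir.exists() || outDir.mkdirs();
            } finally {
                OUTPUT_DIR_LOCK.unlock();
            }
        }

        private static void deleteRecursively(File file) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
            file.delete();
        }
    }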

### Tests

This is tricky to test. The test in `InitOutputPathTest.java` uses a series 
of latches to reproduce the problematic thread-execution interleaving to 
validate the problem. The properly fixed variant cannot use that interleaving 
(because it fixes the problem, duh), but pushes the thread interleaving 
best-effort towards the case where the problem would occur, were the method not 
properly synchronized. Sounds weird, I know.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink fs_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2999


commit 7a4d5b2f53e3b8c27a5224e405cf1b671f474a58
Author: Stephan Ewen 
Date:   2016-12-13T18:15:11Z

[tests] Add 'CheckedThread' as a common test utility

commit e25b436f1de18be0dc2c3b02b82bf0b8203f0b44
Author: Stephan Ewen 
Date:   2016-12-13T18:12:12Z

[FLINK-5332] [core] Synchronize FileSystem::initOutPathLocalFS() to prevent 
lost files when called concurrently.




> Non-thread safe FileSystem::initOutPathLocalFS() can cause lost 
> files/directories in local execution
> 
>
> Key: FLINK-5332
> URL: https://issues.apache.org/jira/browse/FLINK-5332
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.2.0
>
>
> This is mainly relevant to tests and Local Mini Cluster executions.
> The {{FileOutputFormat}} and its subclasses rely on 
> {{FileSystem::initOutPathLocalFS()}} to prepare the output directory. When 
> multiple parallel output writers call that method, there is a slim chance 
> that one parallel thread deletes another's directory. The checks that the 
> method has are not bulletproof.
> I believe that this is the cause of many Travis test instabilities that we 
> observed over time.
> Simply synchronizing that method per process should do the trick. Since it is 
> a rare initialization method, and only relevant in tests & local mini cluster 
> executions, it should be a price that is okay to pay. I see no other way, as 
> we do not have simple access to an atomic "check and delete and recreate" 
> file operation.
> The synchronization also makes many "re-try" code paths obsolete (there 
> should be no re-tries needed on proper file systems).





[GitHub] flink pull request #2999: [FLINK-5332] [core] Synchronize FileSystem::initOu...

2016-12-13 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/2999

[FLINK-5332] [core] Synchronize FileSystem::initOutPathLocalFS() to prevent 
lost files/directories when called concurrently

This is mainly relevant to tests and Local Mini Cluster executions.

The `FileOutputFormat` and its subclasses rely on 
`FileSystem::initOutPathLocalFS()` to prepare the output directory. When 
multiple parallel output writers call that method, there is a slim chance that 
one parallel thread deletes another's directory. The checks that the method 
has are not bulletproof.

I believe that this is the cause of many Travis test instabilities that we 
observed over time.

Simply synchronizing that method per process should do the trick. Since it 
is a rare initialization method, and only relevant in tests & local mini 
cluster executions, it should be a price that is okay to pay. I see no other 
way, as we do not have simple access to an atomic "check and delete and 
recreate" file operation.
The synchronization also makes many "re-try" code paths obsolete (there 
should be no re-tries needed on proper file systems).

### Tests

This is tricky to test. The test in `InitOutputPathTest.java` uses a series 
of latches to reproduce the problematic thread-execution interleaving to 
validate the problem. The properly fixed variant cannot use that interleaving 
(because it fixes the problem, duh), but pushes the thread interleaving 
best-effort towards the case where the problem would occur, were the method not 
properly synchronized. Sounds weird, I know.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink fs_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2999


commit 7a4d5b2f53e3b8c27a5224e405cf1b671f474a58
Author: Stephan Ewen 
Date:   2016-12-13T18:15:11Z

[tests] Add 'CheckedThread' as a common test utility

commit e25b436f1de18be0dc2c3b02b82bf0b8203f0b44
Author: Stephan Ewen 
Date:   2016-12-13T18:12:12Z

[FLINK-5332] [core] Synchronize FileSystem::initOutPathLocalFS() to prevent 
lost files when called concurrently.






[jira] [Commented] (FLINK-4798) CEPITCase.testSimpleKeyedPatternCEP test failure

2016-12-13 Thread Boris Osipov (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746119#comment-15746119 ]

Boris Osipov commented on FLINK-4798:
-

Yes, it is. I've reproduced it several times with such behavior.

> CEPITCase.testSimpleKeyedPatternCEP test failure
> 
>
> Key: FLINK-4798
> URL: https://issues.apache.org/jira/browse/FLINK-4798
> Project: Flink
>  Issue Type: Bug
>  Components: CEP
>Affects Versions: 1.2.0
>Reporter: Robert Metzger
>Assignee: Boris Osipov
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.cep.CEPITCase
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.627 sec <<< 
> FAILURE! - in org.apache.flink.cep.CEPITCase
> testSimpleKeyedPatternCEP(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 0.312 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:61)
> {code}
> in https://api.travis-ci.org/jobs/166676733/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2969: [FLINK-5289] Meaningful exception when using value state ...

2016-12-13 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2969
  
+1 looks good. Will put this into my merge pipeline


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5289) NPE when using value state on non-keyed stream

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746105#comment-15746105
 ] 

ASF GitHub Bot commented on FLINK-5289:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2969
  
+1 looks good. Will put this into my merge pipeline


> NPE when using value state on non-keyed stream
> --
>
> Key: FLINK-5289
> URL: https://issues.apache.org/jira/browse/FLINK-5289
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Timo Walther
>Assignee: Stefan Richter
>
> Using a {{ValueStateDescriptor}} and 
> {{getRuntimeContext().getState(descriptor)}} on a non-keyed stream leads to 
> {{NullPointerException}} which is not very helpful for users:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.getState(StreamingRuntimeContext.java:109)
> {code}
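
For context, partitioned state is only defined after a {{keyBy}}. Below is a 
hedged sketch of the correct keyed usage; the job wiring and names are 
illustrative, based on the 1.2-era DataStream API.

{code}
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedStateSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("a", 2L))
                // without this keyBy, getState(...) below fails at runtime
                .keyBy(0)
                .map(new RichMapFunction<Tuple2<String, Long>, Long>() {

                    private transient ValueState<Long> sum;

                    @Override
                    public void open(Configuration parameters) {
                        sum = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("sum", Long.class, 0L));
                    }

                    @Override
                    public Long map(Tuple2<String, Long> value) throws Exception {
                        long newSum = sum.value() + value.f1;
                        sum.update(newSum);
                        return newSum;
                    }
                })
                .print();

        env.execute("keyed state sketch");
    }
}
{code}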



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (FLINK-4798) CEPITCase.testSimpleKeyedPatternCEP test failure

2016-12-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-4798:

Comment: was deleted

(was: I think that FLINK-5332 is probably the root cause of this issue.)

> CEPITCase.testSimpleKeyedPatternCEP test failure
> 
>
> Key: FLINK-4798
> URL: https://issues.apache.org/jira/browse/FLINK-4798
> Project: Flink
>  Issue Type: Bug
>  Components: CEP
>Affects Versions: 1.2.0
>Reporter: Robert Metzger
>Assignee: Boris Osipov
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.cep.CEPITCase
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.627 sec <<< 
> FAILURE! - in org.apache.flink.cep.CEPITCase
> testSimpleKeyedPatternCEP(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 0.312 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:61)
> {code}
> in https://api.travis-ci.org/jobs/166676733/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4798) CEPITCase.testSimpleKeyedPatternCEP test failure

2016-12-13 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746088#comment-15746088
 ] 

Stephan Ewen commented on FLINK-4798:
-

I think that FLINK-5332 is probably the root cause of this issue.

> CEPITCase.testSimpleKeyedPatternCEP test failure
> 
>
> Key: FLINK-4798
> URL: https://issues.apache.org/jira/browse/FLINK-4798
> Project: Flink
>  Issue Type: Bug
>  Components: CEP
>Affects Versions: 1.2.0
>Reporter: Robert Metzger
>Assignee: Boris Osipov
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.cep.CEPITCase
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.627 sec <<< 
> FAILURE! - in org.apache.flink.cep.CEPITCase
> testSimpleKeyedPatternCEP(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 0.312 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:61)
> {code}
> in https://api.travis-ci.org/jobs/166676733/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4798) CEPITCase.testSimpleKeyedPatternCEP test failure

2016-12-13 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746086#comment-15746086
 ] 

Stephan Ewen commented on FLINK-4798:
-

I think that FLINK-5332 is probably the root cause of this issue.

> CEPITCase.testSimpleKeyedPatternCEP test failure
> 
>
> Key: FLINK-4798
> URL: https://issues.apache.org/jira/browse/FLINK-4798
> Project: Flink
>  Issue Type: Bug
>  Components: CEP
>Affects Versions: 1.2.0
>Reporter: Robert Metzger
>Assignee: Boris Osipov
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.cep.CEPITCase
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.627 sec <<< 
> FAILURE! - in org.apache.flink.cep.CEPITCase
> testSimpleKeyedPatternCEP(org.apache.flink.cep.CEPITCase)  Time elapsed: 
> 0.312 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at org.apache.flink.cep.CEPITCase.after(CEPITCase.java:61)
> {code}
> in https://api.travis-ci.org/jobs/166676733/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-5288) Flakey ConnectedComponentsITCase#testConnectedComponentsExample unit test

2016-12-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-5288.
-
Resolution: Duplicate

> Flakey ConnectedComponentsITCase#testConnectedComponentsExample unit test
> -
>
> Key: FLINK-5288
> URL: https://issues.apache.org/jira/browse/FLINK-5288
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: TravisCI
>Reporter: Nico Kruber
>  Labels: test-stability
>
> https://api.travis-ci.org/jobs/182243067/log.txt?deansi=true
> {code:none}
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.272 sec <<< 
> FAILURE! - in org.apache.flink.graph.test.examples.ConnectedComponentsITCase
> testConnectedComponentsExample[Execution mode = 
> CLUSTER](org.apache.flink.graph.test.examples.ConnectedComponentsITCase)  
> Time elapsed: 1.195 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<4> but was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at 
> org.apache.flink.graph.test.examples.ConnectedComponentsITCase.after(ConnectedComponentsITCase.java:70)
> Failed tests: 
>   
> ConnectedComponentsITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:302->TestBaseUtils.compareResultsByLinesInMemory:316
>  Different number of lines in expected and obtained result. expected:<4> but 
> was:<3>{code}
> full log:
> https://transfer.sh/RjFRD/38.4.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5288) Flakey ConnectedComponentsITCase#testConnectedComponentsExample unit test

2016-12-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-5288.
---

> Flakey ConnectedComponentsITCase#testConnectedComponentsExample unit test
> -
>
> Key: FLINK-5288
> URL: https://issues.apache.org/jira/browse/FLINK-5288
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: TravisCI
>Reporter: Nico Kruber
>  Labels: test-stability
>
> https://api.travis-ci.org/jobs/182243067/log.txt?deansi=true
> {code:none}
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.272 sec <<< 
> FAILURE! - in org.apache.flink.graph.test.examples.ConnectedComponentsITCase
> testConnectedComponentsExample[Execution mode = 
> CLUSTER](org.apache.flink.graph.test.examples.ConnectedComponentsITCase)  
> Time elapsed: 1.195 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<4> but was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at 
> org.apache.flink.graph.test.examples.ConnectedComponentsITCase.after(ConnectedComponentsITCase.java:70)
> Failed tests: 
>   
> ConnectedComponentsITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:302->TestBaseUtils.compareResultsByLinesInMemory:316
>  Different number of lines in expected and obtained result. expected:<4> but 
> was:<3>{code}
> full log:
> https://transfer.sh/RjFRD/38.4.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5287) Test randomly fails with wrong result: testWithAtomic2[Execution mode = CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)

2016-12-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-5287.
---

> Test randomly fails with wrong result: testWithAtomic2[Execution mode = 
> CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)
> -
>
> Key: FLINK-5287
> URL: https://issues.apache.org/jira/browse/FLINK-5287
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Robert Metzger
>  Labels: test-stability
>
> I encountered this issue here: 
> https://api.travis-ci.org/jobs/182009802/log.txt?deansi=true
> {code}
> testWithAtomic2[Execution mode = 
> CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)  Time elapsed: 
> 0.237 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at 
> org.apache.flink.api.scala.operators.JoinITCase.after(JoinITCase.scala:51)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-5287) Test randomly fails with wrong result: testWithAtomic2[Execution mode = CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)

2016-12-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-5287.
-
Resolution: Duplicate

> Test randomly fails with wrong result: testWithAtomic2[Execution mode = 
> CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)
> -
>
> Key: FLINK-5287
> URL: https://issues.apache.org/jira/browse/FLINK-5287
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Robert Metzger
>  Labels: test-stability
>
> I encountered this issue here: 
> https://api.travis-ci.org/jobs/182009802/log.txt?deansi=true
> {code}
> testWithAtomic2[Execution mode = 
> CLUSTER](org.apache.flink.api.scala.operators.JoinITCase)  Time elapsed: 
> 0.237 sec  <<< FAILURE!
> java.lang.AssertionError: Different number of lines in expected and obtained 
> result. expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:316)
>   at 
> org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:302)
>   at 
> org.apache.flink.api.scala.operators.JoinITCase.after(JoinITCase.scala:51)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5332) Non-thread safe FileSystem::initOutPathLocalFS() can cause lost files/directories in local execution

2016-12-13 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-5332:
---

 Summary: Non-thread safe FileSystem::initOutPathLocalFS() can 
cause lost files/directories in local execution
 Key: FLINK-5332
 URL: https://issues.apache.org/jira/browse/FLINK-5332
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Critical
 Fix For: 1.2.0


This is mainly relevant to tests and Local Mini Cluster executions.

The {{FileOutputFormat}} and its subclasses rely on 
{{FileSystem::initOutPathLocalFS()}} to prepare the output directory. When 
multiple parallel output writers call that method, there is a slim chance that 
one parallel thread deletes another's directory. The checks that the method 
performs are not bulletproof.

I believe that this is the cause for many Travis test instabilities that we 
observed over time.

Simply synchronizing that method per process should do the trick. Since it is a 
rare initialization method, and only relevant in tests & local mini cluster 
executions, it should be a price that is okay to pay. I see no other way, as we 
do not have simple access to an atomic "check and delete and recreate" file 
operation.

The synchronization also makes many "re-try" code paths obsolete (there should 
be no re-tries needed on proper file systems).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Affects Version/s: 1.1.3

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.2.0, 1.1.3
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable; instead it should be 
> described similarly as the application-master entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Description: 
The value I specified in flink-conf.yaml

{code}
yarn.taskmanager.env:
  MY_ENV: test
{code}

is not available in {{System.getenv("MY_ENV")}} from the plan execution 
(execution flow of main method) nor from within execution of a streaming 
operator.

Interestingly, it does appear within the Flink JobManager Web UI under Job 
Manager -> Configuration.

--

The yarn section of the configuration page should be cleaned up a bit. The 
"yarn.containers.vcores" parameter is listed twice, the example for 
"yarn.application-master.env" is listed as a separate parameter and the 
"yarn.taskmanager.env" description indirectly references another parameter 
("same as the above") which just isn't maintainable; instead it should be 
described similarly as the application-master entry.

  was:
The value I specified in flink-conf.yaml

{code}
yarn.taskmanager.env:
  MY_ENV: test
{code}

is not available in {{System.getenv("MY_ENV")}} from the plan execution 
(execution flow of main method) nor from within execution of a streaming 
operator.

Interestingly, it does appear within the Flink JobManager Web UI under Job 
Manager -> Configuration.

--

The yarn section of the configuration page should be cleaned up a bit. The 
"yarn.containers.vcores" parameter is listed twice, the example for 
"yarn.application-master.env" is listed as a separate parameter and the 
"yarn.taskmanager.env" description indirectly references another parameter 
("same as the above") which just isn't maintainable.


> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.2.0, 1.1.3
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable; instead it should be 
> described similarly as the application-master entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Fix Version/s: (was: 1.1.3)
   1.1.4
   1.2.0

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.2.0, 1.1.3
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.2.0, 1.1.4
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable; instead it should be 
> described similarly as the application-master entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Affects Version/s: 1.2.0

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.2.0, 1.1.3
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable; instead it should be 
> described similarly as the application-master entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Description: 
The value I specified in flink-conf.yaml

{code}
yarn.taskmanager.env:
  MY_ENV: test
{code}

is not available in {{System.getenv("MY_ENV")}} from the plan execution 
(execution flow of main method) nor from within execution of a streaming 
operator.

Interestingly, it does appear within the Flink JobManager Web UI under Job 
Manager -> Configuration.

--

The yarn section of the configuration page should be cleaned up a bit. The 
"yarn.containers.vcores" parameter is listed twice, the example for 
"yarn.application-master.env" is listed as a separate parameter and the 
"yarn.taskmanager.env" description indirectly references another parameter 
("same as the above") which just isn't maintainable.

  was:
The value I specified in flink-conf.yaml

{code}
yarn.taskmanager.env:
  MY_ENV: test
{code}

is not available in {{System.getenv("MY_ENV")}} from the plan execution 
(execution flow of main method) nor from within execution of a streaming 
operator.

Interestingly, it does appear within the Flink JobManager Web UI under Job 
Manager -> Configuration.


> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746029#comment-15746029
 ] 

Chesnay Schepler edited comment on FLINK-5322 at 12/13/16 7:35 PM:
---

yes, we can adjust the configuration documentation.


was (Author: zentol):
yes, we can adjust the configuration.

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.
> --
> The yarn section of the configuration page should be cleaned up a bit. The 
> "yarn.containers.vcores" parameter is listed twice, the example for 
> "yarn.application-master.env" is listed as a separate parameter and the 
> "yarn.taskmanager.env" description indirectly references another parameter 
> ("same as the above") which just isn't maintainable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Summary: Clean up yarn configuration documentation  (was: 
yarn.taskmanager.env value does not appear in System.getenv)

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-5322) Clean up yarn configuration documentation

2016-12-13 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746029#comment-15746029
 ] 

Chesnay Schepler edited comment on FLINK-5322 at 12/13/16 7:32 PM:
---

yes, we can adjust the configuration.


was (Author: zentol):
yes, we can adjust he configuration.

> Clean up yarn configuration documentation
> -
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) yarn.taskmanager.env value does not appear in System.getenv

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Issue Type: Improvement  (was: Bug)

> yarn.taskmanager.env value does not appear in System.getenv
> ---
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-5322) yarn.taskmanager.env value does not appear in System.getenv

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-5322:
---

Assignee: Chesnay Schepler

> yarn.taskmanager.env value does not appear in System.getenv
> ---
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-5322) yarn.taskmanager.env value does not appear in System.getenv

2016-12-13 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-5322:

Component/s: Documentation

> yarn.taskmanager.env value does not appear in System.getenv
> ---
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5322) yarn.taskmanager.env value does not appear in System.getenv

2016-12-13 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746029#comment-15746029
 ] 

Chesnay Schepler commented on FLINK-5322:
-

yes, we can adjust he configuration.

> yarn.taskmanager.env value does not appear in System.getenv
> ---
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Assignee: Chesnay Schepler
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5322) yarn.taskmanager.env value does not appear in System.getenv

2016-12-13 Thread Shannon Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746008#comment-15746008
 ] 

Shannon Carey commented on FLINK-5322:
--

Yes, that worked! Thanks! It becomes available from within UDF (streaming 
operator) code. Perhaps this ticket can be adapted to just clarify the 
documentation, explaining that the YAML isn't really interpreted the way 
people may expect, and to provide an example?
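
For reference, a flat-key form like the one below is presumably what worked; 
the exact key syntax is an assumption based on this discussion and should be 
confirmed by the documentation fix tracked here.

{code}
# flink-conf.yaml: one flat key per variable (nested YAML maps are not
# interpreted the way standard YAML would suggest)
yarn.taskmanager.env.MY_ENV: test
{code}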

> yarn.taskmanager.env value does not appear in System.getenv
> ---
>
> Key: FLINK-5322
> URL: https://issues.apache.org/jira/browse/FLINK-5322
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
> Environment: Flink 1.1.3 on AWS EMR emr-5.2.0 (Hadoop "Amazon 2.7.3")
>Reporter: Shannon Carey
>Priority: Trivial
> Fix For: 1.1.3
>
>
> The value I specified in flink-conf.yaml
> {code}
> yarn.taskmanager.env:
>   MY_ENV: test
> {code}
> is not available in {{System.getenv("MY_ENV")}} from the plan execution 
> (execution flow of main method) nor from within execution of a streaming 
> operator.
> Interestingly, it does appear within the Flink JobManager Web UI under Job 
> Manager -> Configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5108) Remove ClientShutdownHook during job execution

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745856#comment-15745856
 ] 

ASF GitHub Bot commented on FLINK-5108:
---

Github user mxm commented on the issue:

https://github.com/apache/flink/pull/2928
  
There is one problem we overlooked. In detached mode we ensure cluster 
shutdown through a message sent by the client during job submission to tell the 
JobManager that this is going to be the last job it has to execute. In 
interactive execution mode, the user jar can contain multiple jobs; this is 
mostly useful for interactive batch jobs. Since we just execute the main method 
of the user jar, we don't know how many jobs are submitted or when to shut down 
the cluster. That's why we chose to delegate the shutdown to the client for 
interactive jobs. Thus, I'm hesitant to remove the shutdown hook because it 
ensures that the cluster shuts down during interactive job executions. It 
prevents clusters from lingering around when the client shuts down.

A couple of solutions for this problem:

1. The JobManager watches the client and shuts down a) if it loses 
connection to the client and the job it executes has completed, or b) the 
client tells the JobManager to shut down.

2. The JobManager drives the execution, which is now part of the client.

3. We don't allow multiple jobs to execute. Then we always have a clear 
shutdown point. This is perhaps the easiest and most elegant solution. Most 
users only execute a single job at a time anyway. We can still allow 
interactive job executions if the user chooses to. Perhaps we can make this 
more explicit in the API to give a hint to the client. 

I'm afraid we will have to close this PR until we realize one of the above 
solutions (or another one).
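
For context, the client-side hook under discussion follows the standard JVM 
shutdown-hook pattern; the sketch below is generic, not the actual Flink 
client code.

{code}
public class ShutdownHookSketch {

    public static void main(String[] args) {
        Thread hook = new Thread(() -> {
            // e.g. tell the cluster to shut down because the client JVM exits
            System.out.println("client exiting: triggering cluster shutdown");
        });
        Runtime.getRuntime().addShutdownHook(hook);

        // ... submit job(s), wait for results ...

        // Removing the hook is what this PR proposed: the cluster would then
        // keep running even if the client is interrupted (e.g. via CTRL-C).
        Runtime.getRuntime().removeShutdownHook(hook);
    }
}
{code}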


> Remove ClientShutdownHook during job execution
> --
>
> Key: FLINK-5108
> URL: https://issues.apache.org/jira/browse/FLINK-5108
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Maximilian Michels
>Assignee: Renkai Ge
> Fix For: 1.2.0
>
>
> The behavior of the Standalone mode is to not react to client interrupts once 
> a job has been deployed. We should change the Yarn client implementation to 
> behave the same. This avoids accidental shutdown of the job, e.g. when the 
> user sends an interrupt via CTRL-C or when the client machine shuts down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2928: [FLINK-5108] Remove ClientShutdownHook during job executi...

2016-12-13 Thread mxm
Github user mxm commented on the issue:

https://github.com/apache/flink/pull/2928
  
There is one problem we overlooked. In detached mode we ensure cluster 
shutdown through a message sent by the client during job submission to tell the 
JobManager that this is going to be the last job it has to execute. In 
interactive execution mode, the user jar can contain multiple jobs; this is 
mostly useful for interactive batch jobs. Since we just execute the main method 
of the user jar, we don't know how many jobs are submitted or when to shut down 
the cluster. That's why we chose to delegate the shutdown to the client for 
interactive jobs. Thus, I'm hesitant to remove the shutdown hook because it 
ensures that the cluster shuts down during interactive job executions. It 
prevents clusters from lingering around when the client shuts down.

A couple of solutions for this problem:

1. The JobManager watches the client and shuts down a) if it loses 
connection to the client and the job it executes has completed, or b) the 
client tells the JobManager to shut down.

2. The JobManager drives the execution, which is now part of the client.

3. We don't allow multiple jobs to execute. Then we always have a clear 
shutdown point. This is perhaps the easiest and most elegant solution. Most 
users only execute a single job at a time anyway. We can still allow 
interactive job executions if the user chooses to. Perhaps we can make this 
more explicit in the API to give a hint to the client. 

I'm afraid we will have to close this PR until we realize one of the above 
solutions (or another one).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-5108) Remove ClientShutdownHook during job execution

2016-12-13 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-5108:
--
Priority: Major  (was: Blocker)

> Remove ClientShutdownHook during job execution
> --
>
> Key: FLINK-5108
> URL: https://issues.apache.org/jira/browse/FLINK-5108
> Project: Flink
>  Issue Type: Bug
>  Components: YARN Client
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Maximilian Michels
>Assignee: Renkai Ge
> Fix For: 1.2.0
>
>
> The behavior of the Standalone mode is to not react to client interrupts once 
> a job has been deployed. We should change the Yarn client implementation to 
> behave the same. This avoids accidental shutdown of the job, e.g. when the 
> user sends an interrupt via CTRL-C or when the client machine shuts down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5051) Backwards compatibility for serializers in backend state

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745679#comment-15745679
 ] 

ASF GitHub Bot commented on FLINK-5051:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/2962#discussion_r92220463
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyedBackendSerializationProxy.java
 ---
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.state;
+
+import org.apache.flink.api.common.state.StateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import 
org.apache.flink.api.common.typeutils.TypeSerializerSerializationProxy;
+import org.apache.flink.core.io.IOReadableWritable;
+import org.apache.flink.core.io.VersionedIOReadableWritable;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.core.memory.DataOutputView;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Serialization proxy for all meta data in keyed state backends. In the 
future we might also migrate the actual state
+ * serialization logic here.
+ */
+public class KeyedBackendSerializationProxy extends 
VersionedIOReadableWritable {
+
+   private static final int VERSION = 1;
+
+   private TypeSerializerSerializationProxy keySerializerProxy;
+   private List namedStateSerializationProxies;
+
+   private ClassLoader userCodeClassLoader;
+
+   public KeyedBackendSerializationProxy(ClassLoader userCodeClassLoader) {
+   this.userCodeClassLoader = 
Preconditions.checkNotNull(userCodeClassLoader);
+   }
+
+   public KeyedBackendSerializationProxy(TypeSerializer keySerializer, 
List namedStateSerializationProxies) {
+   this.keySerializerProxy = new 
TypeSerializerSerializationProxy<>(Preconditions.checkNotNull(keySerializer));
+   this.namedStateSerializationProxies = 
Preconditions.checkNotNull(namedStateSerializationProxies);
+   
Preconditions.checkArgument(namedStateSerializationProxies.size() <= 
Short.MAX_VALUE);
+   }
+
+   public List getNamedStateSerializationProxies() {
+   return namedStateSerializationProxies;
+   }
+
+   public TypeSerializerSerializationProxy getKeySerializerProxy() {
+   return keySerializerProxy;
+   }
+
+   @Override
+   public int getVersion() {
+   return VERSION;
+   }
+
+   @Override
+   public void write(DataOutputView out) throws IOException {
+   super.write(out);
+
+   keySerializerProxy.write(out);
+
+   out.writeShort(namedStateSerializationProxies.size());
+   Map kVStateToId = new 
HashMap<>(namedStateSerializationProxies.size());
--- End diff --

Leftover from the previous code. Will remove it.


> Backwards compatibility for serializers in backend state
> 
>
> Key: FLINK-5051
> URL: https://issues.apache.org/jira/browse/FLINK-5051
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> When a new state is registered, e.g. in a keyed backend via 
> `getPartitionedState`, the caller has to provide all type serializers 
> required for the persistence of state components. Explicitly passing the 
> serializers on state creation already allows for potential version upgrades 
> of serializers.
> However, those serializers are currently not part of any snapshot and are 
> only provided at runtime, when the state is 

[GitHub] flink pull request #2962: [FLINK-5051] Backwards compatibility for serialize...

2016-12-13 Thread StefanRRichter
Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/2962#discussion_r92220463
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyedBackendSerializationProxy.java
 ---
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.state;
+
+import org.apache.flink.api.common.state.StateDescriptor;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import 
org.apache.flink.api.common.typeutils.TypeSerializerSerializationProxy;
+import org.apache.flink.core.io.IOReadableWritable;
+import org.apache.flink.core.io.VersionedIOReadableWritable;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.core.memory.DataOutputView;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Serialization proxy for all meta data in keyed state backends. In the 
future we might also migrate the actual state
+ * serialization logic here.
+ */
+public class KeyedBackendSerializationProxy extends 
VersionedIOReadableWritable {
+
+   private static final int VERSION = 1;
+
+   private TypeSerializerSerializationProxy keySerializerProxy;
+   private List namedStateSerializationProxies;
+
+   private ClassLoader userCodeClassLoader;
+
+   public KeyedBackendSerializationProxy(ClassLoader userCodeClassLoader) {
+   this.userCodeClassLoader = 
Preconditions.checkNotNull(userCodeClassLoader);
+   }
+
+   public KeyedBackendSerializationProxy(TypeSerializer keySerializer, 
List namedStateSerializationProxies) {
+   this.keySerializerProxy = new 
TypeSerializerSerializationProxy<>(Preconditions.checkNotNull(keySerializer));
+   this.namedStateSerializationProxies = 
Preconditions.checkNotNull(namedStateSerializationProxies);
+   
Preconditions.checkArgument(namedStateSerializationProxies.size() <= 
Short.MAX_VALUE);
+   }
+
+   public List getNamedStateSerializationProxies() {
+   return namedStateSerializationProxies;
+   }
+
+   public TypeSerializerSerializationProxy getKeySerializerProxy() {
+   return keySerializerProxy;
+   }
+
+   @Override
+   public int getVersion() {
+   return VERSION;
+   }
+
+   @Override
+   public void write(DataOutputView out) throws IOException {
+   super.write(out);
+
+   keySerializerProxy.write(out);
+
+   out.writeShort(namedStateSerializationProxies.size());
+   Map kVStateToId = new 
HashMap<>(namedStateSerializationProxies.size());
--- End diff --

Leftover from the previous code. Will remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5290) Ensure backwards compatibility of the hashes used to generate JobVertexIds

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745665#comment-15745665
 ] 

ASF GitHub Bot commented on FLINK-5290:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/2966#discussion_r92219738
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/graph/StreamGraphHasherV2.java
 ---
@@ -0,0 +1,319 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.graph;
+
+import com.google.common.hash.HashFunction;
+import com.google.common.hash.Hasher;
+import com.google.common.hash.Hashing;
+import org.apache.flink.runtime.jobgraph.JobVertexID;
+import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
+import org.apache.flink.streaming.api.operators.ChainingStrategy;
+import org.apache.flink.streaming.api.operators.StreamOperator;
+import org.apache.flink.streaming.api.transformations.StreamTransformation;
+import org.apache.flink.streaming.runtime.partitioner.ForwardPartitioner;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.nio.charset.Charset;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.Set;
+
+import static org.apache.flink.util.StringUtils.byteToHexString;
+
+/**
+ * StreamGraphHasher from Flink 1.2. This contains duplicated code to 
ensure that the algorithm does not change with
+ * future Flink versions.
+ * 
+ * DO NOT MODIFY THIS CLASS
+ */
+public class StreamGraphHasherV2 implements StreamGraphHasher {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(StreamGraphHasherV2.class);
+
+   /**
+* Returns a map with a hash for each {@link StreamNode} of the {@link
+* StreamGraph}. The hash is used as the {@link JobVertexID} in order to
+* identify nodes across job submissions if they didn't change.
+* 
--- End diff --

Thanks, I fixed those.


> Ensure backwards compatibility of the hashes used to generate JobVertexIds
> --
>
> Key: FLINK-5290
> URL: https://issues.apache.org/jira/browse/FLINK-5290
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> The way in which hashes for JobVertexIds are generated changed between Flink 
> 1.1 and 1.2 (parallelism was considered for the hash in 1.1). We need to be 
> backwards compatible to old JobVertexId generation so that we can still 
> assign state from old savepoints.
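
For illustration, the kind of deterministic hashing involved (Guava's 
murmur3_128, which the imports in the diff above suggest); the exact fields 
that go into the hash are an assumption here.

{code}
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

import java.nio.charset.StandardCharsets;

public class NodeHashSketch {

    public static void main(String[] args) {
        HashFunction hf = Hashing.murmur3_128(0);
        Hasher hasher = hf.newHasher();
        hasher.putString("operator-description", StandardCharsets.UTF_8);
        // Flink 1.1 also mixed the parallelism into the hash; 1.2 dropped it,
        // which is why the generated JobVertexIds differ between versions.
        hasher.putInt(4); // parallelism, part of the legacy (1.1) scheme only
        System.out.println(hasher.hash());
    }
}
{code}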



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5189) Deprecate table.Row

2016-12-13 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745658#comment-15745658
 ] 

Fabian Hueske commented on FLINK-5189:
--

Given that we move all Table API classes anyway (FLINK-4704), we can also 
delete `table.Row`, IMO.
Sorry for the confusion.

> Deprecate table.Row
> ---
>
> Key: FLINK-5189
> URL: https://issues.apache.org/jira/browse/FLINK-5189
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core, Table API & SQL
>Reporter: Anton Solovev
>Assignee: Anton Solovev
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5325) Introduce interface for CloseableRegistry to separate user from system-facing functionality

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745654#comment-15745654
 ] 

ASF GitHub Bot commented on FLINK-5325:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/2992
  
I made the interface generic and changed the hierarchy as suggested by 
@zentol. 


> Introduce interface for CloseableRegistry to separate user from system-facing 
> functionality
> ---
>
> Key: FLINK-5325
> URL: https://issues.apache.org/jira/browse/FLINK-5325
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> Currently, the API of {{CloseableRegistry}} exposes the {{close}} method to 
> all client code. We should separate the API into a user-facing interface 
> (allowing only for un/registration of {{Closeable}} and a system-facing part 
> that also exposes the {{close}} method. This prevents users from accidentally 
> calling {{close}}, thus closing resources that other callers registered.
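
A minimal sketch of the proposed split; the interface and class names are 
illustrative assumptions, not the final API.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// User-facing view: clients may only register and unregister resources.
interface ClosableRegistryView {
    void registerClosable(Closeable closable) throws IOException;
    void unregisterClosable(Closeable closable);
}

// System-facing registry: additionally owns the close() call.
class OwnedClosableRegistry implements ClosableRegistryView, Closeable {

    private final Set<Closeable> closeables = new HashSet<>();
    private boolean closed;

    @Override
    public synchronized void registerClosable(Closeable closable) throws IOException {
        if (closed) {
            closable.close(); // registry already closed, so close eagerly
            throw new IOException("Registry is already closed");
        }
        closeables.add(closable);
    }

    @Override
    public synchronized void unregisterClosable(Closeable closable) {
        closeables.remove(closable);
    }

    @Override
    public synchronized void close() throws IOException {
        closed = true;
        for (Closeable closable : closeables) {
            closable.close(); // a real implementation would collect exceptions
        }
        closeables.clear();
    }
}
{code}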



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2992: [FLINK-5325] Splitting user/system-facing API of Closeabl...

2016-12-13 Thread StefanRRichter
Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/2992
  
I made the interface generic and changed the hierarchy as suggested by 
@zentol. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-5331) PythonPlanBinderTest idling extremely long

2016-12-13 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-5331:
-
Attachment: Screen Shot 2016-12-13 at 16.36.38.png

Activity monitor showing some of the python processes

> PythonPlanBinderTest idling extremely long
> --
>
> Key: FLINK-5331
> URL: https://issues.apache.org/jira/browse/FLINK-5331
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API, Tests
>Affects Versions: 1.2.0
>Reporter: Till Rohrmann
>Priority: Minor
> Attachments: Screen Shot 2016-12-13 at 16.36.38.png
>
>
> When running {{mvn clean verify}} locally, the {{PythonPlanBinderTest}} 
> sometimes takes extremely long to finish. The latest build took more than 900 
> seconds to complete.
> {code}
> Running org.apache.flink.python.api.PythonPlanBinderTest
> log4j:WARN No appenders could be found for logger 
> (org.apache.flink.python.api.PythonPlanBinderTest).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 923.586 sec - 
> in org.apache.flink.python.api.PythonPlanBinderTest
> Results :
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
> {code}
> Additionally, I see a lot of Python processes (~70-80) on my system doing 
> effectively nothing. This behaviour seems really strange to me. I think it 
> would be great to revisit this test case to see whether we can streamline it 
> a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2958: [FLINK-4704] Move Table API to org.apache.flink.table

2016-12-13 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2958
  
Thanks for the PR @ex00!
I would like to suggest moving a few more files:
- move `flink-table` examples: `org/apache/flink/examples` -> 
`org/apache/flink/table/examples` (for Java and Scala)
- move `src/main/scala/org/apache/flink/table/windows.scala` -> 
`src/main/scala/org/apache/flink/table/api/windows.scala`
- move `src/main/scala/org/apache/flink/table/Types.scala` -> 
`src/main/scala/org/apache/flink/table/typeutils/Types.scala`
- move `src/main/scala/org/apache/flink/table/trees/TreeNode.scala` -> 
`src/main/scala/org/apache/flink/table/plan/TreeNode.scala` and remove the 
empty `trees` package / folder
- create a new package `org.apache.flink.table.calcite` and move 
`CalciteConfig.scala`, `FlinkCalciteSqlValidator.scala`, 
`FlinkPlannerImpl.scala`, `FlinkRelBuilder.scala`, `FlinkTypeFactory.scala`, 
and `FlinkTypeSystem.scala` from `src/main/scala/org/apache/flink/table` to 
`src/main/scala/org/apache/flink/table/calcite`

What do you think @ex00 and @twalthr?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4704) Move Table API to org.apache.flink.table

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745638#comment-15745638
 ] 

ASF GitHub Bot commented on FLINK-4704:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2958
  
Thanks for the PR @ex00!
I would like to suggest moving a few more files:
- move `flink-table` examples: `org/apache/flink/examples` -> 
`org/apache/flink/table/examples` (for Java and Scala)
- move `src/main/scala/org/apache/flink/table/windows.scala` -> 
`src/main/scala/org/apache/flink/table/api/windows.scala`
- move `src/main/scala/org/apache/flink/table/Types.scala` -> 
`src/main/scala/org/apache/flink/table/typeutils/Types.scala`
- move `src/main/scala/org/apache/flink/table/trees/TreeNode.scala` -> 
`src/main/scala/org/apache/flink/table/plan/TreeNode.scala` and remove the 
empty `trees` package / folder
- create a new package `org.apache.flink.table.calcite` and move 
`CalciteConfig.scala`, `FlinkCalciteSqlValidator.scala`, 
`FlinkPlannerImpl.scala`, `FlinkRelBuilder.scala`, `FlinkTypeFactory.scala`, 
and `FlinkTypeSystem.scala` from `src/main/scala/org/apache/flink/table` to 
`src/main/scala/org/apache/flink/table/calcite`

What do you think @ex00 and @twalthr?


> Move Table API to org.apache.flink.table
> 
>
> Key: FLINK-4704
> URL: https://issues.apache.org/jira/browse/FLINK-4704
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Anton Mushin
>Priority: Blocker
> Fix For: 1.2.0
>
>
> This would be a large change. But maybe now is still a good time to do it. 
> Otherwise we will never fix this.
> Actually, the Table API is in the wrong package. At the moment it is in 
> {{org.apache.flink.api.table}} and the actual Scala/Java APIs are in 
> {{org.apache.flink.api.java/scala.table}}. All other APIs such as Python, 
> Gelly, Flink ML do not use the {{org.apache.flink.api}} namespace.
> I suggest the following packages:
> {code}
> org.apache.flink.table
> org.apache.flink.table.api.java
> org.apache.flink.table.api.scala
> {code}
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5325) Introduce interface for CloseableRegistry to separate user from system-facing functionality

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745485#comment-15745485
 ] 

ASF GitHub Bot commented on FLINK-5325:
---

Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/2992#discussion_r92199074
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/core/fs/CloseableRegistryImpl.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.core.fs;
+
+import org.apache.flink.util.AbstractCloseableRegistry;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * This class allows to register instances of {@link Closeable}, which are
+ * all closed if this registry is closed.
+ * 
+ * Registering to an already closed registry will throw an exception and
+ * close the provided {@link Closeable}
+ * 
+ * All methods in this class are thread-safe.
+ */
+public class CloseableRegistryImpl extends AbstractCloseableRegistry implements CloseableRegistry {
--- End diff --

I also totally agree that it should implement ``CloseableRegistry``. 
However, it cannot trivially implement the interface without making 
``CloseableRegistry`` itself generic, which in turn would lead to some nasty 
naming for the parameterized type. Maybe it is just a naming problem after 
all, and we are just lacking the right idea for those names :-)


> Introduce interface for CloseableRegistry to separate user from system-facing 
> functionality
> ---
>
> Key: FLINK-5325
> URL: https://issues.apache.org/jira/browse/FLINK-5325
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> Currently, the API of {{CloseableRegistry}} exposes the {{close}} method to 
> all client code. We should separate the API into a user-facing interface 
> (allowing only for un/registration of {{Closeable}}) and a system-facing part 
> that also exposes the {{close}} method. This prevents users from accidentally 
> calling {{close}}, thus closing resources that other callers registered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2992: [FLINK-5325] Splitting user/system-facing API of C...

2016-12-13 Thread StefanRRichter
Github user StefanRRichter commented on a diff in the pull request:

https://github.com/apache/flink/pull/2992#discussion_r92199074
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/core/fs/CloseableRegistryImpl.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.core.fs;
+
+import org.apache.flink.util.AbstractCloseableRegistry;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * This class allows to register instances of {@link Closeable}, which are
+ * all closed if this registry is closed.
+ * 
+ * Registering to an already closed registry will throw an exception and
+ * close the provided {@link Closeable}
+ * 
+ * All methods in this class are thread-safe.
+ */
+public class CloseableRegistryImpl extends AbstractCloseableRegistry implements CloseableRegistry {
--- End diff --

I also totally agree that it should implement ``CloseableRegistry``. 
However, it cannot trivially implement the interface without making 
``CloseableRegistry`` itself generic, which in turn would lead to some nasty 
naming for the parameterized type. Maybe it is just a naming problem after 
all, and we are just lacking the right idea for those names :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3848) Add ProjectableTableSource interface and translation rule

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745481#comment-15745481
 ] 

ASF GitHub Bot commented on FLINK-3848:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Thanks for asking @tonycox! FLINK-5188 is more urgent. We want to do the 
feature freeze by the end of the week. It would be great if you could open a 
PR by Thursday at the latest. If this is too soon, please let me know and I 
can work on FLINK-5188.

Thanks, Fabian


> Add ProjectableTableSource interface and translation rule
> -
>
> Key: FLINK-3848
> URL: https://issues.apache.org/jira/browse/FLINK-3848
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Fabian Hueske
>Assignee: Anton Solovev
>
> Add a {{ProjectableTableSource}} interface for {{TableSource}} implementation 
> that support projection push-down.
> The interface could look as follows
> {code}
> trait ProjectableTableSource {
>   def setProjection(fields: Array[String]): Unit
> }
> {code}
> In addition we need Calcite rules to push a projection into a TableScan that 
> refers to a {{ProjectableTableSource}}. We might need to tweak the cost model 
> as well to push the optimizer in the right direction.
> Moreover, the {{CsvTableSource}} could be extended to implement 
> {{ProjectableTableSource}}.
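
To illustrate what the push-down amounts to, a self-contained toy sketch (names are assumptions, not the proposed Flink classes):

{code}
import java.util.Arrays;
import java.util.List;

// Toy sketch of projection push-down (assumed names, not the Flink API).
public class ProjectionPushDownSketch {

	static class SimpleProjectableSource {
		private final List<String> allFields;
		private String[] projection;

		SimpleProjectableSource(String... allFields) {
			this.allFields = Arrays.asList(allFields);
			this.projection = allFields; // default: no projection, emit all fields
		}

		// Counterpart of setProjection(fields) from the proposed interface:
		// the optimizer tells the source which fields the query actually needs.
		void setProjection(String... fields) {
			this.projection = fields;
		}

		// Only the projected columns of the full record are materialized.
		Object[] project(Object[] fullRow) {
			Object[] out = new Object[projection.length];
			for (int i = 0; i < projection.length; i++) {
				out[i] = fullRow[allFields.indexOf(projection[i])];
			}
			return out;
		}
	}

	public static void main(String[] args) {
		SimpleProjectableSource source = new SimpleProjectableSource("id", "name", "price");
		source.setProjection("name");
		// prints [apple]: the query only pays for the column it asked for
		System.out.println(Arrays.toString(source.project(new Object[]{1, "apple", 2.5})));
	}
}
{code}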



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2971: [backport] [FLINK-5300] Add more gentle file delet...

2016-12-13 Thread tillrohrmann
Github user tillrohrmann closed the pull request at:

https://github.com/apache/flink/pull/2971


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-13 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Thanks for asking @tonycox! FLINK-5188 is more urgent. We want to do the 
feature freeze by the end of the week. It would be great if you could open a 
PR by Thursday at the latest. If this is too soon, please let me know and I 
can work on FLINK-5188.

Thanks, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2971: [backport] [FLINK-5300] Add more gentle file deletion pro...

2016-12-13 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2971
  
Has been merged into the release-1.1 branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5300) FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete non-empty directory

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745478#comment-15745478
 ] 

ASF GitHub Bot commented on FLINK-5300:
---

Github user tillrohrmann closed the pull request at:

https://github.com/apache/flink/pull/2971


> FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete 
> non-empty directory
> -
>
> Key: FLINK-5300
> URL: https://issues.apache.org/jira/browse/FLINK-5300
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>
> Flink's behaviour when deleting {{FileStateHandles}} and closing 
> {{FsCheckpointStateOutputStream}} is to always trigger a delete operation on 
> the parent directory. Often this call will fail because the directory still 
> contains some other files.
> A user reported that the SRE of their Hadoop cluster noticed this behaviour 
> in the logs. It might be more system friendly if we first checked whether the 
> directory is empty or not. This would prevent many error messages from 
> appearing in the Hadoop logs.
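
A minimal java.nio sketch of the proposed check (Flink itself goes through its own FileSystem abstraction; the method names here are assumptions):

{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the "gentle" deletion: only attempt to remove the parent
// directory if it is actually empty, instead of letting the delete call
// fail and produce an error in the logs.
final class GentleDelete {

	static void deleteFileAndMaybeParent(Path file) throws IOException {
		Files.deleteIfExists(file);

		Path parent = file.getParent();
		if (parent != null && isEmpty(parent)) {
			try {
				// Another writer may race us, so a failed delete is not an error.
				Files.deleteIfExists(parent);
			} catch (IOException ignored) {
				// the directory became non-empty in the meantime, which is fine
			}
		}
	}

	private static boolean isEmpty(Path dir) throws IOException {
		try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
			return !entries.iterator().hasNext();
		}
	}
}
{code}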



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2970: [FLINK-5300] Add more gentle file deletion procedu...

2016-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2970


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5300) FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete non-empty directory

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745477#comment-15745477
 ] 

ASF GitHub Bot commented on FLINK-5300:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2971
  
Has been merged into the release-1.1 branch.


> FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete 
> non-empty directory
> -
>
> Key: FLINK-5300
> URL: https://issues.apache.org/jira/browse/FLINK-5300
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>
> Flink's behaviour when deleting {{FileStateHandles}} and closing 
> {{FsCheckpointStateOutputStream}} is to always trigger a delete operation on 
> the parent directory. Often this call will fail because the directory still 
> contains some other files.
> A user reported that the SRE of their Hadoop cluster noticed this behaviour 
> in the logs. It might be more system friendly if we first checked whether the 
> directory is empty or not. This would prevent many error messages from 
> appearing in the Hadoop logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5300) FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete non-empty directory

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745475#comment-15745475
 ] 

ASF GitHub Bot commented on FLINK-5300:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2970


> FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete 
> non-empty directory
> -
>
> Key: FLINK-5300
> URL: https://issues.apache.org/jira/browse/FLINK-5300
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>
> Flink's behaviour when deleting {{FileStateHandles}} and closing 
> {{FsCheckpointStateOutputStream}} is to always trigger a delete operation on 
> the parent directory. Often this call will fail because the directory still 
> contains some other files.
> A user reported that the SRE of their Hadoop cluster noticed this behaviour 
> in the logs. It might be more system friendly if we first checked whether the 
> directory is empty or not. This would prevent many error messages from 
> appearing in the Hadoop logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5300) FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete non-empty directory

2016-12-13 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-5300.

Resolution: Fixed

Fixed in 1.2.0 via 8a7288ea9af6b6822ea1f4022bb08a10b7b82318
Fixed in 1.1.4 via f3d0cc3c13b95fd59ba7603722b30b42037591b7

> FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete 
> non-empty directory
> -
>
> Key: FLINK-5300
> URL: https://issues.apache.org/jira/browse/FLINK-5300
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Critical
>
> Flink's behaviour when deleting {{FileStateHandles}} and closing 
> {{FsCheckpointStateOutputStream}} is to always trigger a delete operation on 
> the parent directory. Often this call will fail because the directory still 
> contains some other files.
> A user reported that the SRE of their Hadoop cluster noticed this behaviour 
> in the logs. It might be more system friendly if we first checked whether the 
> directory is empty or not. This would prevent many error messages from 
> appearing in the Hadoop logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5331) PythonPlanBinderTest idling extremely long

2016-12-13 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5331:


 Summary: PythonPlanBinderTest idling extremely long
 Key: FLINK-5331
 URL: https://issues.apache.org/jira/browse/FLINK-5331
 Project: Flink
  Issue Type: Improvement
  Components: Python API, Tests
Affects Versions: 1.2.0
Reporter: Till Rohrmann
Priority: Minor


When running {{mvn clean verify}} locally, the {{PythonPlanBinderTest}} 
sometimes takes extremely long to finish. The latest build took more than 900 
seconds to complete.

{code}
Running org.apache.flink.python.api.PythonPlanBinderTest
log4j:WARN No appenders could be found for logger 
(org.apache.flink.python.api.PythonPlanBinderTest).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 923.586 sec - 
in org.apache.flink.python.api.PythonPlanBinderTest

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
{code}

Additionally, I see a lot of Python processes (~70-80) on my system doing 
effectively nothing. This behaviour seems really strange to me. I think it 
would be great to revisit this test case to see whether we can streamline it 
a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5008) Update quickstart documentation

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745440#comment-15745440
 ] 

ASF GitHub Bot commented on FLINK-5008:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2764
  
I wasn't able to test the Scala SBT path though, so this may need some 
additional love from someone with a working SBT environment.


> Update quickstart documentation
> ---
>
> Key: FLINK-5008
> URL: https://issues.apache.org/jira/browse/FLINK-5008
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Minor
>
> * The IDE setup documentation of Flink is outdated in both parts: IntelliJ 
> IDEA was based on an old version and Eclipse/Scala IDE does not work at all 
> anymore.
> * The example in the "Quickstart: Setup" is outdated and requires "." to be 
> in the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation

2016-12-13 Thread NicoK
Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2764
  
I wasn't able to test the Scala SBT path though, so this may need some 
additional love from someone with a working SBT environment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2764: [FLINK-5008] Update quickstart documentation

2016-12-13 Thread NicoK
Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2764
  
Developing Flink programs still works with Eclipse (tested with Eclipse 
4.6.1 and Scala IDE 4.4.1 for Scala 2.11). Alongside testing the quickstarts, I 
also updated them as promised and made a switch for showing unstable vs. stable 
documentation, which differs slightly in setting up the example projects and 
getting Flink binaries. Please have a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5008) Update quickstart documentation

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745429#comment-15745429
 ] 

ASF GitHub Bot commented on FLINK-5008:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2764
  
Developing Flink programs still works with Eclipse (tested with Eclipse 
4.6.1 and Scala IDE 4.4.1 for Scala 2.11). Alongside testing the quickstarts, I 
also updated them as promised and made a switch for showing unstable vs. stable 
documentation, which differs slightly in setting up the example projects and 
getting Flink binaries. Please have a look.


> Update quickstart documentation
> ---
>
> Key: FLINK-5008
> URL: https://issues.apache.org/jira/browse/FLINK-5008
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Minor
>
> * The IDE setup documentation of Flink is outdated in both parts: IntelliJ 
> IDEA was based on an old version and Eclipse/Scala IDE does not work at all 
> anymore.
> * The example in the "Quickstart: Setup" is outdated and requires "." to be 
> in the path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5325) Introduce interface for CloseableRegistry to separate user from system-facing functionality

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745414#comment-15745414
 ] 

ASF GitHub Bot commented on FLINK-5325:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2992
  
Changes look good to me. But I think that @zentol is right that 
`AbstractCloseableRegistry` should probably implement the `CloseableRegistry` 
interface from an OOP point of view. 


> Introduce interface for CloseableRegistry to separate user from system-facing 
> functionality
> ---
>
> Key: FLINK-5325
> URL: https://issues.apache.org/jira/browse/FLINK-5325
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> Currently, the API of {{CloseableRegistry}} exposes the {{close}} method to 
> all client code. We should separate the API into a user-facing interface 
> (allowing only for un/registration of {{Closeable}}) and a system-facing part 
> that also exposes the {{close}} method. This prevents users from accidentally 
> calling {{close}}, thus closing resources that other callers registered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2992: [FLINK-5325] Splitting user/system-facing API of Closeabl...

2016-12-13 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2992
  
Changes look good to me. But I think that @zentol is right that 
`AbstractCloseableRegistry` should probably implement the `CloseableRegistry` 
interface from an OOP point of view. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4861) Package optional project artifacts

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745409#comment-15745409
 ] 

ASF GitHub Bot commented on FLINK-4861:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2664
  
@greghogan I think that when an immediate dependency is "provided", then 
its transitive dependencies are not pulled. If a user adds a "provided" 
`flink-streaming-connector-kafka-0.9_2.10`, then no Kafka dependency will be 
pulled into the user program uber jar.

Consequently, all transitive dependencies should be added to the `lib` 
folder, which is what we would get by uber-jarring the Kafka connector.


> Package optional project artifacts
> --
>
> Key: FLINK-4861
> URL: https://issues.apache.org/jira/browse/FLINK-4861
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.2.0
>
>
> Per the mailing list 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Additional-project-downloads-td13223.html],
>  package the Flink libraries and connectors into subdirectories of a new 
> {{opt}} directory in the release/snapshot tarballs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2664: [FLINK-4861] [build] Package optional project artifacts

2016-12-13 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2664
  
@greghogan I think that when an immediate dependency is "provided", then 
its transitive dependencies are not pulled. If a user adds a "provided" 
`flink-streaming-connector-kafka-0.9_2.10`, then no Kafka dependency will be 
pulled into the user program uber jar.

Consequently, all transitive dependencies should be added to the `lib` 
folder, which is what we would get by uber-jarring the Kafka connector.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2664: [FLINK-4861] [build] Package optional project artifacts

2016-12-13 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2664
  
@StephanEwen @rmetzger, as I revisit this, I'm still questioning the 
viability of shading an uber jar. Take, for example, a user depending on a 
Kafka connector. Normally the dependency would be packaged in the user's uber 
jar. Instead, the user could mark the dependency as provided and copy the 
connector jar into `lib/`. If the user makes use of any transitive dependencies 
(and I assume the Kafka classes would be used), then, because of the shading, 
wouldn't the user be required to add Kafka as a dependency to their pom?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4861) Package optional project artifacts

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745338#comment-15745338
 ] 

ASF GitHub Bot commented on FLINK-4861:
---

Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2664
  
@StephanEwen @rmetzger, as I revisit this, I'm still questioning the 
viability of shading an uber jar. Take, for example, a user depending on a 
Kafka connector. Normally the dependency would be packaged in the user's uber 
jar. Instead, the user could mark the dependency as provided and copy the 
connector jar into `lib/`. If the user makes use of any transitive dependencies 
(and I assume the Kafka classes would be used), then, because of the shading, 
wouldn't the user be required to add Kafka as a dependency to their pom?


> Package optional project artifacts
> --
>
> Key: FLINK-4861
> URL: https://issues.apache.org/jira/browse/FLINK-4861
> Project: Flink
>  Issue Type: New Feature
>  Components: Build System
>Affects Versions: 1.2.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.2.0
>
>
> Per the mailing list 
> [discussion|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Additional-project-downloads-td13223.html],
>  package the Flink libraries and connectors into subdirectories of a new 
> {{opt}} directory in the release/snapshot tarballs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2998: [FLINK-5330] [tests] Harden KafkaConsumer08Test to...

2016-12-13 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2998

[FLINK-5330] [tests] Harden KafkaConsumer08Test to fail reliably with 
unknown host exception

Using static mocking to reliably fail the InetAddress.getByName call with 
an UnknownHostException.
Furthermore, the PR decreases the connection timeouts, which speeds up the 
test execution.
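
A self-contained sketch of the static-mocking technique (PowerMock with Mockito; the class and test names below are made up for illustration, this is not the PR's actual code):

```
import static org.junit.Assert.assertFalse;
import static org.mockito.Matchers.anyString;

import java.net.InetAddress;
import java.net.UnknownHostException;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.mockito.PowerMockito;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

// For JDK system classes like InetAddress, PowerMock requires preparing the
// class that *calls* the static method, not InetAddress itself.
@RunWith(PowerMockRunner.class)
@PrepareForTest(StaticMockSketchTest.HostChecker.class)
public class StaticMockSketchTest {

	// Hypothetical production code performing a DNS lookup.
	static class HostChecker {
		static boolean isResolvable(String host) {
			try {
				InetAddress.getByName(host);
				return true;
			} catch (UnknownHostException e) {
				return false;
			}
		}
	}

	@Test
	public void lookupFailsDeterministically() throws Exception {
		// Replace InetAddress's static methods with mocks, so the lookup
		// fails regardless of the machine's DNS configuration.
		PowerMockito.mockStatic(InetAddress.class);
		PowerMockito.when(InetAddress.getByName(anyString()))
			.thenThrow(new UnknownHostException("simulated lookup failure"));

		assertFalse(HostChecker.isResolvable("some-host"));
	}
}
```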

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixKafkaConsumer08Test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2998


commit e91b4593122ec223dbd3c2140ba7e82d2282bc1d
Author: Till Rohrmann 
Date:   2016-12-13T14:48:06Z

[FLINK-5330] [tests] Harden KafkaConsumer08Test to fail reliably with 
unknown host exception

Using static mocking to reliably fail the InetAddress.getByName call with 
an UnknownHostException.
Furthermore, the PR decreases the connection timeouts, which speeds up the 
test execution.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5330) Harden KafkaConsumer08Test

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745330#comment-15745330
 ] 

ASF GitHub Bot commented on FLINK-5330:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2998

[FLINK-5330] [tests] Harden KafkaConsumer08Test to fail reliably with 
unknown host exception

Using static mocking to reliably fail the InetAddress.getByName call with 
an UnknownHostException.
Furthermore, the PR decreases the connection timeouts, which speeds up the 
test execution.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixKafkaConsumer08Test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2998


commit e91b4593122ec223dbd3c2140ba7e82d2282bc1d
Author: Till Rohrmann 
Date:   2016-12-13T14:48:06Z

[FLINK-5330] [tests] Harden KafkaConsumer08Test to fail reliably with 
unknown host exception

Using static mocking to reliably fail the InetAddress.getByName call with 
an UnknownHostException.
Furthermore, the PR decreases the connection timeouts, which speeds up the 
test execution.




> Harden KafkaConsumer08Test
> --
>
> Key: FLINK-5330
> URL: https://issues.apache.org/jira/browse/FLINK-5330
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.2.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
> Fix For: 1.2.0
>
>
> The {{KafkaConsumer08Test}} assumes that some of the Kafka bootstrap server 
> hostnames cannot be resolved. This can lead to failures if these names are 
> resolved anyway by chance (e.g. when using Google's DNS server). Moreover, the 
> test cases take a long time to finish because the connection timeouts are set 
> to a high value. These can be decreased to speed the test cases up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-13 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske , what should I finish first, this PR or FLINK-5188 ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-5330) Harden KafkaConsumer08Test

2016-12-13 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5330:


 Summary: Harden KafkaConsumer08Test
 Key: FLINK-5330
 URL: https://issues.apache.org/jira/browse/FLINK-5330
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor
 Fix For: 1.2.0


The {{KafkaConsumer08Test}} assumes that some of the Kafka bootstrap server 
hostnames cannot be resolved. This can lead to failures if these names are 
resolved anyway by chance (e.g. when using Google's DNS server). Moreover, the 
test cases take a long time to finish because the connection timeouts are set 
to a high value. These can be decreased to speed the test cases up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2982: [FLINK-4460] Side Outputs in Flink

2016-12-13 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2982
  
@chenqin I had a quick look at the implementation and it looks quite good. 
I'll look at it in more detail once the 1.2 release is out and then I'll also 
have more thorough comments.

These are some quick comments off the top of my head:
 - I think we can extend `Collector` with a `collect(OutputTag, T)` method 
(see the sketch after this comment). Then we wouldn't need the extra 
`RichCollector` and `CollectorWrapper` to work around that.
 - For `WindowedStream` I would like to have something like this:
```
OutputTag<T> lateElementsOutput = ...;
DataStream<T> input = ...;
SingleOutputStreamOperator<R> windowed = input
  .keyBy(...)
  .window(...)
  .apply(windowFunction, lateElementsOutput);

DataStream<T> lateElements = windowed.getSideOutput(lateElementsOutput);
```
or maybe something else if we find a better idea. With 
`WindowedStream.tooLateElements()` this would instantiate an extra 
`WindowOperator` just for getting late elements, while another window operator 
would be responsible for processing the actual elements. That seems wasteful.

What do you think?
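
For illustration, a self-contained sketch of such a side-output-aware `Collector` (the class names and shapes are assumptions, not the actual Flink API):

```
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SideOutputCollectorSketch {

	// A named, typed handle identifying one side output (simplified stand-in).
	static final class OutputTag<T> {
		final String id;
		OutputTag(String id) { this.id = id; }
	}

	interface Collector<T> {
		void collect(T record);                       // main output
		<S> void collect(OutputTag<S> tag, S record); // proposed side-output overload
	}

	// Buffers records per output; a stand-in for forwarding to downstream operators.
	static final class BufferingCollector<T> implements Collector<T> {
		final List<T> mainOutput = new ArrayList<>();
		final Map<String, List<Object>> sideOutputs = new HashMap<>();

		@Override
		public void collect(T record) {
			mainOutput.add(record);
		}

		@Override
		public <S> void collect(OutputTag<S> tag, S record) {
			sideOutputs.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(record);
		}
	}

	public static void main(String[] args) {
		OutputTag<String> late = new OutputTag<>("late-elements");
		BufferingCollector<Long> out = new BufferingCollector<>();
		out.collect(42L);              // main output
		out.collect(late, "too late"); // routed to the "late-elements" side output
		System.out.println(out.mainOutput + " / " + out.sideOutputs);
	}
}
```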


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4460) Side Outputs in Flink

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745298#comment-15745298
 ] 

ASF GitHub Bot commented on FLINK-4460:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2982
  
@chenqin I had a quick look at the implementation and it looks quite good. 
I'll look at it in more detail once the 1.2 release is out and then I'll also 
have more thorough comments.

These are some quick comments off the top of my head:
 - I think we can extend `Collector` with a `collect(OutputTag, T)` method. 
Then we wouldn't need the extra `RichCollector` and `CollectorWrapper` to work 
around that.
 - For `WindowedStream` I would like to have something like this:
```
OutputTag<T> lateElementsOutput = ...;
DataStream<T> input = ...;
SingleOutputStreamOperator<R> windowed = input
  .keyBy(...)
  .window(...)
  .apply(windowFunction, lateElementsOutput);

DataStream<T> lateElements = windowed.getSideOutput(lateElementsOutput);
```
or maybe something else if we find a better idea. With 
`WindowedStream.tooLateElements()` this would instantiate an extra 
`WindowOperator` just for getting late elements, while another window operator 
would be responsible for processing the actual elements. That seems wasteful.

What do you think?


> Side Outputs in Flink
> -
>
> Key: FLINK-4460
> URL: https://issues.apache.org/jira/browse/FLINK-4460
> Project: Flink
>  Issue Type: New Feature
>  Components: Core, DataStream API
>Affects Versions: 1.2.0, 1.1.3
>Reporter: Chen Qin
>  Labels: latearrivingevents, sideoutput
>
> https://docs.google.com/document/d/1vg1gpR8JL4dM07Yu4NyerQhhVvBlde5qdqnuJv4LcV4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5240) Properly Close StateBackend in StreamTask when closing/canceling

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745293#comment-15745293
 ] 

ASF GitHub Bot commented on FLINK-5240:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/2997

[FLINK-5240][tests] ensure state backends are properly closed

This adds additional test cases to verify the state backends are closed
properly upon the end of a task. The state backends should always be
closed regardless of the final state of the task.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-5240

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2997


commit 5162c26d4cf59be8f997ee1e99200ff143f13db2
Author: Maximilian Michels 
Date:   2016-12-13T14:21:31Z

[FLINK-5240][tests] ensure state backends are properly closed

This adds additional test cases to verify the state backends are closed
properly upon the end of a task. The state backends should always be
closed regardless of the final state of the task.




> Properly Close StateBackend in StreamTask when closing/canceling
> 
>
> Key: FLINK-5240
> URL: https://issues.apache.org/jira/browse/FLINK-5240
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.2.0
>
>
> Right now, the {{StreamTask}} never calls {{close()}} on the state backend.
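
A minimal sketch of the invariant the new tests pin down (simplified stand-ins, not StreamTask's actual code): the backend must be closed on every exit path, whether the task finishes, fails, or is canceled.

{code}
// Sketch of the close-on-every-exit-path invariant (assumed shapes).
public class CloseBackendSketch {

	interface StateBackend extends AutoCloseable {
		@Override
		void close() throws Exception;
	}

	static void runTask(StateBackend backend, Runnable invokable) throws Exception {
		try {
			invokable.run(); // may complete normally, throw, or be canceled
		} finally {
			backend.close(); // must run regardless of the task's final state
		}
	}

	public static void main(String[] args) throws Exception {
		StateBackend backend = () -> System.out.println("backend closed");
		try {
			runTask(backend, () -> { throw new RuntimeException("task failed"); });
		} catch (RuntimeException expected) {
			// the backend was still closed before the failure propagated
		}
	}
}
{code}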



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2997: [FLINK-5240][tests] ensure state backends are prop...

2016-12-13 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/2997

[FLINK-5240][tests] ensure state backends are properly closed

This adds additional test cases to verify the state backends are closed
properly upon the end of a task. The state backends should always be
closed regardless of the final state of the task.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-5240

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2997


commit 5162c26d4cf59be8f997ee1e99200ff143f13db2
Author: Maximilian Michels 
Date:   2016-12-13T14:21:31Z

[FLINK-5240][tests] ensure state backends are properly closed

This adds additional test cases to verify the state backends are closed
properly upon the end of a task. The state backends should always be
closed regardless of the final state of the task.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-5223) Add documentation of UDTF in Table API & SQL

2016-12-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-5223.

   Resolution: Done
Fix Version/s: 1.2.0

Done with 5c86efbb449c631aea0b1b490cec706ad7596b44

> Add documentation of UDTF in Table API & SQL
> 
>
> Key: FLINK-5223
> URL: https://issues.apache.org/jira/browse/FLINK-5223
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-5304) Change method name from crossApply to join in Table API

2016-12-13 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-5304.

   Resolution: Done
Fix Version/s: 1.2.0

Done with da4af1259ae750953ca2e7a3ecec342d9eb77bac

> Change method name from crossApply to join in Table API
> ---
>
> Key: FLINK-5304
> URL: https://issues.apache.org/jira/browse/FLINK-5304
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
> Fix For: 1.2.0
>
>
> Currently, a UDTF in the Table API is applied with {{crossApply}}, but with 
> JOIN in SQL. A UDTF is something similar to a Table, so it makes sense to use 
> {{.join("udtf(c) as (s)")}} in the Table API too, and join is more familiar 
> to users than {{crossApply}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5223) Add documentation of UDTF in Table API & SQL

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745175#comment-15745175
 ] 

ASF GitHub Bot commented on FLINK-5223:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2956


> Add documentation of UDTF in Table API & SQL
> 
>
> Key: FLINK-5223
> URL: https://issues.apache.org/jira/browse/FLINK-5223
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5220) Flink SQL projection pushdown

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745173#comment-15745173
 ] 

ASF GitHub Bot commented on FLINK-5220:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2923


> Flink SQL projection pushdown
> -
>
> Key: FLINK-5220
> URL: https://issues.apache.org/jira/browse/FLINK-5220
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: zhangjing
>Assignee: zhangjing
>
> The jira is to do projection pushdown optimization. Please go forward to the 
> the design document for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2978: [FLINK-5304] [table] Change method name from cross...

2016-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2978


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-5304) Change method name from crossApply to join in Table API

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745174#comment-15745174
 ] 

ASF GitHub Bot commented on FLINK-5304:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2978


> Change method name from crossApply to join in Table API
> ---
>
> Key: FLINK-5304
> URL: https://issues.apache.org/jira/browse/FLINK-5304
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
>
> Currently, a UDTF in the Table API is applied with {{crossApply}}, but with 
> JOIN in SQL. A UDTF is something similar to a Table, so it makes sense to use 
> {{.join("udtf(c) as (s)")}} in the Table API too, and join is more familiar 
> to users than {{crossApply}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2923: [FLINK-5220] [Table API & SQL] Flink SQL projectio...

2016-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2923


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

