[GitHub] flink pull request: [FLINK-1994] [ml] Add different gain calculati...

2016-01-20 Thread rawkintrevo
Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-173422504
  
I can't get the enumeration thing to work. If you can modify it, that would 
be awesome. Thanks. 




[jira] [Commented] (FLINK-1994) Add different gain calculation schemes to SGD

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109868#comment-15109868
 ] 

ASF GitHub Bot commented on FLINK-1994:
---

Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1397#issuecomment-173422504
  
I can't get the enumeration thing to work. If you can modify it, that would 
be awesome. Thanks. 


> Add different gain calculation schemes to SGD
> -
>
> Key: FLINK-1994
> URL: https://issues.apache.org/jira/browse/FLINK-1994
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Trevor Grant
>Priority: Minor
>  Labels: ML, Starter
>
> The current SGD implementation uses the formula 
> {{stepsize/sqrt(iterationNumber)}} as the gain for the weight updates. It 
> would be good to make the gain calculation configurable and to provide 
> different strategies for it. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See [1] for how to properly select the gains.
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf
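
To make the proposal concrete, here is a minimal Java sketch of a pluggable 
gain calculation. All names ({{GainSchedules}}, {{GainStrategy}}) are 
hypothetical and not part of the Flink ML API:

{code}
// Hypothetical sketch of configurable gain (learning rate) schedules for SGD.
public class GainSchedules {

    @FunctionalInterface
    public interface GainStrategy {
        double gain(double stepsize, double regularization, int iteration);
    }

    /** Current behavior: stepsize / sqrt(iterationNumber). */
    public static final GainStrategy INVERSE_SQRT =
            (stepsize, reg, it) -> stepsize / Math.sqrt(it);

    /** Proposed alternative: stepsize / (1 + iterationNumber). */
    public static final GainStrategy INVERSE_LINEAR =
            (stepsize, reg, it) -> stepsize / (1 + it);

    /** Proposed alternative from [1]:
     *  stepsize * (1 + regularization * stepsize * iterationNumber)^(-3/4). */
    public static final GainStrategy POLYNOMIAL_DECAY =
            (stepsize, reg, it) -> stepsize * Math.pow(1 + reg * stepsize * it, -0.75);
}
{code}

The SGD solver would then take a {{GainStrategy}} as a configuration parameter 
and call {{gain(...)}} once per iteration instead of hard-coding the 
square-root formula.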





[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110057#comment-15110057
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50360393
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

@wangyangjun Don't worry. The failing tests seem unrelated to your changes. 
There are some flaky tests in Flink.


> Join two streams with two different buffer time
> ---
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.1
>Reporter: Wang Yangjun
>  Labels: easyfix, patch
> Fix For: 0.10.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, Flink streaming only supports joining two streams over the same 
> window. How can we solve this problem?
> For example, there are two streams. One is the advertisements shown to users; 
> its tuples could be described as (id, show timestamp). The other one is the 
> click stream -- (id, click timestamp). We want to get a joined stream that 
> includes every advertisement clicked by a user within 20 minutes of being 
> shown.
> It is possible that after an advertisement is shown, some users click it 
> immediately. It is also possible that a "click" message arrives at the server 
> earlier than the corresponding "show" message because of network delay. We 
> assume that the maximum delay is one minute.
> So we should always keep a buffer (20 minutes) of the "show" stream and 
> another buffer (1 minute) of the "click" stream.
> It would be great to have an API like this:
> showStream.join(clickStream)
> .where(keySelector)
> .buffer(Time.of(20, TimeUnit.MINUTES))
> .equalTo(keySelector)
> .buffer(Time.of(1, TimeUnit.MINUTES))
> .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149
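
For intuition, here is a Flink-free sketch of the buffering semantics described 
above, assuming roughly in-order timestamps per stream; all names are 
illustrative, not Flink APIs:

{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: each side keeps its own retention buffer (20 min for
// "show", 1 min for "click"), and an arriving element is joined against the
// other side's buffer. Keeping a short "click" buffer also matches a click
// that arrived before its "show" because of network delay.
public class TwoBufferJoinSketch {

    static final long SHOW_RETENTION_MS  = 20 * 60 * 1000;
    static final long CLICK_RETENTION_MS = 60 * 1000;

    static final class Event {
        final String id;
        final long timestamp;
        Event(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    private final Deque<Event> showBuffer  = new ArrayDeque<>();
    private final Deque<Event> clickBuffer = new ArrayDeque<>();

    public void onShow(Event show) {
        evict(clickBuffer, show.timestamp - CLICK_RETENTION_MS);
        showBuffer.addLast(show);
        for (Event click : clickBuffer) {          // early-arriving clicks
            if (click.id.equals(show.id)) {
                emit(show, click);
            }
        }
    }

    public void onClick(Event click) {
        evict(showBuffer, click.timestamp - SHOW_RETENTION_MS);
        clickBuffer.addLast(click);
        for (Event show : showBuffer) {            // shows within the last 20 min
            if (show.id.equals(click.id)) {
                emit(show, click);
            }
        }
    }

    private static void evict(Deque<Event> buffer, long minTimestamp) {
        while (!buffer.isEmpty() && buffer.peekFirst().timestamp < minTimestamp) {
            buffer.removeFirst();
        }
    }

    private void emit(Event show, Event click) {
        System.out.println("joined advertisement " + show.id);
    }
}
{code}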





[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50360393
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

@wangyangjun Don't worry. The failing tests seem unrelated to your changes. 
There are some flaky tests in Flink.




[jira] [Closed] (FLINK-3269) Return value from DataInputStream#read() should be checked in PythonPlanReceiver

2016-01-20 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-3269.
---
Resolution: Fixed

Fixed in d00739bc4dc6beda1cefed53267ea53ee8627c25

> Return value from DataInputStream#read() should be checked in 
> PythonPlanReceiver
> 
>
> Key: FLINK-3269
> URL: https://issues.apache.org/jira/browse/FLINK-3269
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Chesnay Schepler
>Priority: Minor
>
> There are several places where the return value from DataInputStream#read() 
> isn't checked.
> Here is an example from BytesDeserializer:
> {code}
>   byte[] buffer = new byte[size];
>   input.read(buffer);
>   return buffer;
> {code}
> A buffer of {{size}} bytes is allocated; however, the read operation may not 
> fill {{size}} bytes, so the return value should be checked.
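
A sketch of the usual fix: {{DataInputStream#readFully}} loops internally until 
the buffer is filled and throws {{EOFException}} on a truncated stream, unlike 
a single {{read()}}. The class and method names below are illustrative:

{code}
import java.io.DataInputStream;
import java.io.IOException;

class ReadFullyExample {
    // readFully() either fills the whole buffer or throws EOFException,
    // whereas a single read() may return fewer than 'size' bytes.
    static byte[] readExactly(DataInputStream input, int size) throws IOException {
        byte[] buffer = new byte[size];
        input.readFully(buffer);   // instead of input.read(buffer)
        return buffer;
    }
}
{code}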





[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread wangyangjun
GitHub user wangyangjun opened a pull request:

https://github.com/apache/flink/pull/1527

[FLINK-3109]Join two streams with two different buffer time -- Java i…

Java implementation of jira 
[FLINK-3109](https://issues.apache.org/jira/browse/FLINK-3109)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyangjun/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1527


commit a521c83eb31b653f0a4bfc9da58837a587a378c4
Author: Yangjun Wang 
Date:   2015-12-05T01:16:49Z

[FLINK-3109]Join two streams with two different buffer time -- Java 
implementation






[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108657#comment-15108657
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

GitHub user wangyangjun opened a pull request:

https://github.com/apache/flink/pull/1527

[FLINK-3109]Join two streams with two different buffer time -- Java i…

Java implementation of jira 
[FLINK-3109](https://issues.apache.org/jira/browse/FLINK-3109)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyangjun/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1527


commit a521c83eb31b653f0a4bfc9da58837a587a378c4
Author: Yangjun Wang 
Date:   2015-12-05T01:16:49Z

[FLINK-3109]Join two streams with two different buffer time -- Java 
implementation




> Join two streams with two different buffer time
> ---
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.1
>Reporter: Wang Yangjun
>  Labels: easyfix, patch
> Fix For: 0.10.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently, Flink streaming only supports joining two streams over the same 
> window. How can we solve this problem?
> For example, there are two streams. One is the advertisements shown to users; 
> its tuples could be described as (id, show timestamp). The other one is the 
> click stream -- (id, click timestamp). We want to get a joined stream that 
> includes every advertisement clicked by a user within 20 minutes of being 
> shown.
> It is possible that after an advertisement is shown, some users click it 
> immediately. It is also possible that a "click" message arrives at the server 
> earlier than the corresponding "show" message because of network delay. We 
> assume that the maximum delay is one minute.
> So we should always keep a buffer (20 minutes) of the "show" stream and 
> another buffer (1 minute) of the "click" stream.
> It would be great to have an API like this:
> showStream.join(clickStream)
> .where(keySelector)
> .buffer(Time.of(20, TimeUnit.MINUTES))
> .equalTo(keySelector)
> .buffer(Time.of(1, TimeUnit.MINUTES))
> .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149





[jira] [Commented] (FLINK-3255) Chaining behavior should not depend on parallelism

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108675#comment-15108675
 ] 

ASF GitHub Bot commented on FLINK-3255:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1518#issuecomment-173231033
  
Test failures are unrelated.

Merging this...


> Chaining behavior should not depend on parallelism
> --
>
> Key: FLINK-3255
> URL: https://issues.apache.org/jira/browse/FLINK-3255
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10.1
>
>
> Currently, operators are chained more aggressively when the parallelism is 
> one. That makes debugging tougher as it changes threading behavior.
> The benefits are also limited: real installations that would need that kind 
> of efficiency would not run with parallelism 1, or would not use a 
> partitioning/broadcast step there (if explicitly required to run with 
> parallelism 1).
> In the future, when we want to allow parallelism to be adjusted dynamically, 
> this will be even more tricky.





[GitHub] flink pull request: [FLINK-3255] [streaming] Disable parallelism-d...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1518#issuecomment-173231033
  
Test failures are unrelated.

Merging this...




[GitHub] flink pull request: [FLINK-3246] Remove unnecessary '-parent' suff...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1515#issuecomment-173231755
  
Merging this...




[jira] [Commented] (FLINK-3246) Consolidate maven project names with *-parent suffix

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108676#comment-15108676
 ] 

ASF GitHub Bot commented on FLINK-3246:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1515#issuecomment-173231755
  
Merging this...


> Consolidate maven project names with *-parent suffix
> 
>
> Key: FLINK-3246
> URL: https://issues.apache.org/jira/browse/FLINK-3246
> Project: Flink
>  Issue Type: Improvement
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.0.0
>
>
> The projects {{flink-streaming-connectors-parent}} and {{flink-contrib-parent}} 
> carry the unnecessary {{-parent}} suffix.
> I suspect it was mistakenly added when looking at the root project called 
> {{flink-parent}}. The suffix was added there to avoid a project with the 
> unqualified name {{flink}}. However, for the projects mentioned here, the 
> suffix is not necessary and can be dropped.





[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108998#comment-15108998
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50288042
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamJoinOperator.java
 ---
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators;
+
+import org.apache.flink.api.common.functions.JoinFunction;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.runtime.state.StateBackend;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.operators.windowing.buffers.HeapWindowBuffer;
+import org.apache.flink.streaming.runtime.streamrecord.MultiplexingStreamRecordSerializer;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Objects.requireNonNull;
+
+
+public class StreamJoinOperator<K, IN1, IN2, OUT>
+      extends AbstractUdfStreamOperator<OUT, JoinFunction<IN1, IN2, OUT>>
+      implements TwoInputStreamOperator<IN1, IN2, OUT> {
+
+   private static final long serialVersionUID = 8650694601687319011L;
+   private static final Logger LOG = LoggerFactory.getLogger(StreamJoinOperator.class);
+
+   private HeapWindowBuffer<IN1> stream1Buffer;
+   private HeapWindowBuffer<IN2> stream2Buffer;
+   private final KeySelector<IN1, K> keySelector1;
+   private final KeySelector<IN2, K> keySelector2;
+   private long stream1WindowLength;
+   private long stream2WindowLength;
+
+   protected transient long currentWatermark1 = -1L;
+   protected transient long currentWatermark2 = -1L;
+   protected transient long currentWatermark = -1L;
+
+   private TypeSerializer<IN1> inputSerializer1;
+   private TypeSerializer<IN2> inputSerializer2;
+
+   /**
+    * If this is true, the current processing time is set as the timestamp of
+    * incoming elements. This is for use with a
+    * {@link org.apache.flink.streaming.api.windowing.evictors.TimeEvictor}
+    * if eviction should happen based on processing time.
+    */
+   private boolean setProcessingTime = false;
+
+   public StreamJoinOperator(JoinFunction<IN1, IN2, OUT> userFunction,
+         KeySelector<IN1, K> keySelector1,
+         KeySelector<IN2, K> keySelector2,
+         long stream1WindowLength,
+         long stream2WindowLength,
+         TypeSerializer<IN1> inputSerializer1,
+         TypeSerializer<IN2> inputSerializer2) {
+      super(userFunction);
+      this.keySelector1 = requireNonNull(keySelector1);
+      this.keySelector2 = requireNonNull(keySelector2);
+
+      // primitive longs cannot be null, so no requireNonNull is needed here
+      this.stream1WindowLength = stream1WindowLength;
+      this.stream2WindowLength = stream2WindowLength;
+
+      this.inputSerializer1 = requireNonNull(inputSerializer1);
+      this.inputSerializer2 = requireNonNull(inputSerializer2);
+   }
+
+   @Override
+   public void open() throws Exception {
+      super.open();
+      if (null == inputSerializer1 || null == inputSerializer2) {
+         throw new IllegalStateException("Input serializer was not set.");
+      }
+
+   

[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50288200
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

TODO still unresolved?




[jira] [Commented] (FLINK-3135) Add chainable driver for UNARY_NO_OP strategy

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109019#comment-15109019
 ] 

ASF GitHub Bot commented on FLINK-3135:
---

GitHub user ramkrish86 opened a pull request:

https://github.com/apache/flink/pull/1530

FLINK-3135 Add chainable driver for UNARY_NO_OP strategy(Ram)

My first Flink PR. A simple way to add a chainable driver for UNARY_NO_OP: I 
followed the existing code pieces and, after reading NoOpDriver, created the 
NoOpChainedDriver. I am just trying to learn and am happy to work on 
feedback/comments.
Since the input here just needs to be passed to the output, I felt there is no 
need for any class to be loaded. Hence most of the abstract APIs have empty 
implementations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ramkrish86/flink FLINK-3135

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1530


commit 16b755a78e2cd56e193363dc3cd77d87eeaa7a16
Author: ramkrishna 
Date:   2016-01-20T17:22:55Z

FLINK-3135 Add chainable driver for UNARY_NO_OP strategy(Ram)




> Add chainable driver for UNARY_NO_OP strategy
> -
>
> Key: FLINK-3135
> URL: https://issues.apache.org/jira/browse/FLINK-3135
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Fabian Hueske
>Priority: Minor
>  Labels: starter
>
> A chainable driver for the UNARY_NO_OP strategy would decrease the 
> serialization overhead in certain situations.
> It should be fairly easy to implement.
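
To illustrate the idea with a standalone sketch (illustrative class names, not 
the driver API from the PR; the real ChainedDriver base class has additional 
lifecycle methods, which can stay empty here): the driver does nothing except 
hand each record to the next operator in the chain, so no serialization or 
intermediate buffering is needed:

{code}
// Illustrative only; mirrors Flink's Collector contract (collect/close).
interface Collector<T> {
    void collect(T record);
    void close();
}

final class NoOpChainedDriverSketch<T> implements Collector<T> {
    private final Collector<T> output;   // the next driver/operator in the chain

    NoOpChainedDriverSketch(Collector<T> output) {
        this.output = output;
    }

    @Override
    public void collect(T record) {
        output.collect(record);          // forward the record: no copy, no serialization
    }

    @Override
    public void close() {
        output.close();
    }
}
{code}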





[GitHub] flink pull request: FLINK-3135 Add chainable driver for UNARY_NO_O...

2016-01-20 Thread ramkrish86
GitHub user ramkrish86 opened a pull request:

https://github.com/apache/flink/pull/1530

FLINK-3135 Add chainable driver for UNARY_NO_OP strategy(Ram)

My first Flink PR. A simple way to add a chainable driver for UNARY_NO_OP: I 
followed the existing code pieces and, after reading NoOpDriver, created the 
NoOpChainedDriver. I am just trying to learn and am happy to work on 
feedback/comments.
Since the input here just needs to be passed to the output, I felt there is no 
need for any class to be loaded. Hence most of the abstract APIs have empty 
implementations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ramkrish86/flink FLINK-3135

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1530


commit 16b755a78e2cd56e193363dc3cd77d87eeaa7a16
Author: ramkrishna 
Date:   2016-01-20T17:22:55Z

FLINK-3135 Add chainable driver for UNARY_NO_OP strategy(Ram)






[jira] [Commented] (FLINK-3268) Unstable test JobManagerSubmittedJobGraphsRecoveryITCase

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109046#comment-15109046
 ] 

Stephan Ewen commented on FLINK-3268:
-

Another failure, in a different part of the test: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/103628601/log.txt

This time, the ZooKeeper connection is lost while cleaning up the ZooKeeper 
registry data.

> Unstable test JobManagerSubmittedJobGraphsRecoveryITCase
> 
>
> Key: FLINK-3268
> URL: https://issues.apache.org/jira/browse/FLINK-3268
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.0.0
>
>
> Logs for the failed test: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103625073/log.txt





[jira] [Commented] (FLINK-3247) Kafka Connector unusable with quickstarts - shading issue

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109075#comment-15109075
 ] 

Stephan Ewen commented on FLINK-3247:
-

The {{*}} exclude worked for me, using Maven 3.0.4, and was the culprit.

Probably a difference in behavior between Maven versions.

We should change this to not use star excludes at all.

> Kafka Connector unusable with quickstarts - shading issue
> -
>
> Key: FLINK-3247
> URL: https://issues.apache.org/jira/browse/FLINK-3247
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Robert Metzger
>Priority: Blocker
> Fix For: 1.0.0
>
>
> The Kafka Connector now requires Curator, which is referenced as 
> {{flink-shaded-curator}}. The quickstarts make sure it is not packaged into 
> the jar file via exclusions.
> The Curator classes are, however, only present in relocated form in the 
> flink-dist.jar - relocated manually in the {{flink-runtime}} project. The 
> connector thus cannot find the Curator classes and fails.





[GitHub] flink pull request: [FLINK-3256] Fix colocation group re-instantia...

2016-01-20 Thread senorcarbone
Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1526#issuecomment-173260776
  
Sure, I will do it on the fly. Thanks for checking it!




[GitHub] flink pull request: [FLINK-3122] [Gelly] Generic vertex value in l...

2016-01-20 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1521#issuecomment-173260743
  
Thank you @s1ck! Looks good to me. I will merge it later if no objections.




[jira] [Commented] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108834#comment-15108834
 ] 

ASF GitHub Bot commented on FLINK-3256:
---

Github user senorcarbone commented on the pull request:

https://github.com/apache/flink/pull/1526#issuecomment-173260776
  
Sure, I will do it on the fly. Thanks for checking it!


> Invalid execution graph cleanup for jobs with colocation groups
> ---
>
> Key: FLINK-3256
> URL: https://issues.apache.org/jira/browse/FLINK-3256
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>Priority: Blocker
>
> Currently, upon restarting an execution graph, we clean up the colocation 
> constraints for each group present in each ExecutionJobVertex.
> This can lead to invalid reconfiguration upon a restart or any other activity 
> that relies on state cleanup of the execution graph. For example, upon 
> restarting a DataStream job with iterations the following steps are executed:
> 1) IterationSource colgroup constraints are reset
> 2) IterationSource execution vertices reset and create new colocation 
> constraints
> 3) IterationSink colgroup constraints are reset
> 4) IterationSink execution vertices reset and create different colocation 
> constraints.
> This can be trivially fixed by resetting colocation groups independently of 
> ExecutionJobVertices, thus updating them once per reconfiguration.





[jira] [Commented] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108833#comment-15108833
 ] 

ASF GitHub Bot commented on FLINK-3122:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1521#issuecomment-173260743
  
Thank you @s1ck! Looks good to me. I will merge it later if no objections.


> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as the label. This 
> can be generalized to anything which extends {{Comparable}}. This issue 
> depends on FLINK-3118 being fixed.
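
The generalization amounts to replacing the fixed {{Long}} label with a bounded 
type parameter, roughly like this (a sketch, not necessarily the final Gelly 
signature):

{code}
// Sketch: the vertex value (label) type only needs to be Comparable.
public class LabelPropagation<K, VV extends Comparable<VV>, EV> {
    // ... algorithm unchanged, with VV in place of Long for the labels ...
}
{code}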





[jira] [Updated] (FLINK-3265) RabbitMQ not threadsafe: ConcurrentModificationException

2016-01-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-3265:

Summary: RabbitMQ not threadsafe: ConcurrentModificationException  (was: 
Unstable RabbitMQ test: 
MessageAcknowledgingSourceBase.notifyCheckpointComplete: 
ConcurrentModificationException)

> RabbitMQ not threadsafe: ConcurrentModificationException
> 
>
> Key: FLINK-3265
> URL: https://issues.apache.org/jira/browse/FLINK-3265
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.454 sec <<< 
> FAILURE! - in org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> testCheckpointing(org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest)
>   Time elapsed: 0.902 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.remove(HashMap.java:1443)
>   at java.util.AbstractSet.removeAll(AbstractSet.java:178)
>   at 
> org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase.notifyCheckpointComplete(MessageAcknowledgingSourceBase.java:198)
>   at 
> org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest.testCheckpointing(RMQSourceTest.java:144)
> Results :
> Tests in error: 
>   RMQSourceTest.testCheckpointing:144 » ConcurrentModification
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103452897/log.txt
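
The trace points at {{AbstractSet#removeAll}}, which iterates the receiving set 
with a {{HashMap}} iterator; if another thread mutates the set concurrently 
(the source thread adding pending IDs while the checkpoint thread acknowledges 
them), the iterator's modification check fires. A minimal sketch of the usual 
fix, guarding both accesses with a shared lock (illustrative names, not the 
actual Flink fields):

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: without the shared lock, removeAll()'s iterator can
// hit a ConcurrentModificationException while the source thread adds IDs.
class AckTrackerSketch {
    private final Object lock = new Object();            // e.g. the checkpoint lock
    private final Set<Long> pendingIds = new HashSet<>();

    void onRecord(long id) {                             // source thread
        synchronized (lock) {
            pendingIds.add(id);
        }
    }

    void notifyCheckpointComplete(List<Long> acknowledged) {  // checkpoint thread
        synchronized (lock) {
            pendingIds.removeAll(acknowledged);          // safe under the lock
        }
    }
}
{code}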





[jira] [Updated] (FLINK-3265) RabbitMQ Source not threadsafe: ConcurrentModificationException

2016-01-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-3265:

Summary: RabbitMQ Source not threadsafe: ConcurrentModificationException  
(was: RabbitMQ not threadsafe: ConcurrentModificationException)

> RabbitMQ Source not threadsafe: ConcurrentModificationException
> ---
>
> Key: FLINK-3265
> URL: https://issues.apache.org/jira/browse/FLINK-3265
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>Priority: Blocker
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.454 sec <<< 
> FAILURE! - in org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> testCheckpointing(org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest)
>   Time elapsed: 0.902 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.remove(HashMap.java:1443)
>   at java.util.AbstractSet.removeAll(AbstractSet.java:178)
>   at 
> org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase.notifyCheckpointComplete(MessageAcknowledgingSourceBase.java:198)
>   at 
> org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest.testCheckpointing(RMQSourceTest.java:144)
> Results :
> Tests in error: 
>   RMQSourceTest.testCheckpointing:144 » ConcurrentModification
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103452897/log.txt





[GitHub] flink pull request: [FLINK-3267] Disable reference tracking in Kry...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1528#issuecomment-173314485
  
The trigger was a mail from Theo on the ML where deserialization fails in the 
reference mapper. There may still be another issue, but a failure in code 
that is better not executed seemed avoidable ;-)




[GitHub] flink pull request: [FLINK-2933] Flink scala libraries exposed wit...

2016-01-20 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/1529

[FLINK-2933] Flink scala libraries exposed with maven should carry scala 
version

This pull request adds Scala suffixes to all Maven modules which depend 
on a Scala version. The default Scala version is 2.10. It also includes a small 
script to list Scala-dependent modules. The current situation looks like this:

```
The following modules DON'T have a dependency on Scala:
flink-parent
flink-annotations
flink-batch-connectors
flink-contrib-parent
flink-core
flink-hcatalog
flink-jdbc
flink-libraries
flink-quickstart
flink-quickstart-java
flink-quickstart-scala
flink-shaded-curator
flink-shaded-curator-recipes
flink-shaded-curator-test
flink-shaded-hadoop
flink-shaded-hadoop2
flink-shaded-include-yarn-tests
flink-streaming-connectors

The following modules have a dependency on Scala:
flink-avro
flink-clients
flink-connector-elasticsearch
flink-connector-filesystem
flink-connector-flume
flink-connector-kafka
flink-connector-nifi
flink-connector-rabbitmq
flink-connector-twitter
flink-connector-wikiedits
flink-dist
flink-examples
flink-examples-batch
flink-examples-streaming
flink-fs-tests
flink-gelly
flink-gelly-scala
flink-hadoop-compatibility
flink-hbase
flink-java
flink-java8
flink-ml
flink-operator-stats
flink-optimizer
flink-python
flink-runtime
flink-runtime-web
flink-scala
flink-scala-shell
flink-storm
flink-storm-examples
flink-streaming-contrib
flink-streaming-java
flink-streaming-scala
flink-table
flink-test-utils
flink-tests
flink-tweet-inputformat
flink-yarn
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-2940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1529.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1529


commit bac800a3c23f675e983fb744bbe747e30c5ca7b0
Author: Maximilian Michels 
Date:   2016-01-20T09:39:40Z

add Scala suffixes to Scala dependent modules

commit 6a958abfaf221f75706ec630cd6bea3097f03bfc
Author: Maximilian Michels 
Date:   2016-01-20T14:22:55Z

remove scala suffix from scala-free modules

commit 452d76d2c4dd8028add9957941601849d8433082
Author: Maximilian Michels 
Date:   2016-01-20T15:43:19Z

[tools] adapt change-scala-version script

commit 0eb69085ba1c95ff150d47b435f0072642345c8c
Author: Maximilian Michels 
Date:   2016-01-20T15:50:18Z

[tools] add script to list Scala dependent modules






[jira] [Commented] (FLINK-3268) Unstable test JobManagerSubmittedJobGraphsRecoveryITCase

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108850#comment-15108850
 ] 

Stephan Ewen commented on FLINK-3268:
-

The actual test failure is that the cluster does not properly shut down. 
Shutdown is triggered while one blocking job is still running.

The log of the shutdown period is below:
{code}
15:28:35,960 INFO  
org.apache.flink.runtime.jobmanager.JobManagerSubmittedJobGraphsRecoveryITCase  
- Wait that the non-leader removes the submitted job.
15:28:35,961 INFO  org.apache.flink.runtime.testingUtils.TestingTaskManager 
 - Stopping TaskManager akka://flink/user/taskmanager_1#-949637839.
15:28:35,961 INFO  org.apache.flink.runtime.testingUtils.TestingJobManager  
 - Stopping JobManager akka.tcp://flink@127.0.0.1:54921/user/jobmanager.
15:28:35,961 INFO  org.apache.flink.runtime.testingUtils.TestingTaskManager 
 - Cancelling all computations and discarding all cached data.
15:28:35,962 INFO  org.apache.flink.runtime.taskmanager.Task
 - Attempting to fail task externally Blocking Vertex (1/1)
15:28:35,962 INFO  org.apache.flink.runtime.taskmanager.Task
 - Blocking Vertex (1/1) switched to FAILED with exception.
java.lang.Exception: TaskManager is shutting down.
at 
org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:216)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
at 
org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:119)
at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at 
akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15:28:35,963 INFO  org.apache.flink.runtime.testingUtils.TestingJobManager  
 - Stopping JobManager akka.tcp://flink@127.0.0.1:54922/user/jobmanager.
15:28:35,964 INFO  org.apache.flink.runtime.taskmanager.Task
 - Triggering cancellation of task code Blocking Vertex (1/1) 
(7e54466b8bfc1f8063e7a145126688ae).
15:28:35,965 INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - 
Stopping ZooKeeperLeaderElectionService.
15:28:35,965 INFO  org.apache.flink.runtime.testingUtils.TestingTaskManager 
 - Disassociating from JobManager
15:28:35,965 INFO  org.apache.flink.runtime.blob.BlobCache  
 - Shutting down BlobCache
15:28:35,965 INFO  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - 
Stopping ZooKeeperLeaderRetrievalService.
15:28:35,973 INFO  org.apache.flink.runtime.taskmanager.Task
 - Freeing task resources for Blocking Vertex (1/1)
15:28:40,135 ERROR org.apache.curator.framework.recipes.cache.PathChildrenCache 
 - 
java.lang.IllegalStateException: instance must be started before calling this 
method
at 
org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.getChildren(CuratorFrameworkImpl.java:379)
at 
org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:502)
at 
org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
at 
org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15:28:40,136 INFO  
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  - Removed 
job graph 1b1fc2fc95ce37e47702697bb4b4e84d from 

[jira] [Commented] (FLINK-2933) Flink scala libraries exposed with maven should carry scala version

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109021#comment-15109021
 ] 

ASF GitHub Bot commented on FLINK-2933:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1529#issuecomment-173300875
  
Thanks for the PR! This fixes an issue that has been reported several times 
by users.
IIRC, the outcome of the discussion on the dev ML was to remove the Scala 
dependency from `flink-java` before adding the suffix. 

So we have to resolve FLINK-2972 first (one way or the other). Otherwise, 
we would update the `flink-java` SNAPSHOT artifacts twice and cause quite a bit 
of trouble for our bleeding-edge users.


> Flink scala libraries exposed with maven should carry scala version
> ---
>
> Key: FLINK-2933
> URL: https://issues.apache.org/jira/browse/FLINK-2933
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Reporter: Frederick F. Kautz IV
>Assignee: Maximilian Michels
>Priority: Minor
>
> [If I put this on the wrong component, can someone please update?]
> Major versions of Scala are neither forward nor backward compatible. Libraries 
> built for 2.10 will not work with 2.11, or vice versa.
> In order to avoid build-related problems, it is strongly recommended to 
> append the Scala version a library is compatible with to the artifact id. This 
> ensures the correct version of the library is pulled in, rather than deferring 
> the problem to a future build or runtime error.
> For example, akka exposes the following packages for the same version:
> {code}
> <dependency>
>   <groupId>com.typesafe.akka</groupId>
>   <artifactId>akka-actor_2.10</artifactId>
>   <version>2.3.14</version>
> </dependency>
> {code}
> {code}
> <dependency>
>   <groupId>com.typesafe.akka</groupId>
>   <artifactId>akka-actor_2.11</artifactId>
>   <version>2.3.14</version>
> </dependency>
> {code}





[jira] [Commented] (FLINK-2972) Remove Twitter Chill dependency from flink-java module

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109063#comment-15109063
 ] 

Stephan Ewen commented on FLINK-2972:
-

I think this is an issue when we deploy test jars.

> Remove Twitter Chill dependency from flink-java module
> --
>
> Key: FLINK-2972
> URL: https://issues.apache.org/jira/browse/FLINK-2972
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API
>Affects Versions: 1.0.0
>Reporter: Fabian Hueske
> Fix For: 1.0.0
>
>
> The {{flink-java}} module has a transitive dependency on Scala due to 
> Twitter's Chill library. Chill is used within Flink's KryoSerializer to 
> obtain a Kryo instance that has been initialized for Scala classes. 
> In order to decouple {{flink-java}} from Scala, its dependency on Chill needs 
> to be removed.





[jira] [Commented] (FLINK-3266) LocalFlinkMiniCluster leaks resources when multiple jobs are submitted

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109076#comment-15109076
 ] 

Stephan Ewen commented on FLINK-3266:
-

Which threads are those, in particular? The memory could well be a consequence of 
the thread leak...

> LocalFlinkMiniCluster leaks resources when multiple jobs are submitted
> --
>
> Key: FLINK-3266
> URL: https://issues.apache.org/jira/browse/FLINK-3266
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Priority: Minor
>
> After a job submitted to a LocalEnvironment finishes, some threads are not 
> stopped and are stuck waiting forever.
> You can observe this if you enclose the body of the main function of the 
> WordCount example in a loop that executes 100 times and monitor the thread 
> count (with VisualVM, for example).
> (The problem only happens if I use a mini cluster. If I use start-local.sh 
> and submit jobs to it, then there is no leak.)
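
A sketch of the repro described above (the job body is irrelevant; {{count()}} 
forces execution; class name is illustrative):

{code}
import org.apache.flink.api.java.ExecutionEnvironment;

// Run the same small job repeatedly on a local mini cluster and monitor the
// JVM thread count (e.g. with VisualVM); the count grows run after run.
public class ThreadLeakRepro {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100; i++) {
            ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
            long n = env.fromElements("to", "be", "or", "not", "to", "be").count();
            System.out.println("run " + i + ": " + n + " elements, "
                    + Thread.activeCount() + " live threads");
        }
    }
}
{code}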





[jira] [Commented] (FLINK-2237) Add hash-based Aggregation

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109095#comment-15109095
 ] 

ASF GitHub Bot commented on FLINK-2237:
---

Github user ggevay commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-173319246
  
I've added some performance testing code and comments.


> Add hash-based Aggregation
> --
>
> Key: FLINK-2237
> URL: https://issues.apache.org/jira/browse/FLINK-2237
> Project: Flink
>  Issue Type: New Feature
>Reporter: Rafiullah Momand
>Assignee: Gabor Gevay
>Priority: Minor
>
> Aggregation functions are currently implemented in a sort-based way.
> How can we implement hash-based aggregation for Flink?
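
For intuition, a minimal in-memory sketch of hash-based combining with ordinary 
Java collections (Flink's actual implementation would build on its 
managed-memory hash table instead; all names here are illustrative):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Aggregate each key's values in a hash table instead of sorting the input
// first; emit the table contents when the input (or memory budget) runs out.
class HashCombinerSketch<K, V> {
    private final Map<K, V> table = new HashMap<>();
    private final BinaryOperator<V> combine;

    HashCombinerSketch(BinaryOperator<V> combine) {
        this.combine = combine;
    }

    void add(K key, V value) {
        table.merge(key, value, combine);   // O(1) expected per record, no sort
    }

    Iterable<Map.Entry<K, V>> results() {
        return table.entrySet();
    }
}
{code}

For a word-count style combiner this would be instantiated as 
{{new HashCombinerSketch<String, Integer>(Integer::sum)}}.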





[GitHub] flink pull request: [FLINK-2237] [runtime] Add hash-based combiner...

2016-01-20 Thread ggevay
Github user ggevay commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-173319246
  
I've added some performance testing code and comments.




[jira] [Commented] (FLINK-3267) Disable reference tracking in Kryo fallback serializer

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109066#comment-15109066
 ] 

ASF GitHub Bot commented on FLINK-3267:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1528#issuecomment-173314485
  
The trigger was a mail from Theo on the ML where deserialization fails in the 
reference mapper. There may still be another issue, but a failure in code 
that is better not executed seemed avoidable ;-)


> Disable reference tracking in Kryo fallback serializer
> --
>
> Key: FLINK-3267
> URL: https://issues.apache.org/jira/browse/FLINK-3267
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Kryo runs extra logic to track and resolve repeated references to the same 
> object (similar to JavaSerialization)
> We should disable reference tracking
>   - reference tracking is costly
>   - it is virtually always unnecessary in the datatypes used in Flink
>   - most importantly, it is inconsistent with Flink's own serialization 
> (which does not do reference tracking)
>   - It may have problems if elements are read in a different order than they 
> are written.
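
The change itself is a one-line flag on the Kryo instance, presumably applied 
inside Flink's Kryo fallback serializer; in isolation (class name illustrative):

{code}
import com.esotericsoftware.kryo.Kryo;

class KryoWithoutReferences {
    // Kryo tracks and resolves repeated references by default;
    // setReferences(false) turns that off.
    static Kryo createKryo() {
        Kryo kryo = new Kryo();
        kryo.setReferences(false);
        return kryo;
    }
}
{code}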






[GitHub] flink pull request: [FLINK-3267] Disable reference tracking in Kry...

2016-01-20 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/1528

[FLINK-3267] Disable reference tracking in Kryo fallback serializer

Before this commit, Kryo runs extra logic to track and resolve repeated 
references to
the same object (similar to JavaSerialization)

This disables reference tracking because
  - reference tracking is costly
  - it is virtually always unnecessary in the datatypes used in Flink
  - it is inconsistent with Flink's own serialization (which does not do 
reference tracking)
  - it may have problems if elements are read in a different order than 
they are written.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink kryo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1528


commit cb94002f4b38090d21e0baadbd5964ebb9c4de9d
Author: Stephan Ewen 
Date:   2016-01-20T15:26:58Z

[FLINK-3267] Disable reference tracking in Kryo fallback serializer

Before this commit, Kryo runs extra logic to track and resolve repeated 
references to
the same object (similar to JavaSerialization)

This disables reference tracking because
  - reference tracking is costly
  - it is virtually always unnecessary in the datatypes used in Flink
  - it is inconsistent with Flink's own serialization (which does not do 
reference tracking)
  - it may have problems if elements are read in a different order than 
they are written.






[jira] [Commented] (FLINK-3267) Disable reference tracking in Kryo fallback serializer

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108717#comment-15108717
 ] 

ASF GitHub Bot commented on FLINK-3267:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/1528

[FLINK-3267] Disable reference tracking in Kryo fallback serializer

Before this commit, Kryo runs extra logic to track and resolve repeated 
references to
the same object (similar to JavaSerialization)

This disables reference tracking because
  - reference tracking is costly
  - it is virtually always unnecessary in the datatypes used in Flink
  - it is inconsistent with Flink's own serialization (which does not do 
reference tracking)
  - it may have problems if elements are read in a different order than 
they are written.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink kryo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1528


commit cb94002f4b38090d21e0baadbd5964ebb9c4de9d
Author: Stephan Ewen 
Date:   2016-01-20T15:26:58Z

[FLINK-3267] Disable reference tracking in Kryo fallback serializer

Before this commit, Kryo runs extra logic to track and resolve repeated 
references to
the same object (similar to JavaSerialization)

This disables reference tracking because
  - reference tracking is costly
  - it is virtually always unnecessary in the datatypes used in Flink
  - it is inconsistent with Flink's own serialization (which does not do 
reference tracking)
  - it may have problems if elements are read in a different order than 
they are written.




> Disable reference tracking in Kryo fallback serializer
> --
>
> Key: FLINK-3267
> URL: https://issues.apache.org/jira/browse/FLINK-3267
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Kryo runs extra logic to track and resolve repeated references to the same 
> object (similar to JavaSerialization)
> We should disable reference tracking
>   - reference tracking is costly
>   - it is virtually always unnecessary in the datatypes used in Flink
>   - most importantly, it is inconsistent with Flink's own serialization 
> (which does not do reference tracking)
>   - It may have problems if elements are read in a different order than they 
> are written.





[jira] [Commented] (FLINK-2933) Flink scala libraries exposed with maven should carry scala version

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108794#comment-15108794
 ] 

ASF GitHub Bot commented on FLINK-2933:
---

GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/1529

[FLINK-2933] Flink scala libraries exposed with maven should carry scala 
version

This pull request adds Scala suffixes to all Maven modules which depend 
on a Scala version. The default Scala version is 2.10. It also includes a small 
script to list Scala-dependent modules. The current situation looks like this:

```
The following modules DON'T have a dependency on Scala:
flink-parent
flink-annotations
flink-batch-connectors
flink-contrib-parent
flink-core
flink-hcatalog
flink-jdbc
flink-libraries
flink-quickstart
flink-quickstart-java
flink-quickstart-scala
flink-shaded-curator
flink-shaded-curator-recipes
flink-shaded-curator-test
flink-shaded-hadoop
flink-shaded-hadoop2
flink-shaded-include-yarn-tests
flink-streaming-connectors

The following modules have a dependency on Scala:
flink-avro
flink-clients
flink-connector-elasticsearch
flink-connector-filesystem
flink-connector-flume
flink-connector-kafka
flink-connector-nifi
flink-connector-rabbitmq
flink-connector-twitter
flink-connector-wikiedits
flink-dist
flink-examples
flink-examples-batch
flink-examples-streaming
flink-fs-tests
flink-gelly
flink-gelly-scala
flink-hadoop-compatibility
flink-hbase
flink-java
flink-java8
flink-ml
flink-operator-stats
flink-optimizer
flink-python
flink-runtime
flink-runtime-web
flink-scala
flink-scala-shell
flink-storm
flink-storm-examples
flink-streaming-contrib
flink-streaming-java
flink-streaming-scala
flink-table
flink-test-utils
flink-tests
flink-tweet-inputformat
flink-yarn
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink FLINK-2940

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1529.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1529


commit bac800a3c23f675e983fb744bbe747e30c5ca7b0
Author: Maximilian Michels 
Date:   2016-01-20T09:39:40Z

add Scala suffixes to Scala dependent modules

commit 6a958abfaf221f75706ec630cd6bea3097f03bfc
Author: Maximilian Michels 
Date:   2016-01-20T14:22:55Z

remove scala suffix from scala-free modules

commit 452d76d2c4dd8028add9957941601849d8433082
Author: Maximilian Michels 
Date:   2016-01-20T15:43:19Z

[tools] adapt change-scala-version script

commit 0eb69085ba1c95ff150d47b435f0072642345c8c
Author: Maximilian Michels 
Date:   2016-01-20T15:50:18Z

[tools] add script to list Scala dependent modules




> Flink scala libraries exposed with maven should carry scala version
> ---
>
> Key: FLINK-2933
> URL: https://issues.apache.org/jira/browse/FLINK-2933
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Reporter: Frederick F. Kautz IV
>Assignee: Maximilian Michels
>Priority: Minor
>
> [If I put this on the wrong component, can someone please update?]
> Major versions of Scala are neither forward nor backward compatible. Libraries 
> built for 2.10 will not work with 2.11, or vice versa.
> In order to avoid build-related problems, it is strongly recommended to 
> append the Scala version a library is compatible with to the artifact id. This 
> ensures the correct version of the library is pulled in, rather than deferring 
> the problem to a future build or runtime error.
> For example, akka exposes the following packages for the same version:
> {code}
> 
>   com.typesafe.akka
>   akka-actor_2.10
>   2.3.14
> 
> {code}
> {code}
> 
>   com.typesafe.akka
>   akka-actor_2.11
>   2.3.14
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3247) Kafka Connector unusable with quickstarts - shading issue

2016-01-20 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-3247:
-

Assignee: Robert Metzger

> Kafka Connector unusable with quickstarts - shading issue
> -
>
> Key: FLINK-3247
> URL: https://issues.apache.org/jira/browse/FLINK-3247
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Robert Metzger
>Priority: Blocker
> Fix For: 1.0.0
>
>
> The Kafka Connector now requires Curator, which is referenced as 
> {{flink-shaded-curator}}. The quickstarts make sure it is not packaged into 
> the jar file via exclusions.
> The Curator classes, however, exist only in relocated form in flink-dist.jar 
> (relocated manually in the {{flink-runtime}} project). The connector thus 
> cannot find the Curator classes and fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3266) LocalFlinkMiniCluster leaks resources when multiple jobs are submitted

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108853#comment-15108853
 ] 

Stephan Ewen commented on FLINK-3266:
-

Can you describe which resources you found leaking?

> LocalFlinkMiniCluster leaks resources when multiple jobs are submitted
> --
>
> Key: FLINK-3266
> URL: https://issues.apache.org/jira/browse/FLINK-3266
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Priority: Minor
>
> After a job submitted to a LocalEnvironment finishes, some threads are not 
> stopped and are stuck waiting forever.
> You can observe this if you enclose the body of the main function of the 
> WordCount example in a loop that executes 100 times and monitor the thread 
> count (with VisualVM, for example).
> (The problem only happens if I use a mini cluster. If I use start-local.sh 
> and submit jobs to it, then there is no leak.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3268) Unstable test JobManagerSubmittedJobGraphsRecoveryITCase

2016-01-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-3268:
--

Assignee: Ufuk Celebi

> Unstable test JobManagerSubmittedJobGraphsRecoveryITCase
> 
>
> Key: FLINK-3268
> URL: https://issues.apache.org/jira/browse/FLINK-3268
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.0.0
>
>
> Logs for the failed test: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103625073/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3161) Externalize cluster start-up and tear-down when available

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108760#comment-15108760
 ] 

ASF GitHub Bot commented on FLINK-3161:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1523#issuecomment-173244033
  
Since this tests that `pdsh` is available, this should be good to merge...


> Externalize cluster start-up and tear-down when available
> -
>
> Key: FLINK-3161
> URL: https://issues.apache.org/jira/browse/FLINK-3161
> Project: Flink
>  Issue Type: Improvement
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> I have been using pdsh, pdcp, and rpdcp to both distribute compiled Flink and 
> to start and stop the TaskManagers. The current shell script initializes 
> TaskManagers one-at-a-time. This is trivial to background but would be 
> unthrottled.
> From pdsh's archived homepage: "uses a sliding window of threads to execute 
> remote commands, conserving socket resources while allowing some connections 
> to timeout if needed".
> What other tools could be supported when available?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3161][dist] Externalize cluster start-u...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1523#issuecomment-173244033
  
Since this tests that `pdsh` is available, this should be good to merge...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3256] Fix colocation group re-instantia...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1526#issuecomment-173240931
  
Thanks for fixing this, looks like a good issue!

One suggestion for a change: rather than collecting the CoLocationGroups in 
an extra set in the ExecutionGraph, can you iterate over the 
ExecutionJobVertices and simply reset them there?

I think every field we can leave off the ExecutionGraph makes it simpler to 
maintain and more future-proof.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3266) LocalFlinkMiniCluster leaks resources when multiple jobs are submitted

2016-01-20 Thread Gabor Gevay (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108861#comment-15108861
 ] 

Gabor Gevay commented on FLINK-3266:


The threads staying alive are the most apparent, but some memory is leaked as 
well (maybe about 200 KB per job).

> LocalFlinkMiniCluster leaks resources when multiple jobs are submitted
> --
>
> Key: FLINK-3266
> URL: https://issues.apache.org/jira/browse/FLINK-3266
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Priority: Minor
>
> After a job submitted to a LocalEnvironment finishes, some threads are not 
> stopped and are stuck waiting forever.
> You can observe this if you enclose the body of the main function of the 
> WordCount example in a loop that executes 100 times and monitor the thread 
> count (with VisualVM, for example).
> (The problem only happens if I use a mini cluster. If I use start-local.sh 
> and submit jobs to it, then there is no leak.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2972) Remove Twitter Chill dependency from flink-java module

2016-01-20 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108965#comment-15108965
 ] 

Fabian Hueske commented on FLINK-2972:
--

Is that an issue? 
The regular Maven artifact that users would include does not depend on Scala, 
right?

> Remove Twitter Chill dependency from flink-java module
> --
>
> Key: FLINK-2972
> URL: https://issues.apache.org/jira/browse/FLINK-2972
> Project: Flink
>  Issue Type: Sub-task
>  Components: Java API
>Affects Versions: 1.0.0
>Reporter: Fabian Hueske
> Fix For: 1.0.0
>
>
> The {{flink-java}} module has a transitive dependency on Scala due to 
> Twitter's Chill library. Chill is used within Flink's KryoSerializer to 
> obtain a Kryo instance that has been initialized for Scala classes. 
> In order to decouple {{flink-java}} from Scala, its dependency on Chill needs 
> to be removed.
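
One hedged sketch of how the decoupling could look: load the Chill instantiator 
reflectively, so that {{flink-java}} compiles without Chill and the Scala-aware 
Kryo setup only kicks in when Chill is actually on the classpath. The class name 
{{com.twitter.chill.ScalaKryoInstantiator}} and its {{newKryo()}} method are 
assumptions about the Chill artifact in use, not the project's confirmed approach:

{code}
import com.esotericsoftware.kryo.Kryo;

public final class KryoInstantiation {

    public static Kryo createKryo() {
        try {
            // touch Chill only via reflection, so there is no compile-time
            // dependency from flink-java on Chill (and hence on Scala)
            Class<?> chill = Class.forName("com.twitter.chill.ScalaKryoInstantiator");
            Object instantiator = chill.getDeclaredConstructor().newInstance();
            return (Kryo) chill.getMethod("newKryo").invoke(instantiator);
        } catch (ReflectiveOperationException e) {
            // Chill (and hence Scala) is not on the classpath: plain Kryo
            return new Kryo();
        }
    }
}
{code}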



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3266) LocalFlinkMiniCluster leaks resources when multiple jobs are submitted

2016-01-20 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-3266:
--

 Summary: LocalFlinkMiniCluster leaks resources when multiple jobs 
are submitted
 Key: FLINK-3266
 URL: https://issues.apache.org/jira/browse/FLINK-3266
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 1.0.0
Reporter: Gabor Gevay
Priority: Minor


After a job submitted to a LocalEnvironment finishes, some threads are not 
stopped and are stuck waiting forever.

You can observe this if you enclose the body of the main function of the 
WordCount example in a loop that executes 100 times and monitor the thread 
count (with VisualVM, for example).

(The problem only happens if I use a mini cluster. If I use start-local.sh and 
submit jobs to it, then there is no leak.)
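
A minimal, self-contained sketch of the reproduction described above (the 
WordCount body is simplified, and {{Thread.activeCount()}} is a rough stand-in 
for watching the thread count in VisualVM):

{code}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class MiniClusterLeakRepro {
    public static void main(String[] args) throws Exception {
        for (int run = 0; run < 100; run++) {
            // in a plain main() this yields a LocalEnvironment, which runs
            // each job on a local mini cluster
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.fromElements("to be or not to be")
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.split(" ")) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .groupBy(0)
                .sum(1)
                .print();
            // if resources leak, this number keeps growing across iterations
            System.out.println("live threads after run " + run + ": " + Thread.activeCount());
        }
    }
}
{code}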



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3267) Disable reference tracking in Kryo fallback serializer

2016-01-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-3267:

Priority: Blocker  (was: Major)

> Disable reference tracking in Kryo fallback serializer
> --
>
> Key: FLINK-3267
> URL: https://issues.apache.org/jira/browse/FLINK-3267
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Kryo runs extra logic to track and resolve repeated references to the same 
> object (similar to Java serialization).
> We should disable reference tracking:
>   - reference tracking is costly
>   - it is virtually always unnecessary for the data types used in Flink
>   - most importantly, it is inconsistent with Flink's own serialization 
> (which does not do reference tracking)
>   - it may cause problems if elements are read in a different order than they 
> are written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3268) Unstable test JobManagerSubmittedJobGraphsRecoveryITCase

2016-01-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3268:
---

 Summary: Unstable test JobManagerSubmittedJobGraphsRecoveryITCase
 Key: FLINK-3268
 URL: https://issues.apache.org/jira/browse/FLINK-3268
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.10.1
Reporter: Stephan Ewen
Priority: Critical
 Fix For: 1.0.0


Logs for the failed test: 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/103625073/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3242] Also Set User-specified StateBack...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1516#issuecomment-173241440
  
Critical fix.

+1 to merge for both 1.0 and 0.10 branches


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3247) Kafka Connector unusable with quickstarts - shading issue

2016-01-20 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108907#comment-15108907
 ] 

Robert Metzger commented on FLINK-3247:
---

I cannot reproduce the issue. 
Also, the {{*}} does not seem to work properly:

{code}
[INFO] --- maven-shade-plugin:2.4.1:shade (default) @ debug-1.0-2.11 ---
[INFO] Including org.apache.flink:flink-scala:jar:1.0-SNAPSHOT in the shaded 
jar.
[INFO] Including org.apache.flink:flink-core:jar:1.0-SNAPSHOT in the shaded jar.
[INFO] Including org.apache.flink:flink-annotations:jar:1.0-SNAPSHOT in the 
shaded jar.
[INFO] Including org.apache.flink:flink-shaded-hadoop2:jar:1.0-SNAPSHOT in the 
shaded jar.
[INFO] Including xmlenc:xmlenc:jar:0.52 in the shaded jar.
[INFO] Including commons-httpclient:commons-httpclient:jar:3.1 in the shaded 
jar.
[INFO] Excluding commons-codec:commons-codec:jar:1.4 from the shaded jar.
[INFO] Excluding commons-io:commons-io:jar:2.4 from the shaded jar.
[INFO] Including commons-net:commons-net:jar:3.1 in the shaded jar.
[INFO] Including javax.servlet:servlet-api:jar:2.5 in the shaded jar.
[INFO] Including com.sun.jersey:jersey-core:jar:1.9 in the shaded jar.
[INFO] Including commons-el:commons-el:jar:1.0 in the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded jar.
[INFO] Including net.java.dev.jets3t:jets3t:jar:0.9.0 in the shaded jar.
[INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.2.6 from the shaded 
jar.
[INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.2.5 from the shaded 
jar.
[INFO] Including com.jamesmurty.utils:java-xmlbuilder:jar:0.4 in the shaded jar.
[INFO] Excluding commons-lang:commons-lang:jar:2.6 from the shaded jar.
[INFO] Including commons-configuration:commons-configuration:jar:1.6 in the 
shaded jar.
[INFO] Including commons-digester:commons-digester:jar:1.8 in the shaded jar.
[INFO] Including commons-beanutils:commons-beanutils:jar:1.7.0 in the shaded 
jar.
[INFO] Including commons-beanutils:commons-beanutils-core:jar:1.8.0 in the 
shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-core-asl:jar:1.8.8 from the 
shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8 from the 
shaded jar.
[INFO] Excluding com.thoughtworks.paranamer:paranamer:jar:2.3 from the shaded 
jar.
[INFO] Excluding org.xerial.snappy:snappy-java:jar:1.0.5 from the shaded jar.
[INFO] Including com.jcraft:jsch:jar:0.1.42 in the shaded jar.
[INFO] Excluding org.apache.commons:commons-compress:jar:1.4.1 from the shaded 
jar.
[INFO] Excluding org.tukaani:xz:jar:1.0 from the shaded jar.
[INFO] Including commons-daemon:commons-daemon:jar:1.0.13 in the shaded jar.
[INFO] Including javax.xml.bind:jaxb-api:jar:2.2.2 in the shaded jar.
[INFO] Including javax.xml.stream:stax-api:jar:1.0-2 in the shaded jar.
[INFO] Including javax.activation:activation:jar:1.1 in the shaded jar.
[INFO] Including com.google.inject:guice:jar:3.0 in the shaded jar.
[INFO] Including javax.inject:javax.inject:jar:1 in the shaded jar.
[INFO] Including aopalliance:aopalliance:jar:1.0 in the shaded jar.
[INFO] Excluding commons-collections:commons-collections:jar:3.2.2 from the 
shaded jar.
[INFO] Excluding com.esotericsoftware.kryo:kryo:jar:2.24.0 from the shaded jar.
[INFO] Excluding com.esotericsoftware.minlog:minlog:jar:1.2 from the shaded jar.
[INFO] Excluding org.objenesis:objenesis:jar:2.1 from the shaded jar.
[INFO] Including org.apache.flink:flink-java:jar:1.0-SNAPSHOT in the shaded jar.
[INFO] Excluding org.apache.avro:avro:jar:1.7.6 from the shaded jar.
[INFO] Excluding com.twitter:chill-java:jar:0.5.2 from the shaded jar.
[INFO] Excluding de.javakaffee:kryo-serializers:jar:0.27 from the shaded jar.
[INFO] Excluding joda-time:joda-time:jar:2.5 from the shaded jar.
[INFO] Including org.joda:joda-convert:jar:1.7 in the shaded jar.
[INFO] Including org.apache.commons:commons-math3:jar:3.5 in the shaded jar.
[INFO] Including org.apache.flink:flink-optimizer:jar:1.0-SNAPSHOT in the 
shaded jar.
[INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded jar.
[INFO] Excluding org.scala-lang:scala-library:jar:2.10.4 from the shaded jar.
[INFO] Excluding org.scala-lang:scala-compiler:jar:2.10.4 from the shaded jar.
[INFO] Including org.scalamacros:quasiquotes_2.10:jar:2.0.1 in the shaded jar.
[INFO] Excluding org.apache.commons:commons-lang3:jar:3.3.2 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-api:jar:1.7.7 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.7 from the shaded jar.
[INFO] Excluding log4j:log4j:jar:1.2.17 from the shaded jar.
[INFO] Including org.apache.flink:flink-streaming-scala:jar:1.0-SNAPSHOT in the 
shaded jar.
[INFO] Including org.apache.flink:flink-streaming-java:jar:1.0-SNAPSHOT in the 
shaded jar.
[INFO] Excluding org.apache.commons:commons-math:jar:2.2 from the shaded jar.
[INFO] Excluding 

[GitHub] flink pull request: [FLINK-3267] Disable reference tracking in Kry...

2016-01-20 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1528#issuecomment-173276572
  
Just curious, was there an issue with this? Or how did you stumble upon the 
reference tracking?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3267) Disable reference tracking in Kryo fallback serializer

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108944#comment-15108944
 ] 

ASF GitHub Bot commented on FLINK-3267:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1528#issuecomment-173276572
  
Just curious, was there an issue with this? Or how did you stumble upon the 
reference tracking?


> Disable reference tracking in Kryo fallback serializer
> --
>
> Key: FLINK-3267
> URL: https://issues.apache.org/jira/browse/FLINK-3267
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.0.0
>
>
> Kryo runs extra logic to track and resolve repeated references to the same 
> object (similar to Java serialization).
> We should disable reference tracking:
>   - reference tracking is costly
>   - it is virtually always unnecessary for the data types used in Flink
>   - most importantly, it is inconsistent with Flink's own serialization 
> (which does not do reference tracking)
>   - it may cause problems if elements are read in a different order than they 
> are written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3242) User-specified StateBackend is not Respected if Checkpointing is Disabled

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108746#comment-15108746
 ] 

ASF GitHub Bot commented on FLINK-3242:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1516#issuecomment-173241440
  
Critical fix.

+1 to merge for both 1.0 and 0.10 branches


> User-specified StateBackend is not Respected if Checkpointing is Disabled
> -
>
> Key: FLINK-3242
> URL: https://issues.apache.org/jira/browse/FLINK-3242
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
>
> In {{StreamingJobGraphGenerator.java}} (line 281) the StateBackend is only 
> set on the {{StreamConfig}} if checkpointing is enabled.
> This is easy to fix, but we also need to add a test to prevent this from 
> happening in the future.
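
A sketch of the shape of the fix, with the backend forwarded unconditionally and 
only the checkpoint-specific settings left behind the flag; the method names on 
{{StreamConfig}} and {{StreamGraph}} are assumptions based on the description 
above, not the actual generator code:

{code}
// sketch only -- method names are assumptions
private void configureStateBackend(StreamConfig config, StreamGraph streamGraph) {
    // forward the user-specified backend whether or not checkpointing is on
    config.setStateBackend(streamGraph.getStateBackend());

    // keep only the checkpoint-specific settings behind the flag
    if (streamGraph.isCheckpointingEnabled()) {
        config.setCheckpointingEnabled(true);
        config.setCheckpointMode(streamGraph.getCheckpointingMode());
    }
}
{code}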



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2439) [py] Expand DataSet feature coverage

2016-01-20 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-2439.
---
Resolution: Fixed

Implemented in ac42d150441f968dfb2eaa578928f0e0d987e828

> [py] Expand DataSet feature coverage
> 
>
> Key: FLINK-2439
> URL: https://issues.apache.org/jira/browse/FLINK-2439
> Project: Flink
>  Issue Type: Sub-task
>  Components: Python API
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> An upcoming commit of mine will add the following methods to the Python API's 
> DataSet class:
> first
> distinct
> partitionByHash
> rebalance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3267) Disable reference tracking in Kryo fallback serializer

2016-01-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3267:
---

 Summary: Disable reference tracking in Kryo fallback serializer
 Key: FLINK-3267
 URL: https://issues.apache.org/jira/browse/FLINK-3267
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.10.1
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.0.0


Kryo runs extra logic to track and resolve repeated references to the same 
object (similar to Java serialization).

We should disable reference tracking:
  - reference tracking is costly
  - it is virtually always unnecessary for the data types used in Flink
  - most importantly, it is inconsistent with Flink's own serialization (which 
does not do reference tracking)
  - it may cause problems if elements are read in a different order than they 
are written.
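
For reference, turning reference tracking off is a one-line call on the Kryo 
instance; a minimal sketch (the factory wrapper is illustrative, not Flink's 
actual KryoSerializer code):

{code}
import com.esotericsoftware.kryo.Kryo;

public final class KryoFactory {

    public static Kryo createKryoForFlink() {
        Kryo kryo = new Kryo();
        // disable tracking of repeated object references: it is costly, and
        // Flink's own serializers never resolve shared references either
        kryo.setReferences(false);
        return kryo;
    }
}
{code}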



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3265) RabbitMQ not threadsafe: ConcurrentModificationException

2016-01-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-3265:

Priority: Critical  (was: Major)

> RabbitMQ not threadsafe: ConcurrentModificationException
> 
>
> Key: FLINK-3265
> URL: https://issues.apache.org/jira/browse/FLINK-3265
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>Priority: Critical
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.454 sec <<< 
> FAILURE! - in org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> testCheckpointing(org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest)
>   Time elapsed: 0.902 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.remove(HashMap.java:1443)
>   at java.util.AbstractSet.removeAll(AbstractSet.java:178)
>   at 
> org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase.notifyCheckpointComplete(MessageAcknowledgingSourceBase.java:198)
>   at 
> org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest.testCheckpointing(RMQSourceTest.java:144)
> Results :
> Tests in error: 
>   RMQSourceTest.testCheckpointing:144 » ConcurrentModification
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103452897/log.txt
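
The trace shows {{removeAll()}} on a {{HashSet}} racing with another thread that 
mutates the same set. A minimal sketch of one way to avoid this, assuming the 
pending-ID set is shared between the source thread and the checkpoint callback 
(class and field names here are placeholders, not the actual connector code):

{code}
import java.util.HashSet;
import java.util.Set;

// placeholder class, not the actual connector code
public class AckTracker<UId> {

    // IDs awaiting acknowledgement, touched by the source thread
    // and by the checkpoint notification thread
    private final Set<UId> pendingIds = new HashSet<>();
    private final Object lock = new Object();

    public void addId(UId id) {
        synchronized (lock) {
            pendingIds.add(id);
        }
    }

    public void notifyCheckpointComplete(Set<UId> acknowledgedIds) {
        // an unguarded removeAll() while the other thread mutates the set
        // produces exactly the ConcurrentModificationException in the trace
        synchronized (lock) {
            pendingIds.removeAll(acknowledgedIds);
        }
    }
}
{code}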



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3265) RabbitMQ not threadsafe: ConcurrentModificationException

2016-01-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-3265:

Priority: Blocker  (was: Critical)

> RabbitMQ not threadsafe: ConcurrentModificationException
> 
>
> Key: FLINK-3265
> URL: https://issues.apache.org/jira/browse/FLINK-3265
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
>Priority: Blocker
>  Labels: test-stability
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.454 sec <<< 
> FAILURE! - in org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest
> testCheckpointing(org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest)
>   Time elapsed: 0.902 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.remove(HashMap.java:1443)
>   at java.util.AbstractSet.removeAll(AbstractSet.java:178)
>   at 
> org.apache.flink.streaming.api.functions.source.MessageAcknowledgingSourceBase.notifyCheckpointComplete(MessageAcknowledgingSourceBase.java:198)
>   at 
> org.apache.flink.streaming.connectors.rabbitmq.RMQSourceTest.testCheckpointing(RMQSourceTest.java:144)
> Results :
> Tests in error: 
>   RMQSourceTest.testCheckpointing:144 » ConcurrentModification
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/103452897/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3269) Return value from DataInputStream#read() should be checked in PythonPlanReceiver

2016-01-20 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3269:
-

 Summary: Return value from DataInputStream#read() should be 
checked in PythonPlanReceiver
 Key: FLINK-3269
 URL: https://issues.apache.org/jira/browse/FLINK-3269
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


There are several places where the return value from DataInputStream#read() 
isn't checked.
Here is an example from BytesDeserializer:
{code}
  byte[] buffer = new byte[size];
  input.read(buffer);
  return buffer;
{code}
A buffer of {{size}} bytes is allocated; however, the read operation may not 
fill all {{size}} bytes.

The return value should be checked.
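
A minimal sketch of the fix: either loop on {{read()}} until the buffer is full, 
or use {{readFully()}}, which does exactly that:

{code}
import java.io.DataInputStream;
import java.io.IOException;

public final class SafeRead {

    static byte[] readExactly(DataInputStream input, int size) throws IOException {
        byte[] buffer = new byte[size];
        // unlike read(), readFully() loops until 'size' bytes have been read
        // and throws EOFException if the stream ends early
        input.readFully(buffer);
        return buffer;
    }
}
{code}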



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3269) Return value from DataInputStream#read() should be checked in PythonPlanReceiver

2016-01-20 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109300#comment-15109300
 ] 

Chesnay Schepler commented on FLINK-3269:
-

Looks like I messed up while rebasing; will fix it tomorrow.

> Return value from DataInputStream#read() should be checked in 
> PythonPlanReceiver
> 
>
> Key: FLINK-3269
> URL: https://issues.apache.org/jira/browse/FLINK-3269
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Chesnay Schepler
>Priority: Minor
>
> There are several places where the return value from DataInputStream#read() 
> isn't checked.
> Here is an example from BytesDeserializer:
> {code}
>   byte[] buffer = new byte[size];
>   input.read(buffer);
>   return buffer;
> {code}
> A buffer of {{size}} bytes is allocated; however, the read operation may not 
> fill all {{size}} bytes.
> The return value should be checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API

2016-01-20 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3058.
---
   Resolution: Fixed
Fix Version/s: 1.0.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/81320c1c

> Add Kafka consumer for new 0.9.0.0 Kafka API
> 
>
> Key: FLINK-3058
> URL: https://issues.apache.org/jira/browse/FLINK-3058
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.0.0
>
>
> The Apache Kafka project is about to release a new consumer API. They also 
> changed their internal protocol, so Kafka 0.9.0.0 users will need an updated 
> consumer from Flink.
> Also, I would like Flink to be among the first stream processors 
> supporting Kafka 0.9.0.0.
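
A minimal usage sketch of the new consumer, under the assumption that it follows 
the naming scheme of the existing connectors (the class name 
{{FlinkKafkaConsumer09}} and the property keys of the new Kafka consumer API are 
assumptions here; topic and addresses are placeholders):

{code}
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class Kafka09ReadExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        // the 0.9 consumer talks to the brokers directly; no ZooKeeper address needed
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-demo");

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka 0.9 read example");
    }
}
{code}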



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3269) Return value from DataInputStream#read() should be checked in PythonPlanReceiver

2016-01-20 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-3269:
---

Assignee: Chesnay Schepler

> Return value from DataInputStream#read() should be checked in 
> PythonPlanReceiver
> 
>
> Key: FLINK-3269
> URL: https://issues.apache.org/jira/browse/FLINK-3269
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Chesnay Schepler
>Priority: Minor
>
> There are several places where the return value from DataInputStream#read() 
> isn't checked.
> Here is an example from BytesDeserializer:
> {code}
>   byte[] buffer = new byte[size];
>   input.read(buffer);
>   return buffer;
> {code}
> A buffer of {{size}} bytes is allocated; however, the read operation may not 
> fill all {{size}} bytes.
> The return value should be checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109407#comment-15109407
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318604
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

@tillrohrmann Do you know why the checks have failed? There are 5 build 
jobs, and only 3 of them passed. This is my first time contributing to an open 
source project, and I have no idea how my code affects the failed tests.


> Join two streams with two different buffer time
> ---
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.1
>Reporter: Wang Yangjun
>  Labels: easyfix, patch
> Fix For: 0.10.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Current Flink streaming only supports joining two streams on the same window. 
> How can this problem be solved?
> For example, there are two streams. One is advertisements shown to users, 
> where each tuple could be described as (id, shown timestamp). The other 
> one is the click stream -- (id, clicked timestamp). We want to get a joined 
> stream which includes every advertisement that is clicked by a user within 
> 20 minutes of being shown.
> It is possible that after an advertisement is shown, some user clicks it 
> immediately. It is also possible that the "click" message arrives at the 
> server earlier than the "show" message because of network delay. We assume 
> that the maximum delay is one minute.
> The need, then, is to always keep a buffer (20 min) of the "show" stream 
> and another buffer (1 min) of the "click" stream.
> It would be great if there were an API like:
> showStream.join(clickStream)
> .where(keySelector)
> .buffer(Time.of(20, TimeUnit.MINUTES))
> .equalTo(keySelector)
> .buffer(Time.of(1, TimeUnit.MINUTES))
> .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread wangyangjun
Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318008
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

As you can see in the code, the test case is implemented. I forgot to 
remove this comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109423#comment-15109423
 ] 

ASF GitHub Bot commented on FLINK-3122:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1521#issuecomment-173360370
  
Merging this.


> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as the label. This can 
> be generalized to anything that implements {{Comparable}}. This issue depends on 
> FLINK-3118 being fixed first.
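
A sketch of what the generalized signature could look like, assuming the 
algorithm keeps its current shape and only the vertex value (label) type is 
widened; the type parameter names are illustrative:

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.GraphAlgorithm;
import org.apache.flink.graph.Vertex;

// label type widened from Long to any Comparable value type VV
public class LabelPropagation<K, VV extends Comparable<VV>, EV>
        implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, VV>>> {

    private final int maxIterations;

    public LabelPropagation(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    @Override
    public DataSet<Vertex<K, VV>> run(Graph<K, VV, EV> input) {
        // each vertex repeatedly adopts the most frequent label among its
        // neighbors until labels stabilize; body elided in this sketch
        throw new UnsupportedOperationException("sketch only");
    }
}
{code}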



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109390#comment-15109390
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318034
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamJoinOperator.java
 ---
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators;
+
+import org.apache.flink.api.common.functions.JoinFunction;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.runtime.state.StateBackend;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import 
org.apache.flink.streaming.runtime.operators.windowing.buffers.HeapWindowBuffer;
+import 
org.apache.flink.streaming.runtime.streamrecord.MultiplexingStreamRecordSerializer;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Objects.requireNonNull;
+
+
+public class StreamJoinOperator<K, IN1, IN2, OUT>
+   extends AbstractUdfStreamOperator<OUT, JoinFunction<IN1, IN2, OUT>>
+   implements TwoInputStreamOperator<IN1, IN2, OUT> {
+
+   private static final long serialVersionUID = 8650694601687319011L;
+   private static final Logger LOG = LoggerFactory.getLogger(StreamJoinOperator.class);
+
+   private HeapWindowBuffer<IN1> stream1Buffer;
+   private HeapWindowBuffer<IN2> stream2Buffer;
+   private final KeySelector<IN1, K> keySelector1;
+   private final KeySelector<IN2, K> keySelector2;
+   private long stream1WindowLength;
+   private long stream2WindowLength;
+
+   protected transient long currentWatermark1 = -1L;
+   protected transient long currentWatermark2 = -1L;
+   protected transient long currentWatermark = -1L;
+
+   private TypeSerializer<IN1> inputSerializer1;
+   private TypeSerializer<IN2> inputSerializer2;
+   /**
+    * If this is true, the current processing time is set as the timestamp 
+    * of incoming elements. This is for use with a {@link 
+    * org.apache.flink.streaming.api.windowing.evictors.TimeEvictor}
+    * if eviction should happen based on processing time.
+    */
+   private boolean setProcessingTime = false;
+
+   public StreamJoinOperator(JoinFunction<IN1, IN2, OUT> userFunction,
+           KeySelector<IN1, K> keySelector1,
+           KeySelector<IN2, K> keySelector2,
+           long stream1WindowLength,
+           long stream2WindowLength,
+           TypeSerializer<IN1> inputSerializer1,
+           TypeSerializer<IN2> inputSerializer2) {
+       super(userFunction);
+       this.keySelector1 = requireNonNull(keySelector1);
+       this.keySelector2 = requireNonNull(keySelector2);
+
+       this.stream1WindowLength = stream1WindowLength;
+       this.stream2WindowLength = stream2WindowLength;
+
+       this.inputSerializer1 = requireNonNull(inputSerializer1);
+       this.inputSerializer2 = requireNonNull(inputSerializer2);
+   }
+
+   @Override
+   public void open() throws Exception {
+       super.open();
+       if (null == inputSerializer1 || null == inputSerializer2) {
+           throw new IllegalStateException("Input serializer was not set.");
+       }
+
+   

[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109388#comment-15109388
 ] 

ASF GitHub Bot commented on FLINK-3109:
---

Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318008
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

As you can see in the code, the test case is implemented. I forgot to 
remove this comment.


> Join two streams with two different buffer time
> ---
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.1
>Reporter: Wang Yangjun
>  Labels: easyfix, patch
> Fix For: 0.10.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Current Flink streaming only supports joining two streams on the same window. 
> How can this problem be solved?
> For example, there are two streams. One is advertisements shown to users, 
> where each tuple could be described as (id, shown timestamp). The other 
> one is the click stream -- (id, clicked timestamp). We want to get a joined 
> stream which includes every advertisement that is clicked by a user within 
> 20 minutes of being shown.
> It is possible that after an advertisement is shown, some user clicks it 
> immediately. It is also possible that the "click" message arrives at the 
> server earlier than the "show" message because of network delay. We assume 
> that the maximum delay is one minute.
> The need, then, is to always keep a buffer (20 min) of the "show" stream 
> and another buffer (1 min) of the "click" stream.
> It would be great if there were an API like:
> showStream.join(clickStream)
> .where(keySelector)
> .buffer(Time.of(20, TimeUnit.MINUTES))
> .equalTo(keySelector)
> .buffer(Time.of(1, TimeUnit.MINUTES))
> .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread wangyangjun
Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318034
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamJoinOperator.java
 ---
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.operators;
+
+import org.apache.flink.api.common.functions.JoinFunction;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.core.memory.DataInputView;
+import org.apache.flink.runtime.state.StateBackend;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import 
org.apache.flink.streaming.runtime.operators.windowing.buffers.HeapWindowBuffer;
+import 
org.apache.flink.streaming.runtime.streamrecord.MultiplexingStreamRecordSerializer;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.streaming.runtime.tasks.StreamTaskState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Objects.requireNonNull;
+
+
+public class StreamJoinOperator<K, IN1, IN2, OUT>
+   extends AbstractUdfStreamOperator<OUT, JoinFunction<IN1, IN2, OUT>>
+   implements TwoInputStreamOperator<IN1, IN2, OUT> {
+
+   private static final long serialVersionUID = 8650694601687319011L;
+   private static final Logger LOG = LoggerFactory.getLogger(StreamJoinOperator.class);
+
+   private HeapWindowBuffer<IN1> stream1Buffer;
+   private HeapWindowBuffer<IN2> stream2Buffer;
+   private final KeySelector<IN1, K> keySelector1;
+   private final KeySelector<IN2, K> keySelector2;
+   private long stream1WindowLength;
+   private long stream2WindowLength;
+
+   protected transient long currentWatermark1 = -1L;
+   protected transient long currentWatermark2 = -1L;
+   protected transient long currentWatermark = -1L;
+
+   private TypeSerializer<IN1> inputSerializer1;
+   private TypeSerializer<IN2> inputSerializer2;
+   /**
+    * If this is true, the current processing time is set as the timestamp 
+    * of incoming elements. This is for use with a {@link 
+    * org.apache.flink.streaming.api.windowing.evictors.TimeEvictor}
+    * if eviction should happen based on processing time.
+    */
+   private boolean setProcessingTime = false;
+
+   public StreamJoinOperator(JoinFunction<IN1, IN2, OUT> userFunction,
+           KeySelector<IN1, K> keySelector1,
+           KeySelector<IN2, K> keySelector2,
+           long stream1WindowLength,
+           long stream2WindowLength,
+           TypeSerializer<IN1> inputSerializer1,
+           TypeSerializer<IN2> inputSerializer2) {
+       super(userFunction);
+       this.keySelector1 = requireNonNull(keySelector1);
+       this.keySelector2 = requireNonNull(keySelector2);
+
+       this.stream1WindowLength = stream1WindowLength;
+       this.stream2WindowLength = stream2WindowLength;
+
+       this.inputSerializer1 = requireNonNull(inputSerializer1);
+       this.inputSerializer2 = requireNonNull(inputSerializer2);
+   }
+
+   @Override
+   public void open() throws Exception {
+       super.open();
+       if (null == inputSerializer1 || null == inputSerializer2) {
+           throw new IllegalStateException("Input serializer was not set.");
+       }
+
+       this.stream1Buffer = new HeapWindowBuffer.Factory<IN1>().create();
+       this.stream2Buffer = new HeapWindowBuffer.Factory<IN2>().create();
+   }
+
+   /**
+    * @param element record of stream1
+    * @throws Exception
+

[GitHub] flink pull request: [FLINK-3109]Join two streams with two differen...

2016-01-20 Thread wangyangjun
Github user wangyangjun commented on a diff in the pull request:

https://github.com/apache/flink/pull/1527#discussion_r50318604
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
 ---
@@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
Assert.assertEquals(expectedResult, testResults);
}
 
+
+   // TODO: design buffer join test
--- End diff --

@tillrohrmann Do you know why the checks have failed? There are 5 build 
jobs, and only 3 of them passed. This is my first time contributing to an open 
source project, and I have no idea how my code affects the failed tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3122] [Gelly] Generic vertex value in l...

2016-01-20 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1521#issuecomment-173360370
  
Merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3222) Incorrect shift amount in OperatorCheckpointStats#hashCode()

2016-01-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3222:
--
Description: 
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}

subTaskStats.length is an int.
The shift amount is greater than 31 bits.

  was:
Here is related code:
{code}
result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
>>> 32));
{code}
subTaskStats.length is an int.
The shift amount is greater than 31 bits.


> Incorrect shift amount in OperatorCheckpointStats#hashCode()
> 
>
> Key: FLINK-3222
> URL: https://issues.apache.org/jira/browse/FLINK-3222
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length 
> >>> 32));
> {code}
> subTaskStats.length is an int.
> The shift amount is greater than 31 bits.
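
Worth spelling out why this matters: Java masks the shift distance of an 
{{int}} to five bits, so {{>>> 32}} here is a shift by 0, and 
{{length ^ length}} is always 0. The array length therefore never contributes 
to the hash at all. Since {{length}} is an {{int}}, the xor-fold (which is meant 
for {{long}} fields) can simply be dropped; a sketch of a corrected line:

{code}
// before: 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length >>> 32))
// an int shifted by 32 is shifted by 32 & 31 == 0, so the xor term is always 0
result = 31 * result + subTaskStats.length;
{code}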



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3196) InputStream should be closed in EnvironmentInformation#getRevisionInformation()

2016-01-20 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3196:
--
Description: 
Here is related code:
{code}
  InputStream propFile = 
EnvironmentInformation.class.getClassLoader().getResourceAsStream(".version.properties");
  if (propFile != null) {
Properties properties = new Properties();
properties.load(propFile);
{code}

propFile should be closed upon leaving the method.

  was:
Here is related code:
{code}
  InputStream propFile = 
EnvironmentInformation.class.getClassLoader().getResourceAsStream(".version.properties");
  if (propFile != null) {
Properties properties = new Properties();
properties.load(propFile);
{code}
propFile should be closed upon leaving the method.


> InputStream should be closed in 
> EnvironmentInformation#getRevisionInformation()
> ---
>
> Key: FLINK-3196
> URL: https://issues.apache.org/jira/browse/FLINK-3196
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
>   InputStream propFile = 
> EnvironmentInformation.class.getClassLoader().getResourceAsStream(".version.properties");
>   if (propFile != null) {
> Properties properties = new Properties();
> properties.load(propFile);
> {code}
> propFile should be closed upon leaving the method.
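
A minimal sketch of the fix using try-with-resources, which closes the stream on 
every exit path (the surrounding class is a placeholder for 
EnvironmentInformation):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class VersionInfo {

    static Properties loadVersionProperties() throws IOException {
        // the stream is closed automatically, even if load() throws;
        // a null resource is simply skipped on close
        try (InputStream propFile = VersionInfo.class
                .getClassLoader().getResourceAsStream(".version.properties")) {
            Properties properties = new Properties();
            if (propFile != null) {
                properties.load(propFile);
            }
            return properties;
        }
    }
}
{code}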



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3256) Invalid execution graph cleanup for jobs with colocation groups

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108743#comment-15108743
 ] 

ASF GitHub Bot commented on FLINK-3256:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1526#issuecomment-173240931
  
Thanks for fixing this, looks like a good issue!

One suggestion for a change: rather than collecting the CoLocationGroups in 
an extra set in the ExecutionGraph, can you iterate over the 
ExecutionJobVertices and simply reset them there?

I think every field we can leave off the ExecutionGraph makes it simpler to 
maintain and more future-proof.


> Invalid execution graph cleanup for jobs with colocation groups
> ---
>
> Key: FLINK-3256
> URL: https://issues.apache.org/jira/browse/FLINK-3256
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>Priority: Blocker
>
> Currently, upon restarting an execution graph, we clean up the colocation 
> constraints for each group present in an ExecutionJobVertex.
> This can lead to invalid reconfiguration upon a restart or any other activity 
> that relies on state cleanup of the execution graph. For example, upon 
> restarting a DataStream job with iterations the following steps are executed:
> 1) IterationSource colgroup constraints are reset
> 2) IterationSource execution vertices reset and create new colocation 
> constraints
> 3) IterationSink colgroup constraints are reset
> 4) IterationSink execution vertices reset and create different colocation 
> constraints.
> This can be trivially fixed by resetting colocation groups independently of 
> ExecutionJobVertices, thus updating them once per reconfiguration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108757#comment-15108757
 ] 

ASF GitHub Bot commented on FLINK-3058:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-173243638
  
I addressed all concerns and rebased to master.

Once the tests have passed, I'll merge the change.


> Add Kafka consumer for new 0.9.0.0 Kafka API
> 
>
> Key: FLINK-3058
> URL: https://issues.apache.org/jira/browse/FLINK-3058
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> The Apache Kafka project is about to release a new consumer API. They also 
> changed their internal protocol, so Kafka 0.9.0.0 users will need an updated 
> consumer from Flink.
> Also, I would like Flink to be among the first stream processors 
> supporting Kafka 0.9.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0

2016-01-20 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1489#issuecomment-173243638
  
I addressed all concerns and rebased to master.

Once the tests have passed, I'll merge the change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-2933) Flink scala libraries exposed with maven should carry scala version

2016-01-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-2933:
-

Assignee: Maximilian Michels

> Flink scala libraries exposed with maven should carry scala version
> ---
>
> Key: FLINK-2933
> URL: https://issues.apache.org/jira/browse/FLINK-2933
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Reporter: Frederick F. Kautz IV
>Assignee: Maximilian Michels
>Priority: Minor
>
> [If I put this on the wrong component, can someone please update?]
> Major versions of Scala are neither forward nor backward compatible. Libraries 
> built for 2.10 will not work with 2.11, or vice versa.
> In order to avoid build-related problems, it is strongly recommended to 
> append the Scala version a library is compatible with to its artifact id. This 
> ensures that the correct version of the library is pulled in, rather than 
> deferring the problem to a future build or runtime error.
> For example, akka exposes the following packages for the same version:
> {code}
> <dependency>
>   <groupId>com.typesafe.akka</groupId>
>   <artifactId>akka-actor_2.10</artifactId>
>   <version>2.3.14</version>
> </dependency>
> {code}
> {code}
> <dependency>
>   <groupId>com.typesafe.akka</groupId>
>   <artifactId>akka-actor_2.11</artifactId>
>   <version>2.3.14</version>
> </dependency>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2940) Deploy multiple Scala versions for Maven artifacts

2016-01-20 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned FLINK-2940:
-

Assignee: Maximilian Michels

> Deploy multiple Scala versions for Maven artifacts
> --
>
> Key: FLINK-2940
> URL: https://issues.apache.org/jira/browse/FLINK-2940
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9, 0.10.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.0.0
>
>
> Flink implicitly defaults to Scala 2.10 at the moment. For 0.10 we already 
> built Scala 2.11 artifacts but did not deploy them in the Maven repository. 
> For the 1.0 release, multiple Scala versions should be offered via Maven.
> The artifacts which depend on a version of Scala should contain the Scala 
> version in the artifact name. This is a common convention for Scala 
> artifacts. For example, {{flink-scala}} should be renamed to 
> {{flink-scala_2.10}} and there should be a {{flink-scala_2.11}} artifact as 
> well.
> Scala dependencies should be avoided where possible. For instance, 
> {{flink-java}} depends on Chill, which depends on a specific version of Scala, 
> so we would have {{flink-java_2.10/11}}. We can either try to get rid of the 
> Scala dependency for these modules or shade our version of Scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3262][web-dashboard] Remove fuzzy versi...

2016-01-20 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1525#issuecomment-173371188
  
Ah, okay, so I ran the UI but neglected to note that the "fonts" were 
missing. It turns out the spec for 
[bower.json](https://github.com/bower/spec/blob/master/json.md#main) now prefers 
pre-processed files. Font Awesome was 
[updated](https://github.com/FortAwesome/Font-Awesome/commit/7cde41ea934f43e66dc123ea6411d8535b50a4f7) 
for the new spec, and our `gulpfile.js` needed to be modified to include 
`font-awesome.css` in the aggregated `vendor.css`.

Rather than downgrading Font Awesome back to 4.3.0 or including an override 
in `bower.json`, I updated `gulpfile.js` to pre-process and include any `LESS` 
files in bower dependencies (as was already being done for Flink's 
`bootstrap_custom.less`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3262) Remove fuzzy versioning from Bower dependencies

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109525#comment-15109525
 ] 

ASF GitHub Bot commented on FLINK-3262:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1525#issuecomment-173371188
  
Ah, okay, so I ran the UI but neglected to note that the "fonts" were 
missing. Turns out the spec for [bower.json](
https://github.com/bower/spec/blob/master/json.md#main) now prefers 
pre-processed files. Font Awesome was 
[updated](https://github.com/FortAwesome/Font-Awesome/commit/7cde41ea934f43e66dc123ea6411d8535b50a4f7)
 for the new spec and our `gulpfile.js` needed to be modified to include 
`font-awesome.css` in the aggregated `vendor.css`.

Rather than downgrading Font Awesome back to 4.3.0 or including an override 
in `bower.json`, I updated `gulpfile.js` to pre-process and include any `LESS` 
files in bower dependencies (as was already being done for Flink's 
`bootstrap_custom.less`).


> Remove fuzzy versioning from Bower dependencies
> ---
>
> Key: FLINK-3262
> URL: https://issues.apache.org/jira/browse/FLINK-3262
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Trivial
>
> {{bower.json}} is currently defined with fuzzy versions, i.e. {{"bootstrap": 
> "~3.3.5"}}, which silently pulls in patch updates. When a user compiles the 
> web frontend, the new versions create changes in the compiled JavaScript 
> and CSS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3058] Add support for Kafka 0.9.0.0

2016-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1489


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3058) Add Kafka consumer for new 0.9.0.0 Kafka API

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109542#comment-15109542
 ] 

ASF GitHub Bot commented on FLINK-3058:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1489


> Add Kafka consumer for new 0.9.0.0 Kafka API
> 
>
> Key: FLINK-3058
> URL: https://issues.apache.org/jira/browse/FLINK-3058
> Project: Flink
>  Issue Type: New Feature
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.0.0
>
>
> The Apache Kafka project is about to release a new consumer API. They also 
> changed their internal protocol, so Kafka 0.9.0.0 users will need an updated 
> consumer from Flink.
> Also, I would like Flink to be among the first stream processors 
> supporting Kafka 0.9.0.0.
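
For context, a minimal usage sketch of the new connector (class and schema names 
as added by the linked PR; exact package paths are assumed and may differ):

{code}
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class Kafka09ConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-demo");

        // The 0.9 consumer speaks the new Kafka protocol;
        // older brokers still need the 0.8 connector.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("Kafka 0.9 consumer sketch");
    }
}
{code}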



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109541#comment-15109541
 ] 

ASF GitHub Bot commented on FLINK-3122:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1521


> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as the label. This can 
> be generalized to anything which extends {{Comparable}}. The issue depends on 
> FLINK-3118 being fixed first.
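
To illustrate the generalization, a hypothetical usage sketch (the exact generic 
signature is assumed here, not quoted from the merged change):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;
import org.apache.flink.graph.library.LabelPropagation;
import org.apache.flink.types.NullValue;

public class StringLabelExample {

    // After the generalization, the vertex value (the label) can be any
    // Comparable type, e.g. String, instead of being fixed to Long.
    public static DataSet<Vertex<Long, String>> propagate(
            Graph<Long, String, NullValue> graph) throws Exception {
        return graph.run(new LabelPropagation<Long, String, NullValue>(30));
    }
}
{code}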



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3122] [Gelly] Generic vertex value in l...

2016-01-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1521


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2933] Flink scala libraries exposed wit...

2016-01-20 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1529#issuecomment-173300875
  
Thanks for the PR! This fixes an issue that has been reported several times 
by users.
IIRC, the outcome of the discussion on the dev ML was to remove the Scala 
dependency from `flink-java` before adding the suffix. 

So we have to resolve FLINK-2972 first (one way or the other). Otherwise, 
we would update the `flink-java` SNAPSHOT artifacts twice and cause quite a bit 
of trouble for our bleeding-edge users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3122:
-
Fix Version/s: 1.0.0

> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10.0, 0.10.1
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
> Fix For: 1.0.0
>
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as label. This can 
> be generalized to anything which extends {{Comparable}}. The issue depends on 
> FLINK-3118 to be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri updated FLINK-3122:
-
Affects Version/s: (was: 1.0.0)
   0.10.0
   0.10.1

> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10.0, 0.10.1
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
> Fix For: 1.0.0
>
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as label. This can 
> be generalized to anything which extends {{Comparable}}. The issue depends on 
> FLINK-3118 to be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3122) Generalize value type in LabelPropagation

2016-01-20 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3122.
--
Resolution: Fixed

> Generalize value type in LabelPropagation
> -
>
> Key: FLINK-3122
> URL: https://issues.apache.org/jira/browse/FLINK-3122
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 0.10.0, 0.10.1
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
> Fix For: 1.0.0
>
>
> Currently, {{LabelPropagation}} expects a {{Long}} value as label. This can 
> be generalized to anything which extends {{Comparable}}. The issue depends on 
> FLINK-3118 to be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Tidied up some docs. Fixed some non-idiomatic ...

2016-01-20 Thread sksamuel
GitHub user sksamuel opened a pull request:

https://github.com/apache/flink/pull/1531

Tidied up some docs. Fixed some non-idiomatic usage of return in scal…

…a code.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sksamuel/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1531


commit 3362d20ed583a19a65f96f99b8d7b913e4de0b8b
Author: sksamuel 
Date:   2016-01-20T23:26:02Z

Tidied up some docs. Fixed some non-idiomatic usage of return in scala code.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2315) Hadoop Writables cannot exploit implementing NormalizableKey

2016-01-20 Thread Martin Junghanns (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108203#comment-15108203
 ] 

Martin Junghanns commented on FLINK-2315:
-

Hi [~StephanEwen], I checked my 'NormalizableKey & Writable' implementation [1] 
against a more complex program involving sorts [2]. The methods actually get 
called, so this issue is not valid anymore. And thanks for the clarification of 
raw bytes vs. normalizable keys; I will update our code accordingly!

[1] 
https://github.com/dbs-leipzig/gradoop/blob/master/gradoop-core/src/main/java/org/gradoop/model/impl/id/GradoopId.java
[2] 
https://github.com/dbs-leipzig/gradoop/blob/master/gradoop-examples/src/main/java/org/gradoop/examples/SocialNetworkExample2.java#L137-L194
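
For readers following the raw-bytes vs. normalized-keys distinction: a normalized 
key is an encoding whose unsigned byte-wise order matches the type's logical 
order. A minimal sketch for a signed long (a hypothetical helper, not tied to 
Flink's or Gradoop's actual interfaces):

{code}
public final class NormalizedKeys {

    // Write up to `len` bytes of a sort key for `value` into `target` such that
    // comparing the bytes as unsigned values (memcmp-style) agrees with the
    // signed ordering of the longs.
    public static void copyNormalizedKey(long value, byte[] target, int offset, int len) {
        long norm = value ^ Long.MIN_VALUE; // flip the sign bit
        for (int i = 0; i < len; i++) {
            // most significant byte first; pad with zeros beyond 8 bytes
            target[offset + i] = (i < 8) ? (byte) (norm >>> ((7 - i) << 3)) : 0;
        }
    }
}
{code}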

> Hadoop Writables cannot exploit implementing NormalizableKey
> 
>
> Key: FLINK-2315
> URL: https://issues.apache.org/jira/browse/FLINK-2315
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API
>Affects Versions: 0.9
>Reporter: Stephan Ewen
> Fix For: 1.0.0
>
>
> When one implements a type that extends {{hadoop.io.Writable}} and it 
> implements {{NormalizableKey}}, this is still not exploited.
> The Writable comparator fails to recognize that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3263) Log task statistics on TaskManager

2016-01-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108258#comment-15108258
 ] 

Stephan Ewen commented on FLINK-3263:
-

What approach would you use to get these statistics per thread?
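
One conceivable approach (a sketch only, not necessarily what this issue will 
settle on): the JDK's {{ThreadMXBean}} exposes per-thread CPU time, which a 
periodic logger could sample:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuLogger {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (threads.isThreadCpuTimeSupported()) {
            for (long id : threads.getAllThreadIds()) {
                long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has exited
                if (cpuNanos >= 0) {
                    System.out.printf("thread %d used %.1f ms CPU%n", id, cpuNanos / 1_000_000.0);
                }
            }
        }
    }
}
{code}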

> Log task statistics on TaskManager
> --
>
> Key: FLINK-3263
> URL: https://issues.apache.org/jira/browse/FLINK-3263
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Similar to how memory statistics can be written to the TaskMangers' log files 
> by configuring {{taskmanager.debug.memory.startLogThread}} and 
> {{taskmanager.debug.memory.logIntervalMs}}, it would be useful to have 
> statistics written for each task within a job.
> One use case is to reconstruct progress to analyze why TaskManagers take 
> different amounts of time to process the same quantity of data.
> I envision this being the same statistics which are displayed on the web 
> frontend.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2021) Rework examples to use new ParameterTool

2016-01-20 Thread Stefano Baghino (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108268#comment-15108268
 ] 

Stefano Baghino commented on FLINK-2021:


I've noticed that the debate took place on the ML with a positive response, but 
that no work has been done on this issue so far. Would it be ok if I take care 
of this issue?

> Rework examples to use new ParameterTool
> 
>
> Key: FLINK-2021
> URL: https://issues.apache.org/jira/browse/FLINK-2021
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: starter
>
> In FLINK-1525, we introduced the {{ParameterTool}}.
> We should port the examples to use the tool.
> The examples could look like this (we should maybe discuss it first on the 
> mailing lists):
> {code}
> public static void main(String[] args) throws Exception {
> ParameterTool pt = ParameterTool.fromArgs(args);
> boolean fileOutput = pt.getNumberOfParameters() == 2;
> String textPath = null;
> String outputPath = null;
> if(fileOutput) {
> textPath = pt.getRequired("input");
> outputPath = pt.getRequired("output");
> }
> // set up the execution environment
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().setUserConfig(pt);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2871] support outer join for hash join ...

2016-01-20 Thread ChengXiangLi
Github user ChengXiangLi commented on a diff in the pull request:

https://github.com/apache/flink/pull/1469#discussion_r50232198
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/hash/MutableHashTable.java
 ---
@@ -1558,7 +1641,209 @@ public void reset() {
}
 
} // end HashBucketIterator
+
+   /**
+    * Iterate all the elements in memory which have not been probed during the probe phase.
+    */
+   public static class UnmatchedBuildIterator<BT, PT> implements MutableObjectIterator<BT> {
+
+      private final TypeSerializer<BT> accessor;
+
+      private final long totalBucketNumber;
+
+      private final int bucketsPerSegmentBits;
+
+      private final int bucketsPerSegmentMask;
+
+      private final MemorySegment[] buckets;
+
+      private final ArrayList<HashPartition<BT, PT>> partitionsBeingBuilt;
+
+      private final BitSet probedSet;
+
+      private MemorySegment bucket;
+
+      private MemorySegment[] overflowSegments;
+
+      private HashPartition<BT, PT> partition;
+
+      private int scanCount;
+
+      private int bucketInSegmentOffset;
+
+      private int countInSegment;
+
+      private int numInSegment;
+
+      UnmatchedBuildIterator(
+         TypeSerializer<BT> accessor,
+         long totalBucketNumber,
+         int bucketsPerSegmentBits,
+         int bucketsPerSegmentMask,
+         MemorySegment[] buckets,
+         ArrayList<HashPartition<BT, PT>> partitionsBeingBuilt,
+         BitSet probedSet) {
+
+         this.accessor = accessor;
+         this.totalBucketNumber = totalBucketNumber;
+         this.bucketsPerSegmentBits = bucketsPerSegmentBits;
+         this.bucketsPerSegmentMask = bucketsPerSegmentMask;
+         this.buckets = buckets;
+         this.partitionsBeingBuilt = partitionsBeingBuilt;
+         this.probedSet = probedSet;
+         init();
+      }
+
+      private void init() {
+         scanCount = -1;
+         while (!moveToNextBucket()) {
+            if (scanCount >= totalBucketNumber) {
+               break;
+            }
+         }
+      }
+
+      public BT next(BT reuse) {
+         while (true) {
+            BT result = nextInBucket(reuse);
+            if (result == null) {
+               while (!moveToNextBucket()) {
+                  if (scanCount >= totalBucketNumber) {
+                     return null;
+                  }
+               }
+            } else {
+               return result;
+            }
+         }
+      }
+
+      public BT next() {
+         while (true) {
+            BT result = nextInBucket();
+            if (result == null) {
+               while (!moveToNextBucket()) {
+                  if (scanCount >= totalBucketNumber) {
+                     return null;
+                  }
+               }
+            } else {
+               return result;
+            }
+         }
+      }
+   
+   private boolean moveToNextBucket() {
--- End diff --

The next bucket may be spilled to disk, so we need a loop here to make sure we 
move to the next on-heap bucket.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2871) Add OuterJoin strategy with HashTable on outer side

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108291#comment-15108291
 ] 

ASF GitHub Bot commented on FLINK-2871:
---

Github user ChengXiangLi commented on a diff in the pull request:

https://github.com/apache/flink/pull/1469#discussion_r50232198
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/hash/MutableHashTable.java
 ---
@@ -1558,7 +1641,209 @@ public void reset() {
}
 
} // end HashBucketIterator
+
+   /**
+    * Iterate all the elements in memory which have not been probed during the probe phase.
+    */
+   public static class UnmatchedBuildIterator<BT, PT> implements MutableObjectIterator<BT> {
+
+      private final TypeSerializer<BT> accessor;
+
+      private final long totalBucketNumber;
+
+      private final int bucketsPerSegmentBits;
+
+      private final int bucketsPerSegmentMask;
+
+      private final MemorySegment[] buckets;
+
+      private final ArrayList<HashPartition<BT, PT>> partitionsBeingBuilt;
+
+      private final BitSet probedSet;
+
+      private MemorySegment bucket;
+
+      private MemorySegment[] overflowSegments;
+
+      private HashPartition<BT, PT> partition;
+
+      private int scanCount;
+
+      private int bucketInSegmentOffset;
+
+      private int countInSegment;
+
+      private int numInSegment;
+
+      UnmatchedBuildIterator(
+         TypeSerializer<BT> accessor,
+         long totalBucketNumber,
+         int bucketsPerSegmentBits,
+         int bucketsPerSegmentMask,
+         MemorySegment[] buckets,
+         ArrayList<HashPartition<BT, PT>> partitionsBeingBuilt,
+         BitSet probedSet) {
+
+         this.accessor = accessor;
+         this.totalBucketNumber = totalBucketNumber;
+         this.bucketsPerSegmentBits = bucketsPerSegmentBits;
+         this.bucketsPerSegmentMask = bucketsPerSegmentMask;
+         this.buckets = buckets;
+         this.partitionsBeingBuilt = partitionsBeingBuilt;
+         this.probedSet = probedSet;
+         init();
+      }
+
+      private void init() {
+         scanCount = -1;
+         while (!moveToNextBucket()) {
+            if (scanCount >= totalBucketNumber) {
+               break;
+            }
+         }
+      }
+
+      public BT next(BT reuse) {
+         while (true) {
+            BT result = nextInBucket(reuse);
+            if (result == null) {
+               while (!moveToNextBucket()) {
+                  if (scanCount >= totalBucketNumber) {
+                     return null;
+                  }
+               }
+            } else {
+               return result;
+            }
+         }
+      }
+
+      public BT next() {
+         while (true) {
+            BT result = nextInBucket();
+            if (result == null) {
+               while (!moveToNextBucket()) {
+                  if (scanCount >= totalBucketNumber) {
+                     return null;
+                  }
+               }
+            } else {
+               return result;
+            }
+         }
+      }
+   
+   private boolean moveToNextBucket() {
--- End diff --

The next bucket may be spilled to disk, so we need a loop here to make sure we 
move to the next on-heap bucket.


> Add OuterJoin strategy with HashTable on outer side
> ---
>
> Key: FLINK-2871
> URL: https://issues.apache.org/jira/browse/FLINK-2871
> Project: Flink
>  Issue Type: New Feature
>  Components: Local Runtime, Optimizer

[GitHub] flink pull request: [FLINK-3262][web-dashboard] Remove fuzzy versi...

2016-01-20 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1525#issuecomment-173135590
  
Changes look good.

Did you check that the UI works properly (no glitches due to the changes in 
dependencies)?

If all works well, +1 to merge this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3262) Remove fuzzy versioning from Bower dependencies

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108251#comment-15108251
 ] 

ASF GitHub Bot commented on FLINK-3262:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1525#issuecomment-173135590
  
Changes look good.

Did you check that the UI works properly (no glitches due to the changes in 
dependencies)?

If all works well, +1 to merge this


> Remove fuzzy versioning from Bower dependencies
> ---
>
> Key: FLINK-3262
> URL: https://issues.apache.org/jira/browse/FLINK-3262
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Trivial
>
> {{bower.json}} is currently defined with fuzzy versions, i.e. {{"bootstrap": 
> "~3.3.5"}}, which silently pulls in patch updates. When a user compiles the 
> web frontend, the new versions create changes in the compiled JavaScript 
> and CSS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3063) [py] Remove combiner

2016-01-20 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-3063.
---
Resolution: Fixed

> [py] Remove combiner
> 
>
> Key: FLINK-3063
> URL: https://issues.apache.org/jira/browse/FLINK-3063
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Affects Versions: 0.10.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.0.0
>
>
> The current combiner implementation in the PythonAPI is quite a mess. It adds 
> a lot of unreadable clutter, is inefficient at times, and can straight up 
> break in some edge cases.
> I will revisit this feature after FLINK-2501 is resolved. Several changes for 
> that issue will make the reimplementation easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3063) [py] Remove combiner

2016-01-20 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108264#comment-15108264
 ] 

Chesnay Schepler commented on FLINK-3063:
-

Resolved in 54b52c9be66a981360fed121fcfd7f1be28ac1cb

> [py] Remove combiner
> 
>
> Key: FLINK-3063
> URL: https://issues.apache.org/jira/browse/FLINK-3063
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Affects Versions: 0.10.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.0.0
>
>
> The current combiner implementation in the PythonAPI is quite a mess. It adds 
> a lot of unreadable clutter, is inefficient at times, and can straight up 
> break in some edge cases.
> I will revisit this feature after FLINK-2501 is resolved. Several changes for 
> that issue will make the reimplementation easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3261) Tasks should eagerly report back when they cannot start a checkpoint

2016-01-20 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reassigned FLINK-3261:
---

Assignee: Aljoscha Krettek

> Tasks should eagerly report back when they cannot start a checkpoint
> 
>
> Key: FLINK-3261
> URL: https://issues.apache.org/jira/browse/FLINK-3261
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 0.10.1
>Reporter: Stephan Ewen
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.0.0
>
>
> With very fast checkpoint intervals (a few 100 msecs), it can happen that a 
> Task is not ready to start a checkpoint by the time it gets the first 
> checkpoint trigger message.
> If some other tasks are ready already and commence a checkpoint, the stream 
> alignment will make the non-participating task wait until the checkpoint 
> expires (default: 10 minutes).
> A simple way to fix this is to have tasks report back when they cannot start 
> a checkpoint. The checkpoint coordinator can then abort that checkpoint and 
> unblock the streams by starting a new checkpoint (where all tasks will 
> participate).
> An optimization would be to send a special "abort checkpoints barrier" that 
> tells the barrier buffers for stream alignment to unblock a checkpoint.
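
A conceptual sketch of the decline-and-retrigger idea described above (all names 
are hypothetical; this is not Flink's actual runtime code):

{code}
import java.util.HashSet;
import java.util.Set;

class CheckpointCoordinatorSketch {

    private final Set<Long> pendingCheckpoints = new HashSet<>();
    private long nextCheckpointId = 1;

    synchronized void triggerCheckpoint() {
        long id = nextCheckpointId++;
        pendingCheckpoints.add(id);
        // ... send trigger messages for checkpoint `id` to all tasks ...
    }

    // Called when a task eagerly reports that it could not start the checkpoint.
    synchronized void declineCheckpoint(long checkpointId) {
        if (pendingCheckpoints.remove(checkpointId)) {
            // abort the blocked checkpoint and immediately start a new one so
            // that already-aligned tasks are unblocked instead of waiting for
            // the checkpoint to expire
            triggerCheckpoint();
        }
    }
}
{code}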



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2871] support outer join for hash join ...

2016-01-20 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/1469#issuecomment-173151704
  
I did a simple regression test based on `HashVsSortMiniBenchmark`; the results 
look like:

Test | Before | After
-- | -- | --
testBuildFirst | 6.63s | 6.65s
testBuildSecond | 3.7s | 3.8s

The inner join performance is not influenced by this PR, which matches my 
expectation. There is a flag called `buildSideOuterJoin` in `MutableHashTable`; 
all the extra effort only happens while `buildSideOuterJoin` is true.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2871) Add OuterJoin strategy with HashTable on outer side

2016-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108313#comment-15108313
 ] 

ASF GitHub Bot commented on FLINK-2871:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/1469#issuecomment-173151704
  
I did a simple regression test based on `HashVsSortMiniBenchmark`; the results 
look like:

Test | Before | After
-- | -- | --
testBuildFirst | 6.63s | 6.65s
testBuildSecond | 3.7s | 3.8s

The inner join performance is not influenced by this PR, which matches my 
expectation. There is a flag called `buildSideOuterJoin` in `MutableHashTable`; 
all the extra effort only happens while `buildSideOuterJoin` is true.


> Add OuterJoin strategy with HashTable on outer side
> ---
>
> Key: FLINK-2871
> URL: https://issues.apache.org/jira/browse/FLINK-2871
> Project: Flink
>  Issue Type: New Feature
>  Components: Local Runtime, Optimizer
>Affects Versions: 0.10.0
>Reporter: Fabian Hueske
>Assignee: Chengxiang Li
>Priority: Minor
>
> Outer joins are currently supported with two local execution strategies:
> - sort-merge join
> - hash join where the hash table is built on the inner side. Hence, this 
> strategy is only supported for left and right outer joins.
> In order to support hash-tables on the outer side, we need a special hash 
> table implementation that gives access to all records which have not been 
> accessed during the probe phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3264) Add load shedding policy into Kafka Consumers

2016-01-20 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-3264:
-

 Summary: Add load shedding policy into Kafka Consumers
 Key: FLINK-3264
 URL: https://issues.apache.org/jira/browse/FLINK-3264
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector
Reporter: Robert Metzger


There are situations when Flink's Kafka Consumer is not able to consume 
everything produced into a topic, for example when one Flink instance is 
subscribed to a busy Kafka topic (see this user request: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Frequent-exceptions-killing-streaming-job-td4323.html
 )

I think we should allow users to control the behavior of the Kafka consumer in 
those situations.

I had an offline discussion with [~StephanEwen] about this and we think that 
allowing users to pass a LoadSheddingPolicy to the KafkaConsumer would be the 
best solution.
In the policy, users can define a frequency at which the consumer requests the 
latest offsets of the subscribed partitions (the requests can be based either 
on time (every n ms) or on record count (every n'th record)). The policy can 
then decide to skip a certain number of offsets (maybe even set the position to 
the latest offset).
With the "offset skipping" approach, we avoid fetching records we cannot 
process anyway.

In the 0.9 consumer, there doesn't seem to be an API for requesting the latest 
offset of a TopicPartition. I'll ask on the Kafka ML what the status is there.
With {{seek()}} we can fetch from any offset.

In the 0.8 SimpleConsumer, there is a method for requesting the offsets:
{code}
kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
    requestInfo, kafka.api.OffsetRequest.CurrentVersion(), consumer.clientId());
OffsetResponse response = consumer.getOffsetsBefore(request);
{code}
The fetch offset is controlled within the {{LegacyFetcher}}.
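
For reference, a standalone sketch against the 0.9 consumer API showing what 
{{seek()}} gives us (the Flink connector would drive this from inside its 
fetcher, not from a main method):

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));

            consumer.seek(tp, 42L);   // fetch from an arbitrary offset
            consumer.seekToEnd(tp);   // or skip everything currently in the partition
        }
    }
}
{code}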



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2021) Rework examples to use ParameterTool

2016-01-20 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2021:
--
Summary: Rework examples to use ParameterTool  (was: Rework examples to use 
new ParameterTool)

> Rework examples to use ParameterTool
> 
>
> Key: FLINK-2021
> URL: https://issues.apache.org/jira/browse/FLINK-2021
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: starter
>
> In FLINK-1525, we introduced the {{ParameterTool}}.
> We should port the examples to use the tool.
> The examples could look like this (we should maybe discuss it first on the 
> mailing lists):
> {code}
> public static void main(String[] args) throws Exception {
> ParameterTool pt = ParameterTool.fromArgs(args);
> boolean fileOutput = pt.getNumberOfParameters() == 2;
> String textPath = null;
> String outputPath = null;
> if(fileOutput) {
> textPath = pt.getRequired("input");
> outputPath = pt.getRequired("output");
> }
> // set up the execution environment
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().setUserConfig(pt);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2021) Rework examples to use ParameterTool

2016-01-20 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108398#comment-15108398
 ] 

Robert Metzger commented on FLINK-2021:
---

Hi Stefano,

cool. I think nobody is currently working on this issue.
Some thoughts on this: 
I think that using the ParameterTool will improve the readability of the 
examples. In the beginning I was a bit hesitant about pushing the ParameterTool 
into all examples because I wasn't sure if users would like it. But I've seen 
many happy users and a wide adoption of it.

We recently added RequiredParameters into Flink as well. They'll also allow you 
to print a help text. Maybe it makes sense to use the RequiredParameters in the 
examples.

How about you introduce the tool into one or two examples, we review the change 
together and then you apply it to all examples?

Also note that we need to carefully check and rework the examples in the 
documentation to be in sync after the change.
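
A rough sketch of combining the two utilities (the RequiredParameters API is 
assumed from the then-new {{org.apache.flink.api.java.utils}} classes; exact 
method names may differ):

{code}
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.api.java.utils.RequiredParameters;
import org.apache.flink.api.java.utils.RequiredParametersException;

public class ArgsCheckSketch {
    public static void main(String[] args) {
        ParameterTool params = ParameterTool.fromArgs(args);
        RequiredParameters required = new RequiredParameters();
        try {
            required.add("input");
            required.add("output");
            // validates that all declared parameters are present in `params`
            required.applyTo(params);
        } catch (RequiredParametersException e) {
            // getHelp() renders a usage/help text for the declared parameters
            System.err.println(required.getHelp());
            return;
        }
        // ... run the example with params.get("input") / params.get("output") ...
    }
}
{code}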

> Rework examples to use ParameterTool
> 
>
> Key: FLINK-2021
> URL: https://issues.apache.org/jira/browse/FLINK-2021
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Priority: Minor
>  Labels: starter
>
> In FLINK-1525, we introduced the {{ParameterTool}}.
> We should port the examples to use the tool.
> The examples could look like this (we should maybe discuss it first on the 
> mailing lists):
> {code}
> public static void main(String[] args) throws Exception {
> ParameterTool pt = ParameterTool.fromArgs(args);
> boolean fileOutput = pt.getNumberOfParameters() == 2;
> String textPath = null;
> String outputPath = null;
> if(fileOutput) {
> textPath = pt.getRequired("input");
> outputPath = pt.getRequired("output");
> }
> // set up the execution environment
> final ExecutionEnvironment env = 
> ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().setUserConfig(pt);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

