[jira] [Created] (FLINK-2520) StreamFaultToleranceTestBase does not allow for multiple tests

2015-08-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2520:
---

 Summary: StreamFaultToleranceTestBase does not allow for multiple 
tests
 Key: FLINK-2520
 URL: https://issues.apache.org/jira/browse/FLINK-2520
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 0.10
Reporter: Stephan Ewen
Priority: Minor


It would be good to implement more tests into the same class (or user at least 
the same driver class), in order to re-use mini clusters and reducer overall 
test times.

The {{StreamFaultToleranceTestBase}} does not support that.
I think the pattern where the {{@Test}} methods are in a base class, and the 
actual tests only implement hook methods is not a good pattern.

The batch API tests used this initially and we abandoned it everywhere for a 
more flexible implementation where the base class only sets up the. 
cluster/environment and the {{@Test}} methods go int the subclass.

The {{MultipleProgramsTestBase}} is a good example of a flexible test base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2517] Minor fix to streaming guide

2015-08-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1013


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-2517) Wrong KafkaSink arguments in streaming guide

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2517.
-
   Resolution: Fixed
Fix Version/s: 0.10

Fixed via 51872d73b83a86603fd1f2e8e481d3cceb755e38

Thank you for the contribution!

 Wrong KafkaSink arguments in streaming guide
 

 Key: FLINK-2517
 URL: https://issues.apache.org/jira/browse/FLINK-2517
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Nezih Yigitbasi
Priority: Trivial
 Fix For: 0.10


 For the {{KafkaSink}} example in the streaming guide, the doc says zookeeper 
 host/port should be specified in the constructor. But it should be the list 
 of brokers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2519) BarrierBuffers get stuck infinitely when some inputs end early

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2519.
---

 BarrierBuffers get stuck infinitely when some inputs end early
 --

 Key: FLINK-2519
 URL: https://issues.apache.org/jira/browse/FLINK-2519
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.10


 When some sources exit early, the barrier buffer may start an alignment, but 
 will never complete it, because some inputs never deliver a checkpoint 
 barrier.
 [FLINK-2515] mitigates most of the problem, by not starting checkpoints any 
 more when some sources already finished. When a checkpoint is started while a 
 source is finishing, the barrier buffer may still align infinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2519) BarrierBuffers get stuck infinitely when some inputs end early

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2519.
-
Resolution: Fixed

Fixed via e1d1bd0a224b32f6a488e400f5f07e4ab4b65869

 BarrierBuffers get stuck infinitely when some inputs end early
 --

 Key: FLINK-2519
 URL: https://issues.apache.org/jira/browse/FLINK-2519
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.10


 When some sources exit early, the barrier buffer may start an alignment, but 
 will never complete it, because some inputs never deliver a checkpoint 
 barrier.
 [FLINK-2515] mitigates most of the problem, by not starting checkpoints any 
 more when some sources already finished. When a checkpoint is started while a 
 source is finishing, the barrier buffer may still align infinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2517) Wrong KafkaSink arguments in streaming guide

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696902#comment-14696902
 ] 

ASF GitHub Bot commented on FLINK-2517:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1013


 Wrong KafkaSink arguments in streaming guide
 

 Key: FLINK-2517
 URL: https://issues.apache.org/jira/browse/FLINK-2517
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Nezih Yigitbasi
Priority: Trivial
 Fix For: 0.10


 For the {{KafkaSink}} example in the streaming guide, the doc says zookeeper 
 host/port should be specified in the constructor. But it should be the list 
 of brokers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2515) CheckpointCoordinator triggers checkpoints even if not all sources are running any more

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2515.
-
Resolution: Fixed
  Assignee: Stephan Ewen

Fixed via 06e2da352fb63f7922f634e6aaf5381d89de57a5

 CheckpointCoordinator triggers checkpoints even if not all sources are 
 running any more
 ---

 Key: FLINK-2515
 URL: https://issues.apache.org/jira/browse/FLINK-2515
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.10


 When some sources finish early, they will not emit checkpoint barriers any 
 more. That means that pending checkpoint alignments will never be able to 
 complete, locking the flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2515) CheckpointCoordinator triggers checkpoints even if not all sources are running any more

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2515.
---

 CheckpointCoordinator triggers checkpoints even if not all sources are 
 running any more
 ---

 Key: FLINK-2515
 URL: https://issues.apache.org/jira/browse/FLINK-2515
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.10


 When some sources finish early, they will not emit checkpoint barriers any 
 more. That means that pending checkpoint alignments will never be able to 
 complete, locking the flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2516) Remove unwanted log.isInfoEnabled check

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696922#comment-14696922
 ] 

ASF GitHub Bot commented on FLINK-2516:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1012#issuecomment-131085420
  
I follow @StephanEwen. Because computing memory usage statistics is 
expensive, we need to check log level.


 Remove unwanted log.isInfoEnabled check
 ---

 Key: FLINK-2516
 URL: https://issues.apache.org/jira/browse/FLINK-2516
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...

2015-08-14 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/977#issuecomment-131001646
  
@StephanEwen 
Hi, I did some changes few days ago.
Could you take a look again?
Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2477) Add test for SocketClientSink

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696622#comment-14696622
 ] 

ASF GitHub Bot commented on FLINK-2477:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/977#issuecomment-131001646
  
@StephanEwen 
Hi, I did some changes few days ago.
Could you take a look again?
Thank you!


 Add test for SocketClientSink
 -

 Key: FLINK-2477
 URL: https://issues.apache.org/jira/browse/FLINK-2477
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.10
 Environment: win7 32bit;linux
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h

 Add some tests for SocketClientSink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696619#comment-14696619
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-131001106
  
Thanks for the review, @thvasilo , fixed all comments in latest commit.


 Create sample operator for Dataset
 --

 Key: FLINK-1901
 URL: https://issues.apache.org/jira/browse/FLINK-1901
 Project: Flink
  Issue Type: Improvement
  Components: Core
Reporter: Theodore Vasiloudis
Assignee: Chengxiang Li

 In order to be able to implement Stochastic Gradient Descent and a number of 
 other machine learning algorithms we need to have a way to take a random 
 sample from a Dataset.
 We need to be able to sample with or without replacement from the Dataset, 
 choose the relative or exact size of the sample, set a seed for 
 reproducibility, and support sampling within iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-14 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-131001106
  
Thanks for the review, @thvasilo , fixed all comments in latest commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-08-14 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/949#discussion_r37054584
  
--- Diff: 
flink-tests/src/test/scala/org/apache/flink/api/scala/operators/SampleITCase.scala
 ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.operators
+
+import java.util.{List = JavaList, Random}
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.util.CollectionDataSets
+import 
org.apache.flink.test.util.MultipleProgramsTestBase.TestExecutionMode
+import org.apache.flink.test.util.{MultipleProgramsTestBase, TestBaseUtils}
+import org.junit.Assert._
+import org.junit.runner.RunWith
+import org.junit.runners.Parameterized
+import org.junit.{Before, After, Test}
+
+import scala.collection.JavaConverters._
+
+@RunWith(classOf[Parameterized])
+class SampleITCase(mode: TestExecutionMode) extends 
MultipleProgramsTestBase(mode) {
+  private val RNG: Random = new Random
+
+  private var result: JavaList[String] = null;
+
+  @Before
+  def initiate {
+ExecutionEnvironment.getExecutionEnvironment.setParallelism(5)
+  }
+
+  @After
+  def after() = {
+TestBaseUtils.containsResultAsText(result, getSourceStrings)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithFractionWithoutReplacement {
+verifySamplerWithFractionWithoutReplacement(0d)
+verifySamplerWithFractionWithoutReplacement(0.2d)
+verifySamplerWithFractionWithoutReplacement(1.0d)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithFractionWithReplacement {
+verifySamplerWithFractionWithReplacement(0d)
+verifySamplerWithFractionWithReplacement(0.2d)
+verifySamplerWithFractionWithReplacement(1.0d)
+verifySamplerWithFractionWithReplacement(2.0d)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithSizeWithoutReplacement {
+verifySamplerWithFixedSizeWithoutReplacement(0)
+verifySamplerWithFixedSizeWithoutReplacement(2)
+verifySamplerWithFixedSizeWithoutReplacement(21)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithSizeWithReplacement {
+verifySamplerWithFixedSizeWithReplacement(0)
+verifySamplerWithFixedSizeWithReplacement(2)
+verifySamplerWithFixedSizeWithReplacement(21)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithoutReplacement(fraction: 
Double) {
+verifySamplerWithFractionWithoutReplacement(fraction, RNG.nextLong)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithoutReplacement(fraction: 
Double, seed: Long) {
+verifySamplerWithFraction(false, fraction, seed)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithReplacement(fraction: Double) {
+verifySamplerWithFractionWithReplacement(fraction, RNG.nextLong)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithReplacement(fraction: Double, 
seed: Long) {
+verifySamplerWithFraction(true, fraction, seed)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFraction(withReplacement: Boolean, 
fraction: Double, seed: Long) {
+val ds = getSourceDataSet()
+val sampled = ds.sample(withReplacement, fraction, seed)
+result = sampled.collect.asJava
--- End diff --

Thanks, I missed that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696632#comment-14696632
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/949#discussion_r37054584
  
--- Diff: 
flink-tests/src/test/scala/org/apache/flink/api/scala/operators/SampleITCase.scala
 ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.operators
+
+import java.util.{List = JavaList, Random}
+
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.util.CollectionDataSets
+import 
org.apache.flink.test.util.MultipleProgramsTestBase.TestExecutionMode
+import org.apache.flink.test.util.{MultipleProgramsTestBase, TestBaseUtils}
+import org.junit.Assert._
+import org.junit.runner.RunWith
+import org.junit.runners.Parameterized
+import org.junit.{Before, After, Test}
+
+import scala.collection.JavaConverters._
+
+@RunWith(classOf[Parameterized])
+class SampleITCase(mode: TestExecutionMode) extends 
MultipleProgramsTestBase(mode) {
+  private val RNG: Random = new Random
+
+  private var result: JavaList[String] = null;
+
+  @Before
+  def initiate {
+ExecutionEnvironment.getExecutionEnvironment.setParallelism(5)
+  }
+
+  @After
+  def after() = {
+TestBaseUtils.containsResultAsText(result, getSourceStrings)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithFractionWithoutReplacement {
+verifySamplerWithFractionWithoutReplacement(0d)
+verifySamplerWithFractionWithoutReplacement(0.2d)
+verifySamplerWithFractionWithoutReplacement(1.0d)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithFractionWithReplacement {
+verifySamplerWithFractionWithReplacement(0d)
+verifySamplerWithFractionWithReplacement(0.2d)
+verifySamplerWithFractionWithReplacement(1.0d)
+verifySamplerWithFractionWithReplacement(2.0d)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithSizeWithoutReplacement {
+verifySamplerWithFixedSizeWithoutReplacement(0)
+verifySamplerWithFixedSizeWithoutReplacement(2)
+verifySamplerWithFixedSizeWithoutReplacement(21)
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testSamplerWithSizeWithReplacement {
+verifySamplerWithFixedSizeWithReplacement(0)
+verifySamplerWithFixedSizeWithReplacement(2)
+verifySamplerWithFixedSizeWithReplacement(21)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithoutReplacement(fraction: 
Double) {
+verifySamplerWithFractionWithoutReplacement(fraction, RNG.nextLong)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithoutReplacement(fraction: 
Double, seed: Long) {
+verifySamplerWithFraction(false, fraction, seed)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithReplacement(fraction: Double) {
+verifySamplerWithFractionWithReplacement(fraction, RNG.nextLong)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFractionWithReplacement(fraction: Double, 
seed: Long) {
+verifySamplerWithFraction(true, fraction, seed)
+  }
+
+  @throws(classOf[Exception])
+  private def verifySamplerWithFraction(withReplacement: Boolean, 
fraction: Double, seed: Long) {
+val ds = getSourceDataSet()
+val sampled = ds.sample(withReplacement, fraction, seed)
+result = sampled.collect.asJava
--- End diff --

Thanks, I missed that.


 Create sample operator for Dataset
 --

 Key: FLINK-1901
 URL: https://issues.apache.org/jira/browse/FLINK-1901
  

[jira] [Commented] (FLINK-2480) Improving tests coverage for org.apache.flink.streaming.api

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696603#comment-14696603
 ] 

ASF GitHub Bot commented on FLINK-2480:
---

Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/991#discussion_r37053652
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/PrintSinkFunctionTest.java
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.runtime.accumulators.AccumulatorRegistry;
+import org.apache.flink.runtime.broadcast.BroadcastVariableManager;
+import org.apache.flink.runtime.execution.Environment;
+import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
+import org.apache.flink.runtime.io.disk.iomanager.IOManager;
+import 
org.apache.flink.runtime.io.network.api.writer.ResultPartitionWriter;
+import org.apache.flink.runtime.io.network.partition.consumer.InputGate;
+import org.apache.flink.runtime.jobgraph.JobVertexID;
+import org.apache.flink.runtime.jobgraph.tasks.InputSplitProvider;
+import org.apache.flink.runtime.memorymanager.MemoryManager;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.taskmanager.TaskManagerRuntimeInfo;
+import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+
+import static org.junit.Assert.*;
+
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.concurrent.Future;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.PrintSinkFunction}.
+ */
+public class PrintSinkFunctionTestIN extends RichSinkFunctionIN {
+
+   private Environment envForPrefixNull = new Environment() {
+   @Override
+   public JobID getJobID() {
+   return null;
+   }
+
+   @Override
+   public JobVertexID getJobVertexId() {
+   return null;
+   }
+
+   @Override
+   public ExecutionAttemptID getExecutionId() {
+   return null;
+   }
+
+   @Override
+   public Configuration getTaskConfiguration() {
+   return null;
+   }
+
+   @Override
+   public TaskManagerRuntimeInfo getTaskManagerInfo() {
+   return null;
+   }
+
+   @Override
+   public Configuration getJobConfiguration() {
+   return null;
+   }
+
+   @Override
+   public int getNumberOfSubtasks() {
+   return 0;
+   }
+
+   @Override
+   public int getIndexInSubtaskGroup() {
+   return 0;
+   }
+
+   @Override
+   public InputSplitProvider getInputSplitProvider() {
+   return null;
+   }
+
+   @Override
+   public IOManager getIOManager() {
+   return null;
+   }
+
+   @Override
+   public MemoryManager getMemoryManager() {
+   return null;
+   }
+
+   @Override
+   public String getTaskName() {
+   return null;
+   }

[GitHub] flink pull request: [FLINK-2480][test]Add tests for PrintSinkFunct...

2015-08-14 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-130999180
  
Hi, I have done a new changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2480][test]Add tests for PrintSinkFunct...

2015-08-14 Thread HuangWHWHW
Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/991#discussion_r37053652
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/PrintSinkFunctionTest.java
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.runtime.accumulators.AccumulatorRegistry;
+import org.apache.flink.runtime.broadcast.BroadcastVariableManager;
+import org.apache.flink.runtime.execution.Environment;
+import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
+import org.apache.flink.runtime.io.disk.iomanager.IOManager;
+import 
org.apache.flink.runtime.io.network.api.writer.ResultPartitionWriter;
+import org.apache.flink.runtime.io.network.partition.consumer.InputGate;
+import org.apache.flink.runtime.jobgraph.JobVertexID;
+import org.apache.flink.runtime.jobgraph.tasks.InputSplitProvider;
+import org.apache.flink.runtime.memorymanager.MemoryManager;
+import org.apache.flink.runtime.state.StateHandle;
+import org.apache.flink.runtime.taskmanager.TaskManagerRuntimeInfo;
+import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
+import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+
+import static org.junit.Assert.*;
+
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.concurrent.Future;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.PrintSinkFunction}.
+ */
+public class PrintSinkFunctionTestIN extends RichSinkFunctionIN {
+
+   private Environment envForPrefixNull = new Environment() {
+   @Override
+   public JobID getJobID() {
+   return null;
+   }
+
+   @Override
+   public JobVertexID getJobVertexId() {
+   return null;
+   }
+
+   @Override
+   public ExecutionAttemptID getExecutionId() {
+   return null;
+   }
+
+   @Override
+   public Configuration getTaskConfiguration() {
+   return null;
+   }
+
+   @Override
+   public TaskManagerRuntimeInfo getTaskManagerInfo() {
+   return null;
+   }
+
+   @Override
+   public Configuration getJobConfiguration() {
+   return null;
+   }
+
+   @Override
+   public int getNumberOfSubtasks() {
+   return 0;
+   }
+
+   @Override
+   public int getIndexInSubtaskGroup() {
+   return 0;
+   }
+
+   @Override
+   public InputSplitProvider getInputSplitProvider() {
+   return null;
+   }
+
+   @Override
+   public IOManager getIOManager() {
+   return null;
+   }
+
+   @Override
+   public MemoryManager getMemoryManager() {
+   return null;
+   }
+
+   @Override
+   public String getTaskName() {
+   return null;
+   }
+
+   @Override
+   public String getTaskNameWithSubtasks() {
+   return null;
+   }
+
+   @Override
+   public ClassLoader getUserClassLoader() {
+   

[jira] [Commented] (FLINK-2480) Improving tests coverage for org.apache.flink.streaming.api

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696607#comment-14696607
 ] 

ASF GitHub Bot commented on FLINK-2480:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-130999180
  
Hi, I have done a new changes.


 Improving tests coverage for org.apache.flink.streaming.api
 ---

 Key: FLINK-2480
 URL: https://issues.apache.org/jira/browse/FLINK-2480
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
 Fix For: 0.10

   Original Estimate: 504h
  Remaining Estimate: 504h

 The streaming API is quite a bit newer than the other code so it is not that 
 well covered with tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696610#comment-14696610
 ] 

ASF GitHub Bot commented on FLINK-2490:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-130999638
  
Hi , I did a new change.


 Remove unwanted boolean check in function 
 SocketTextStreamFunction.streamFromSocket
 ---

 Key: FLINK-2490
 URL: https://issues.apache.org/jira/browse/FLINK-2490
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...

2015-08-14 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/992#issuecomment-130999638
  
Hi , I did a new change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2521] [tests] Adds automatic test name ...

2015-08-14 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1015

[FLINK-2521] [tests] Adds automatic test name and reason of failure logging 

Adds TestLogger class which automatically logs the currently executed test 
names and the reasons for a failure. The automatic logging is achieved by 
specifying a JUnit Rule which executes a `TestWatcher` for every executed test. 

This PR makes all test bases extend the TestLogger. For future tests which 
don't extend a test base, the test class should extend the TestLogger class to 
add automatic test name logging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink testLogger

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1015


commit 6e27752dec68094c1f7498bebc2edd842f064daf
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-08-14T13:06:06Z

[FLINK-2521] [tests] Adds TestLogger class which automatically logs the 
currently executed test names and the reasons for a failure.

Makes test bases extend TestLogger to add automatic test name logging




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-08-14 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131112793
  
I agree that exact k-NN does not scale well. However, we can use your PR as 
a baseline implementation to compare the approximate algorithm against.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697027#comment-14697027
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131112793
  
I agree that exact k-NN does not scale well. However, we can use your PR as 
a baseline implementation to compare the approximate algorithm against.


 Add exact k-nearest-neighbours algorithm to machine learning library
 

 Key: FLINK-1745
 URL: https://issues.apache.org/jira/browse/FLINK-1745
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML, Starter

 Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
 it is still used as a mean to classify data and to do regression. This issue 
 focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
 proposed in [2].
 Could be a starter task.
 Resources:
 [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
 [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2521) Add automatic test name logging for tests

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696979#comment-14696979
 ] 

ASF GitHub Bot commented on FLINK-2521:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1015

[FLINK-2521] [tests] Adds automatic test name and reason of failure logging 

Adds TestLogger class which automatically logs the currently executed test 
names and the reasons for a failure. The automatic logging is achieved by 
specifying a JUnit Rule which executes a `TestWatcher` for every executed test. 

This PR makes all test bases extend the TestLogger. For future tests which 
don't extend a test base, the test class should extend the TestLogger class to 
add automatic test name logging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink testLogger

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1015


commit 6e27752dec68094c1f7498bebc2edd842f064daf
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-08-14T13:06:06Z

[FLINK-2521] [tests] Adds TestLogger class which automatically logs the 
currently executed test names and the reasons for a failure.

Makes test bases extend TestLogger to add automatic test name logging




 Add automatic test name logging for tests
 -

 Key: FLINK-2521
 URL: https://issues.apache.org/jira/browse/FLINK-2521
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor

 When running tests on travis the Flink components log to a file. This is 
 helpful in case of a failed test to retrieve the error. However, the log does 
 not contain the test name and the reason for the failure. Therefore it is 
 difficult to find the log output which corresponds to the failed test.
 It would be nice to automatically add the test case information to the log. 
 This would ease the debugging process big time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-08-14 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131107449
  
+1 for closing this and focusing on approximate kNN instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697000#comment-14697000
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131107449
  
+1 for closing this and focusing on approximate kNN instead.


 Add exact k-nearest-neighbours algorithm to machine learning library
 

 Key: FLINK-1745
 URL: https://issues.apache.org/jira/browse/FLINK-1745
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML, Starter

 Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
 it is still used as a mean to classify data and to do regression. This issue 
 focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
 proposed in [2].
 Could be a starter task.
 Resources:
 [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
 [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Minor fix to streaming guide

2015-08-14 Thread nezihyigitbasi
GitHub user nezihyigitbasi opened a pull request:

https://github.com/apache/flink/pull/1013

Minor fix to streaming guide

For the `KafkaSink` example in the streaming guide, the doc says zookeeper 
host/port should be specified in the constructor. But it should be the list of 
brokers. This PR fixes that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nezihyigitbasi/flink streaming-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1013


commit df6a9886e1c1910775cb8c4d382d81297b147f5f
Author: Nezih Yigitbasi nyigitb...@netflix.com
Date:   2015-08-14T08:34:11Z

Minor fix to streaming guide




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Stale Synchronous Parallel Iterations

2015-08-14 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/967#issuecomment-131030134
  
Hi @nltran,

thanks for updating the PR! 
I don't have time for a thorough review now, but I'll answer your last 
question.
ASF homepage states: *[The ASF desires that all contributors of ideas, 
code, or documentation to the Apache projects complete, sign, and submit [...] 
an Individual Contributor License Agreement.](http://www.apache.org/licenses)*

So, yes it would be very nice, if you would sign and submit an ICLA. In 
case, your employer assigned you to work on an Apache project, also an CCLA is 
necessary (see link above). 
Thank you very much, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2517) Wrong KafkaSink arguments in streaming guide

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696759#comment-14696759
 ] 

ASF GitHub Bot commented on FLINK-2517:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1013#issuecomment-131045187
  
Ah, very nice!

Will merge this!


 Wrong KafkaSink arguments in streaming guide
 

 Key: FLINK-2517
 URL: https://issues.apache.org/jira/browse/FLINK-2517
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Nezih Yigitbasi
Priority: Trivial

 For the {{KafkaSink}} example in the streaming guide, the doc says zookeeper 
 host/port should be specified in the constructor. But it should be the list 
 of brokers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2517) Wrong KafkaSink arguments in streaming guide

2015-08-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696758#comment-14696758
 ] 

Stephan Ewen commented on FLINK-2517:
-

Thanks for the pointer!

[~rmetzger] and me are currently improving/reworking the Kafka connectors.

We'll fix the docs as part of that!

 Wrong KafkaSink arguments in streaming guide
 

 Key: FLINK-2517
 URL: https://issues.apache.org/jira/browse/FLINK-2517
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Nezih Yigitbasi
Priority: Trivial

 For the {{KafkaSink}} example in the streaming guide, the doc says zookeeper 
 host/port should be specified in the constructor. But it should be the list 
 of brokers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2517) Wrong KafkaSink arguments in streaming guide

2015-08-14 Thread Nezih Yigitbasi (JIRA)
Nezih Yigitbasi created FLINK-2517:
--

 Summary: Wrong KafkaSink arguments in streaming guide
 Key: FLINK-2517
 URL: https://issues.apache.org/jira/browse/FLINK-2517
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Nezih Yigitbasi
Priority: Trivial


For the {{KafkaSink}} example in the streaming guide, the doc says zookeeper 
host/port should be specified in the constructor. But it should be the list of 
brokers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697106#comment-14697106
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user kno10 commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131123325
  
On low-dimensional data, exact kNN may be feasible using grid-based 
approaches even for very large data sets. It's not very sexy to implement 
this, but its also not very hard.
Also, a lot of users will be using data sets where pairwise distance 
computations is still possible (it's not as if everybody has exabyte vector 
data), so why deprive them of this option, even if it is too costly for others?
Last but not least, for evaluation purposes, exact kNN can be useful as a 
badeline, too.


 Add exact k-nearest-neighbours algorithm to machine learning library
 

 Key: FLINK-1745
 URL: https://issues.apache.org/jira/browse/FLINK-1745
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML, Starter

 Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
 it is still used as a mean to classify data and to do regression. This issue 
 focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
 proposed in [2].
 Could be a starter task.
 Resources:
 [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
 [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2523) Increase interrupt timeout in Task Canceling

2015-08-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2523:
---

 Summary: Increase interrupt timeout in Task Canceling
 Key: FLINK-2523
 URL: https://issues.apache.org/jira/browse/FLINK-2523
 Project: Flink
  Issue Type: Improvement
  Components: TaskManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


When a task is canceled, the cancellation calls periodically interrupt() on 
the task thread, if the task thread does not cancel with a certain time.

Currently, this value is hard coded to 10 seconds. We should make that time 
configurable.

Until then, I would like to increase the value to 30 seconds, as many tasks 
(here I am observing it for Kafka consumers) can take longer then 10 seconds 
for proper cleanup.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2524) Add getTaskNameWithSubtasks() to RuntimeContext

2015-08-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2524:
---

 Summary: Add getTaskNameWithSubtasks() to RuntimeContext
 Key: FLINK-2524
 URL: https://issues.apache.org/jira/browse/FLINK-2524
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
 Fix For: 0.10


When printing information to logs or debug output, one frequently needs to 
identify the statement with the originating task (task name and which subtask).

In many places, the system and user code manually construct something like 
MyTask (2/7).

The {{RuntimeContext}} should offer this, because it is too frequently needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-08-14 Thread kno10
Github user kno10 commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131123325
  
On low-dimensional data, exact kNN may be feasible using grid-based 
approaches even for very large data sets. It's not very sexy to implement 
this, but its also not very hard.
Also, a lot of users will be using data sets where pairwise distance 
computations is still possible (it's not as if everybody has exabyte vector 
data), so why deprive them of this option, even if it is too costly for others?
Last but not least, for evaluation purposes, exact kNN can be useful as a 
badeline, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-2524) Add getTaskNameWithSubtasks() to RuntimeContext

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2524:

Labels: easyfix starter  (was: )

 Add getTaskNameWithSubtasks() to RuntimeContext
 -

 Key: FLINK-2524
 URL: https://issues.apache.org/jira/browse/FLINK-2524
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.10
Reporter: Stephan Ewen
  Labels: easyfix, starter
 Fix For: 0.10


 When printing information to logs or debug output, one frequently needs to 
 identify the statement with the originating task (task name and which 
 subtask).
 In many places, the system and user code manually construct something like 
 MyTask (2/7).
 The {{RuntimeContext}} should offer this, because it is too frequently needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2393) Add a stateless at-least-once mode for streaming

2015-08-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697170#comment-14697170
 ] 

Stephan Ewen commented on FLINK-2393:
-

Added the documentation in 21a0c94baafd77297c8eb88367fc8caaac43d8ee

Docs describe the usage of the mode (in the streaming API docs) and the 
functionality (in the internals documentation)

Docs should be accessible on the website as soon as the CI bot builds the docs 
tonight.

 Add a stateless at-least-once mode for streaming
 --

 Key: FLINK-2393
 URL: https://issues.apache.org/jira/browse/FLINK-2393
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


 Currently, the checkpointing mechanism provides exactly once guarantees. 
 Part of that is the step that temporarily aligns the data streams. This 
 step increases the tuple latency temporarily.
 By offering a version that does not provide exactly-once, but only 
 at-least-once, we can avoid the latency increase. For super-low-latency 
 applications, that tolerate duplicates, this may be an interesting option.
 To realize that, we would use a slightly modified version of the 
 checkpointing algorithm. Effectively, the streams would not be aligned, but 
 tasks would only count the received barriers and emit their own barrier as 
 soon as the saw a barrier from all inputs.
 My feeling is that it makes not sense to implement state backups, when being 
 concerned with this super low latency. The mode would hence be a purely 
 stateless at-least-once mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2393) Add a stateless at-least-once mode for streaming

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2393.
-
Resolution: Fixed

Completed with the addition of the documentation in 
21a0c94baafd77297c8eb88367fc8caaac43d8ee

 Add a stateless at-least-once mode for streaming
 --

 Key: FLINK-2393
 URL: https://issues.apache.org/jira/browse/FLINK-2393
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


 Currently, the checkpointing mechanism provides exactly once guarantees. 
 Part of that is the step that temporarily aligns the data streams. This 
 step increases the tuple latency temporarily.
 By offering a version that does not provide exactly-once, but only 
 at-least-once, we can avoid the latency increase. For super-low-latency 
 applications, that tolerate duplicates, this may be an interesting option.
 To realize that, we would use a slightly modified version of the 
 checkpointing algorithm. Effectively, the streams would not be aligned, but 
 tasks would only count the received barriers and emit their own barrier as 
 soon as the saw a barrier from all inputs.
 My feeling is that it makes not sense to implement state backups, when being 
 concerned with this super low latency. The mode would hence be a purely 
 stateless at-least-once mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2393) Add a stateless at-least-once mode for streaming

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2393.
---

 Add a stateless at-least-once mode for streaming
 --

 Key: FLINK-2393
 URL: https://issues.apache.org/jira/browse/FLINK-2393
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


 Currently, the checkpointing mechanism provides exactly once guarantees. 
 Part of that is the step that temporarily aligns the data streams. This 
 step increases the tuple latency temporarily.
 By offering a version that does not provide exactly-once, but only 
 at-least-once, we can avoid the latency increase. For super-low-latency 
 applications, that tolerate duplicates, this may be an interesting option.
 To realize that, we would use a slightly modified version of the 
 checkpointing algorithm. Effectively, the streams would not be aligned, but 
 tasks would only count the received barriers and emit their own barrier as 
 soon as the saw a barrier from all inputs.
 My feeling is that it makes not sense to implement state backups, when being 
 concerned with this super low latency. The mode would hence be a purely 
 stateless at-least-once mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/977#discussion_r37086314
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/SocketClientSinkTest.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import java.io.IOException;
+import java.net.Socket;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.SocketClientSink;
+import org.apache.flink.streaming.util.serialization.SerializationSchema;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.SocketClientSink}.
+ */
+public class SocketClientSinkTest{
+
+   final AtomicReferenceThrowable error = new 
AtomicReferenceThrowable();
+   private final String host = 127.0.0.1;
+   private int port;
+   private String access;
+   private String value;
+   public SocketServer.ServerThread th;
+
+   public SocketClientSinkTest() {
+   }
+
+   class SocketServer extends Thread {
+
+   private ServerSocket server;
+   private Socket sk;
+   private BufferedReader rdr;
+
+   private SocketServer() {
+   try {
+   this.server = new ServerSocket(0);
+   port = server.getLocalPort();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   public void run() {
+   try {
+   sk = server.accept();
+   access = Connected;
+   th = new ServerThread(sk);
+   th.start();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   class ServerThread extends Thread {
+   Socket sk;
+
+   public ServerThread(Socket sk) {
+   this.sk = sk;
+   }
+
+   public void run() {
+   try {
+   rdr = new BufferedReader(new 
InputStreamReader(sk
+   .getInputStream()));
+   value = rdr.readLine();
+   } catch (IOException e) {
+   error.set(e);
+   }
+   }
+   }
+   }
+
+   @Test
+   public void testSocketSink() throws Exception{
+
+   SocketServer server = new SocketServer();
+   server.start();
+
+   SerializationSchemaString, byte[] simpleSchema = new 
SerializationSchemaString, byte[]() {
+   @Override
+   public byte[] serialize(String element) {
+   return element.getBytes();
+   }
+   };
+
+   SocketClientSinkString simpleSink = new 
SocketClientSinkString(host, port, simpleSchema);
+   simpleSink.open(new Configuration());
+   simpleSink.invoke(testSocketSinkInvoke);
+   simpleSink.close();
+   try {
+   server.join();
+   th.join();
+   }
+   catch (Exception e){
+   Assert.fail(e.getMessage());
+   }
+
+   if 

[jira] [Commented] (FLINK-2477) Add test for SocketClientSink

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697195#comment-14697195
 ] 

ASF GitHub Bot commented on FLINK-2477:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/977#discussion_r37086314
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/SocketClientSinkTest.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import java.io.IOException;
+import java.net.Socket;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.SocketClientSink;
+import org.apache.flink.streaming.util.serialization.SerializationSchema;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.SocketClientSink}.
+ */
+public class SocketClientSinkTest{
+
+   final AtomicReferenceThrowable error = new 
AtomicReferenceThrowable();
+   private final String host = 127.0.0.1;
+   private int port;
+   private String access;
+   private String value;
+   public SocketServer.ServerThread th;
+
+   public SocketClientSinkTest() {
+   }
+
+   class SocketServer extends Thread {
+
+   private ServerSocket server;
+   private Socket sk;
+   private BufferedReader rdr;
+
+   private SocketServer() {
+   try {
+   this.server = new ServerSocket(0);
+   port = server.getLocalPort();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   public void run() {
+   try {
+   sk = server.accept();
+   access = Connected;
+   th = new ServerThread(sk);
+   th.start();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   class ServerThread extends Thread {
+   Socket sk;
+
+   public ServerThread(Socket sk) {
+   this.sk = sk;
+   }
+
+   public void run() {
+   try {
+   rdr = new BufferedReader(new 
InputStreamReader(sk
+   .getInputStream()));
+   value = rdr.readLine();
+   } catch (IOException e) {
+   error.set(e);
+   }
+   }
+   }
+   }
+
+   @Test
+   public void testSocketSink() throws Exception{
+
+   SocketServer server = new SocketServer();
+   server.start();
+
+   SerializationSchemaString, byte[] simpleSchema = new 
SerializationSchemaString, byte[]() {
+   @Override
+   public byte[] serialize(String element) {
+   return element.getBytes();
+   }
+   };
+
+   SocketClientSinkString simpleSink = new 
SocketClientSinkString(host, port, simpleSchema);
+   simpleSink.open(new Configuration());
+   simpleSink.invoke(testSocketSinkInvoke);
+   

[jira] [Commented] (FLINK-2306) Add support for named streams in Storm compatibility layer

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697200#comment-14697200
 ] 

ASF GitHub Bot commented on FLINK-2306:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1011#issuecomment-131144983
  
Looks like an issue with shaded dependencies.

Have a look here for some background on shading: 
https://cwiki.apache.org/confluence/display/FLINK/Hadoop+Versions+and+Dependency+Shading


 Add support for named streams in Storm compatibility layer
 --

 Key: FLINK-2306
 URL: https://issues.apache.org/jira/browse/FLINK-2306
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax

 Currently, the layer only works on single stream and ignores stream names, 
 ie, each stream is treated as default stream. The declaration of multiple 
 output streams is ignored (all tuples are emitted to the same stream). If 
 multiple input streams are consumed all tuples are merged into a single 
 stream.
 This feature allows operators to declare multiple (named) output streams and 
 emit tuples to different stream. Furthermore, it enables Bolts to distinguish 
 incoming tuples from different streams by stream name (Storm tuple meta 
 information).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/977#issuecomment-131144157
  
I think this is good, minus two small comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2477][Add]Add tests for SocketClientSin...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/977#discussion_r37086274
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/SocketClientSinkTest.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import java.io.IOException;
+import java.net.Socket;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.SocketClientSink;
+import org.apache.flink.streaming.util.serialization.SerializationSchema;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.SocketClientSink}.
+ */
+public class SocketClientSinkTest{
+
+   final AtomicReferenceThrowable error = new 
AtomicReferenceThrowable();
+   private final String host = 127.0.0.1;
+   private int port;
+   private String access;
+   private String value;
+   public SocketServer.ServerThread th;
+
+   public SocketClientSinkTest() {
+   }
+
+   class SocketServer extends Thread {
+
+   private ServerSocket server;
+   private Socket sk;
+   private BufferedReader rdr;
+
+   private SocketServer() {
+   try {
+   this.server = new ServerSocket(0);
+   port = server.getLocalPort();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   public void run() {
+   try {
+   sk = server.accept();
+   access = Connected;
+   th = new ServerThread(sk);
+   th.start();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   class ServerThread extends Thread {
+   Socket sk;
+
+   public ServerThread(Socket sk) {
+   this.sk = sk;
+   }
+
+   public void run() {
+   try {
+   rdr = new BufferedReader(new 
InputStreamReader(sk
+   .getInputStream()));
+   value = rdr.readLine();
+   } catch (IOException e) {
+   error.set(e);
+   }
+   }
+   }
+   }
+
+   @Test
+   public void testSocketSink() throws Exception{
+
+   SocketServer server = new SocketServer();
+   server.start();
+
+   SerializationSchemaString, byte[] simpleSchema = new 
SerializationSchemaString, byte[]() {
+   @Override
+   public byte[] serialize(String element) {
+   return element.getBytes();
+   }
+   };
+
+   SocketClientSinkString simpleSink = new 
SocketClientSinkString(host, port, simpleSchema);
+   simpleSink.open(new Configuration());
+   simpleSink.invoke(testSocketSinkInvoke);
+   simpleSink.close();
+   try {
+   server.join();
+   th.join();
+   }
+   catch (Exception e){
+   Assert.fail(e.getMessage());
--- End diff --

You don't need to 

[GitHub] flink pull request: [FLINK-2512]Add client.close() before throw Ru...

2015-08-14 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1009#issuecomment-131148157
  
All tests are failing though, not sure if bc this patch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131148744
  
Wow, looks like a good piece of work. Nicely with tests and everything.
Build also passes, style looks good.

+1 to merge this from my side. I'd like to wait for a day or two to get a 
comment from one of the Gelly people (Vasia or Andra).

One thing, though: In the Batch and streaming APIs, we added a 
completeness check to make sure that methods added to the Java APIs are also 
present in the Scala APIs. Would that be a good thing to add here as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697229#comment-14697229
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-131150367
  
Looks good, will merge this.

Will make slight adjustments in the merge (for code style consistency), 
like naming the single instance uppercase `INSTANCE`.


 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, ie, cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...

2015-08-14 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1016

[FLINK-2291] [runtime] Adds high availability support via ZooKeeper

## Idea

This PR introduces cluster high availability via ZooKeeper. The idea is to 
use ZooKeeper to do leader election among a group of registered `JobManagers`. 
The elected leader writes his akka connection URL and his assigned leader 
session ID to ZooKeeper from where the `TaskManagers` can retrieve it.

## Activation

In order to use the high availability mode, one has to select Flinks 
*zookeeper* **recovery mode** and specify a valid **ZK quorum**. Both is done 
in the `flink-conf.yaml` by setting `recovery.mode: zookeeper` and 
`ha.zookeeper.quorum: address1:2181[,...],addressX:2181` where the zk quorum 
addresses point to ZooKeeper servers.

## Implementation

In order to support HA ZK and also the standalone recovery mode (no HA), 
this PR introduces the `LeaderElectionService` and the 
`LeaderRetrievalService`. The former service is used by leader contenders to be 
elected as the leader. The latter is used to obtain the address of the current 
leader. In standalone mode (`StandaloneLeaderElectionService`, 
`StandaloneLeaderRetrievalService`), these services just return the 
`JobManager` address which was found in the Flink configuration. With 
ZooKeeper, the services use the Curator framework to connect to the ZooKeeper 
quorum to do leader election and to read the ZkNode which contains the 
information of the current leader.

In the wake of introducing these services, the `FlinkMiniCluster` was also 
adapted to support HA with ZooKeeper. The `ForkableFlinkMiniCluster` starts 
automatically a ZK TestingCluster if `recovery.mode` was set to **zookeeper** 
in the provided configuration and if the `ha.zookeeper.quorum` was not set.

## Limitations

Currently in HA ZK mode, one web server is started per `JobManager`. The 
reason for not having a dedicated web server is that parts of the information 
it requires are not serializable yet. Thus, each `JobManager` has a local web 
server showing its own state. Since the web servers are running dedicatedly for 
one `JobManager`, they don't know about the current leader session ID. 
Therefore, it is not possible to cancel jobs via the web servers. This is 
because a `CancelJob` message is a `RequiresLeaderSessionMessage`.

This PR depends on the PR #1015.

Since this PR touches a lot of files, a close review, as far as possible, 
would be helpful. I'll add some more descriptions of the internal workings 
after the weekend.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink ha

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1016


commit c29a6e67ea6b637405a6392e4c54702907679e95
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-08-14T13:06:06Z

[FLINK-2521] [tests] Adds TestLogger class which automatically logs the 
currently executed test names and the reasons for a failure.

Makes test bases extend TestLogger to add automatic test name logging

commit e582b3b6cfef9c2c2f6b038df67270ad4b0259d6
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-06-26T10:07:39Z

[FLINK-2291] [runtime] Add ZooKeeper support to elect a leader from a set 
of JobManager. The leader will then be retrieved from ZooKeeper by the 
TaskManagers.

Refactors FlinkMiniCluster to support multiple JobManager

Adds proper remote address resolution for actors

Clean up of LeaderElection and LeaderRetrievalService. Removes 
synchronization to avoid deadlock.

Adds ZooKeeper start option to TestBaseUtils.startCluster

Removes registration session IDs, using the leader session IDs instead. 
Sets the leader session ID
 directly in the grantLeadership method. Let the LeaderElectionService 
select the leader session I
D. Return leader session ID to LeaderRetrievalListeners.

Removes direct ActorRef interaction

Introduces LeaderRetrievalService for the Client and the CliFrontend.

Make ApplicationClient to use the LeaderRetrievalService for JobManager 
resolution

Adds LeaderElection/Retrieval tests

Added test for exception forwarding from the CuratorFramework to a Contender

Adds test job submission with changing leaders

Adds new test cases for job cleanup after leader election change

Adds new LeaderChangeStateCleanup test case

Adds LeaderElectionRetrievalTestingCluster

commit 0e7b8b777a051159e621cb38f9bd1f6b0a84c778
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-07-29T14:52:38Z

Introduces 

[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/983#issuecomment-131150367
  
Looks good, will merge this.

Will make slight adjustments in the merge (for code style consistency), 
like naming the single instance uppercase `INSTANCE`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697230#comment-14697230
 ] 

ASF GitHub Bot commented on FLINK-2291:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1016

[FLINK-2291] [runtime] Adds high availability support via ZooKeeper

## Idea

This PR introduces cluster high availability via ZooKeeper. The idea is to 
use ZooKeeper to do leader election among a group of registered `JobManagers`. 
The elected leader writes his akka connection URL and his assigned leader 
session ID to ZooKeeper from where the `TaskManagers` can retrieve it.

## Activation

In order to use the high availability mode, one has to select Flinks 
*zookeeper* **recovery mode** and specify a valid **ZK quorum**. Both is done 
in the `flink-conf.yaml` by setting `recovery.mode: zookeeper` and 
`ha.zookeeper.quorum: address1:2181[,...],addressX:2181` where the zk quorum 
addresses point to ZooKeeper servers.

## Implementation

In order to support HA ZK and also the standalone recovery mode (no HA), 
this PR introduces the `LeaderElectionService` and the 
`LeaderRetrievalService`. The former service is used by leader contenders to be 
elected as the leader. The latter is used to obtain the address of the current 
leader. In standalone mode (`StandaloneLeaderElectionService`, 
`StandaloneLeaderRetrievalService`), these services just return the 
`JobManager` address which was found in the Flink configuration. With 
ZooKeeper, the services use the Curator framework to connect to the ZooKeeper 
quorum to do leader election and to read the ZkNode which contains the 
information of the current leader.

In the wake of introducing these services, the `FlinkMiniCluster` was also 
adapted to support HA with ZooKeeper. The `ForkableFlinkMiniCluster` starts 
automatically a ZK TestingCluster if `recovery.mode` was set to **zookeeper** 
in the provided configuration and if the `ha.zookeeper.quorum` was not set.

## Limitations

Currently in HA ZK mode, one web server is started per `JobManager`. The 
reason for not having a dedicated web server is that parts of the information 
it requires are not serializable yet. Thus, each `JobManager` has a local web 
server showing its own state. Since the web servers are running dedicatedly for 
one `JobManager`, they don't know about the current leader session ID. 
Therefore, it is not possible to cancel jobs via the web servers. This is 
because a `CancelJob` message is a `RequiresLeaderSessionMessage`.

This PR depends on the PR #1015.

Since this PR touches a lot of files, a close review, as far as possible, 
would be helpful. I'll add some more descriptions of the internal workings 
after the weekend.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink ha

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1016


commit c29a6e67ea6b637405a6392e4c54702907679e95
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-08-14T13:06:06Z

[FLINK-2521] [tests] Adds TestLogger class which automatically logs the 
currently executed test names and the reasons for a failure.

Makes test bases extend TestLogger to add automatic test name logging

commit e582b3b6cfef9c2c2f6b038df67270ad4b0259d6
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-06-26T10:07:39Z

[FLINK-2291] [runtime] Add ZooKeeper support to elect a leader from a set 
of JobManager. The leader will then be retrieved from ZooKeeper by the 
TaskManagers.

Refactors FlinkMiniCluster to support multiple JobManager

Adds proper remote address resolution for actors

Clean up of LeaderElection and LeaderRetrievalService. Removes 
synchronization to avoid deadlock.

Adds ZooKeeper start option to TestBaseUtils.startCluster

Removes registration session IDs, using the leader session IDs instead. 
Sets the leader session ID
 directly in the grantLeadership method. Let the LeaderElectionService 
select the leader session I
D. Return leader session ID to LeaderRetrievalListeners.

Removes direct ActorRef interaction

Introduces LeaderRetrievalService for the Client and the CliFrontend.

Make ApplicationClient to use the LeaderRetrievalService for JobManager 
resolution

Adds LeaderElection/Retrieval tests

Added test for exception forwarding from the CuratorFramework to a Contender

Adds test job submission with changing leaders

Adds new test cases for job cleanup after leader 

[GitHub] flink pull request: [WIP][FLINK-2386] Add new Kafka Consumers

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/996#issuecomment-131148950
  
I am in the middle of polishing this, adding more tests, and fixing quite a 
few remaining bugs. Will hopefully open a new pull request soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697223#comment-14697223
 ] 

ASF GitHub Bot commented on FLINK-2386:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/996#issuecomment-131148950
  
I am in the middle of polishing this, adding more tests, and fixing quite a 
few remaining bugs. Will hopefully open a new pull request soon.


 Implement Kafka connector using the new Kafka Consumer API
 --

 Key: FLINK-2386
 URL: https://issues.apache.org/jira/browse/FLINK-2386
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector
Reporter: Robert Metzger
Assignee: Robert Metzger

 Once Kafka has released its new consumer API, we should provide a connector 
 for that version.
 The release will probably be called 0.9 or 0.8.3.
 The connector will be mostly compatible with Kafka 0.8.2.x, except for 
 committing offsets to the broker (the new connector expects a coordinator to 
 be available on Kafka). To work around that, we can provide a configuration 
 option to commit offsets to zookeeper (managed by flink code).
 For 0.9/0.8.3 it will be fully compatible.
 It will not be compatible with 0.8.1 because of mismatching Kafka messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2477) Add test for SocketClientSink

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697196#comment-14697196
 ] 

ASF GitHub Bot commented on FLINK-2477:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/977#issuecomment-131144157
  
I think this is good, minus two small comments.


 Add test for SocketClientSink
 -

 Key: FLINK-2477
 URL: https://issues.apache.org/jira/browse/FLINK-2477
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.10
 Environment: win7 32bit;linux
Reporter: Huang Wei
Priority: Minor
 Fix For: 0.10

   Original Estimate: 168h
  Remaining Estimate: 168h

 Add some tests for SocketClientSink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2306] Add support for named streams in ...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1011#issuecomment-131144983
  
Looks like an issue with shaded dependencies.

Have a look here for some background on shading: 
https://cwiki.apache.org/confluence/display/FLINK/Hadoop+Versions+and+Dependency+Shading


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2512]Add client.close() before throw Ru...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1009#issuecomment-131146003
  
Looks good, will merge this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2477) Add test for SocketClientSink

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697194#comment-14697194
 ] 

ASF GitHub Bot commented on FLINK-2477:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/977#discussion_r37086274
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/api/functions/SocketClientSinkTest.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.functions;
+
+import java.io.IOException;
+import java.net.Socket;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.functions.sink.SocketClientSink;
+import org.apache.flink.streaming.util.serialization.SerializationSchema;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+import java.io.BufferedReader;
+import java.io.InputStreamReader;
+import java.net.ServerSocket;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Tests for the {@link 
org.apache.flink.streaming.api.functions.sink.SocketClientSink}.
+ */
+public class SocketClientSinkTest{
+
+   final AtomicReferenceThrowable error = new 
AtomicReferenceThrowable();
+   private final String host = 127.0.0.1;
+   private int port;
+   private String access;
+   private String value;
+   public SocketServer.ServerThread th;
+
+   public SocketClientSinkTest() {
+   }
+
+   class SocketServer extends Thread {
+
+   private ServerSocket server;
+   private Socket sk;
+   private BufferedReader rdr;
+
+   private SocketServer() {
+   try {
+   this.server = new ServerSocket(0);
+   port = server.getLocalPort();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   public void run() {
+   try {
+   sk = server.accept();
+   access = Connected;
+   th = new ServerThread(sk);
+   th.start();
+   } catch (Exception e) {
+   error.set(e);
+   }
+   }
+
+   class ServerThread extends Thread {
+   Socket sk;
+
+   public ServerThread(Socket sk) {
+   this.sk = sk;
+   }
+
+   public void run() {
+   try {
+   rdr = new BufferedReader(new 
InputStreamReader(sk
+   .getInputStream()));
+   value = rdr.readLine();
+   } catch (IOException e) {
+   error.set(e);
+   }
+   }
+   }
+   }
+
+   @Test
+   public void testSocketSink() throws Exception{
+
+   SocketServer server = new SocketServer();
+   server.start();
+
+   SerializationSchemaString, byte[] simpleSchema = new 
SerializationSchemaString, byte[]() {
+   @Override
+   public byte[] serialize(String element) {
+   return element.getBytes();
+   }
+   };
+
+   SocketClientSinkString simpleSink = new 
SocketClientSinkString(host, port, simpleSchema);
+   simpleSink.open(new Configuration());
+   simpleSink.invoke(testSocketSinkInvoke);
+   

[jira] [Commented] (FLINK-2512) Add client.close() before throw RuntimeException

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697210#comment-14697210
 ] 

ASF GitHub Bot commented on FLINK-2512:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1009#issuecomment-131146003
  
Looks good, will merge this...


 Add client.close() before throw RuntimeException
 

 Key: FLINK-2512
 URL: https://issues.apache.org/jira/browse/FLINK-2512
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2512) Add client.close() before throw RuntimeException

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697224#comment-14697224
 ] 

ASF GitHub Bot commented on FLINK-2512:
---

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/1009#discussion_r37087438
  
--- Diff: 
flink-contrib/flink-storm-compatibility/flink-storm-compatibility-core/src/main/java/org/apache/flink/stormcompatibility/api/FlinkSubmitter.java
 ---
@@ -99,23 +99,24 @@ public static void submitTopology(final String name, 
final Map stormConf, final
 
final String serConf = JSONValue.toJSONString(stormConf);
 
-   final FlinkClient client = 
FlinkClient.getConfiguredClient(stormConf);
-   if (client.getTopologyJobId(name) != null) {
-   throw new RuntimeException(Topology with name ` + 
name + ` already exists on cluster);
-   }
-   String localJar = System.getProperty(storm.jar);
-   if (localJar == null) {
-   try {
-   for (final File file : ((ContextEnvironment) 
ExecutionEnvironment.getExecutionEnvironment())
-   .getJars()) {
-   // TODO verify that there is onnly one 
jar
-   localJar = file.getAbsolutePath();
+   try {
+   final FlinkClient client = 
FlinkClient.getConfiguredClient(stormConf);
--- End diff --

Shouldn't declaration of ``client`` be outside of the try clause?


 Add client.close() before throw RuntimeException
 

 Key: FLINK-2512
 URL: https://issues.apache.org/jira/browse/FLINK-2512
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2512]Add client.close() before throw Ru...

2015-08-14 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/1009#discussion_r37087438
  
--- Diff: 
flink-contrib/flink-storm-compatibility/flink-storm-compatibility-core/src/main/java/org/apache/flink/stormcompatibility/api/FlinkSubmitter.java
 ---
@@ -99,23 +99,24 @@ public static void submitTopology(final String name, 
final Map stormConf, final
 
final String serConf = JSONValue.toJSONString(stormConf);
 
-   final FlinkClient client = 
FlinkClient.getConfiguredClient(stormConf);
-   if (client.getTopologyJobId(name) != null) {
-   throw new RuntimeException(Topology with name ` + 
name + ` already exists on cluster);
-   }
-   String localJar = System.getProperty(storm.jar);
-   if (localJar == null) {
-   try {
-   for (final File file : ((ContextEnvironment) 
ExecutionEnvironment.getExecutionEnvironment())
-   .getJars()) {
-   // TODO verify that there is onnly one 
jar
-   localJar = file.getAbsolutePath();
+   try {
+   final FlinkClient client = 
FlinkClient.getConfiguredClient(stormConf);
--- End diff --

Shouldn't declaration of ``client`` be outside of the try clause?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2512) Add client.close() before throw RuntimeException

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697219#comment-14697219
 ] 

ASF GitHub Bot commented on FLINK-2512:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/1009#issuecomment-131148157
  
All tests are failing though, not sure if bc this patch


 Add client.close() before throw RuntimeException
 

 Key: FLINK-2512
 URL: https://issues.apache.org/jira/browse/FLINK-2512
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697222#comment-14697222
 ] 

ASF GitHub Bot commented on FLINK-1962:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131148744
  
Wow, looks like a good piece of work. Nicely with tests and everything.
Build also passes, style looks good.

+1 to merge this from my side. I'd like to wait for a day or two to get a 
comment from one of the Gelly people (Vasia or Andra).

One thing, though: In the Batch and streaming APIs, we added a 
completeness check to make sure that methods added to the Java APIs are also 
present in the Scala APIs. Would that be a good thing to add here as well?


 Add Gelly Scala API
 ---

 Key: FLINK-1962
 URL: https://issues.apache.org/jira/browse/FLINK-1962
 Project: Flink
  Issue Type: Task
  Components: Gelly, Scala API
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: PJ Van Aeken





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2

2015-08-14 Thread PieterJanVanAeken
Github user PieterJanVanAeken commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131156825
  
I would postpone adding the completeness check as it will currently fail. 
Since I started working on this, the Java Gelly API has changed and while I 
modified my work to be compatible with the changes, not all new Java Gelly 
methods have a Scala counterpart yet. It was discussed briefly in the initial 
PR #808 and it was decided that adding these new methods should go into a 
separate JIRA issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697258#comment-14697258
 ] 

ASF GitHub Bot commented on FLINK-1962:
---

Github user PieterJanVanAeken commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131156825
  
I would postpone adding the completeness check as it will currently fail. 
Since I started working on this, the Java Gelly API has changed and while I 
modified my work to be compatible with the changes, not all new Java Gelly 
methods have a Scala counterpart yet. It was discussed briefly in the initial 
PR #808 and it was decided that adding these new methods should go into a 
separate JIRA issue.


 Add Gelly Scala API
 ---

 Key: FLINK-1962
 URL: https://issues.apache.org/jira/browse/FLINK-1962
 Project: Flink
  Issue Type: Task
  Components: Gelly, Scala API
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: PJ Van Aeken





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Framesize fix

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/934#issuecomment-131208713
  
I will have a look in a bit.

Am currently stuck with some streaming runtime fixes and the Kafka 
integration. Overflowing a bit with issues right now.

Thanks for bearing with us!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-2523) Make task canceling interrupt interval configurable

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2523:

Summary: Make task canceling interrupt interval configurable  (was: 
Increase interrupt timeout in Task Canceling)

 Make task canceling interrupt interval configurable
 ---

 Key: FLINK-2523
 URL: https://issues.apache.org/jira/browse/FLINK-2523
 Project: Flink
  Issue Type: Improvement
  Components: TaskManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


 When a task is canceled, the cancellation calls periodically interrupt() on 
 the task thread, if the task thread does not cancel with a certain time.
 Currently, this value is hard coded to 10 seconds. We should make that time 
 configurable.
 Until then, I would like to increase the value to 30 seconds, as many tasks 
 (here I am observing it for Kafka consumers) can take longer then 10 seconds 
 for proper cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Framesize fix

2015-08-14 Thread kl0u
Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/934#issuecomment-131208217
  
Just rebased with the new version of the master.
Please have a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Framesize fix

2015-08-14 Thread kl0u
Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/934#issuecomment-131209661
  
No problem! 
This message was just a reminder.

Thanks a lot!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-2431) [py] refactor PlanBinder/OperationInfo

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2431.
-
   Resolution: Fixed
Fix Version/s: 0.10

Fixed via f350e45d97015c5d36a1d0f02025e6c6eeca44fe

 [py] refactor PlanBinder/OperationInfo
 --

 Key: FLINK-2431
 URL: https://issues.apache.org/jira/browse/FLINK-2431
 Project: Flink
  Issue Type: Improvement
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
 Fix For: 0.10


 These two classes deserve a restructuring to become more readable and 
 consistent with PythonPlanBinder/PythonOperationInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2523) Make task canceling interrupt interval configurable

2015-08-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697527#comment-14697527
 ] 

Stephan Ewen commented on FLINK-2523:
-

Initially increased to 30 seconds in 852d19c6d24adb951a5f85e82324e300ae6f7dea

 Make task canceling interrupt interval configurable
 ---

 Key: FLINK-2523
 URL: https://issues.apache.org/jira/browse/FLINK-2523
 Project: Flink
  Issue Type: Improvement
  Components: TaskManager
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


 When a task is canceled, the cancellation calls periodically interrupt() on 
 the task thread, if the task thread does not cancel with a certain time.
 Currently, this value is hard coded to 10 seconds. We should make that time 
 configurable.
 Until then, I would like to increase the value to 30 seconds, as many tasks 
 (here I am observing it for Kafka consumers) can take longer then 10 seconds 
 for proper cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2457) Integrate Tuple0

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697522#comment-14697522
 ] 

ASF GitHub Bot commented on FLINK-2457:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/983


 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, ie, cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2431) [py] refactor PlanBinder/OperationInfo

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697521#comment-14697521
 ] 

ASF GitHub Bot commented on FLINK-2431:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/961


 [py] refactor PlanBinder/OperationInfo
 --

 Key: FLINK-2431
 URL: https://issues.apache.org/jira/browse/FLINK-2431
 Project: Flink
  Issue Type: Improvement
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
 Fix For: 0.10


 These two classes deserve a restructuring to become more readable and 
 consistent with PythonPlanBinder/PythonOperationInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2457) Integrate Tuple0

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2457.
---

 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor
 Fix For: 0.10


 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, ie, cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2457) Integrate Tuple0

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2457.
-
   Resolution: Fixed
Fix Version/s: 0.10

Implemented in fab9ce5d87976d22c2fec0cfb732fb6526d6ee15

Thank you for the contribution

 Integrate Tuple0
 

 Key: FLINK-2457
 URL: https://issues.apache.org/jira/browse/FLINK-2457
 Project: Flink
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor
 Fix For: 0.10


 Tuple0 is not cleanly integrated:
   - missing serialization/deserialization support in runtime
  - Tuple.getTupleClass(int arity) cannot handle arity zero, ie, cannot create 
 an instance of Tuple0
 Tuple0 is currently only used in Python API, but will be integrated into 
 Storm compatibility, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2431] Refactor PlanBinder/OperationInfo

2015-08-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/961


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2457] Integrate Tuple0

2015-08-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/983


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-2431) [py] refactor PlanBinder/OperationInfo

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2431.
---

 [py] refactor PlanBinder/OperationInfo
 --

 Key: FLINK-2431
 URL: https://issues.apache.org/jira/browse/FLINK-2431
 Project: Flink
  Issue Type: Improvement
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
 Fix For: 0.10


 These two classes deserve a restructuring to become more readable and 
 consistent with PythonPlanBinder/PythonOperationInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2431] Refactor PlanBinder/OperationInfo

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/961#issuecomment-131167834
  
Will merge this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2431) [py] refactor PlanBinder/OperationInfo

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697321#comment-14697321
 ] 

ASF GitHub Bot commented on FLINK-2431:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/961#issuecomment-131167834
  
Will merge this...


 [py] refactor PlanBinder/OperationInfo
 --

 Key: FLINK-2431
 URL: https://issues.apache.org/jira/browse/FLINK-2431
 Project: Flink
  Issue Type: Improvement
  Components: Python API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor

 These two classes deserve a restructuring to become more readable and 
 consistent with PythonPlanBinder/PythonOperationInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API v2

2015-08-14 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131169939
  
+1 from me too. Thanks for your great work @PieterJanVanAeken!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



[GitHub] flink pull request: [FLINK-2486]Remove unwanted null check in remo...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/989#issuecomment-131165838
  
Sorry for catching this so late.

I actually put such sanity checks in a lot of places in the code I write, 
and they have served well.
In the case where everything is all right, they are not necessary.
As soon as someone changes the code that calls this function (may happen in 
the future), this can help catching bugs (fail fast principle).

I would like to keep such checks, they really do not hurt, and possibly 
help in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697327#comment-14697327
 ] 

ASF GitHub Bot commented on FLINK-1962:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1004#issuecomment-131169939
  
+1 from me too. Thanks for your great work @PieterJanVanAeken!


 Add Gelly Scala API
 ---

 Key: FLINK-1962
 URL: https://issues.apache.org/jira/browse/FLINK-1962
 Project: Flink
  Issue Type: Task
  Components: Gelly, Scala API
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: PJ Van Aeken





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2521) Add automatic test name logging for tests

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697838#comment-14697838
 ] 

ASF GitHub Bot commented on FLINK-2521:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1015#issuecomment-131248356
  
Looks very nice!

I would actually make the log statements about start and stop more 
prominent (for example frame them with an ascii ruler) to make them visually 
easier locateable in the log file.


 Add automatic test name logging for tests
 -

 Key: FLINK-2521
 URL: https://issues.apache.org/jira/browse/FLINK-2521
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor

 When running tests on travis the Flink components log to a file. This is 
 helpful in case of a failed test to retrieve the error. However, the log does 
 not contain the test name and the reason for the failure. Therefore it is 
 difficult to find the log output which corresponds to the failed test.
 It would be nice to automatically add the test case information to the log. 
 This would ease the debugging process big time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2521] [tests] Adds automatic test name ...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1015#issuecomment-131248356
  
Looks very nice!

I would actually make the log statements about start and stop more 
prominent (for example frame them with an ascii ruler) to make them visually 
easier locateable in the log file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-2495) Add a null point check in API DataStream.union

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2495.
---

 Add a null point check in API DataStream.union
 --

 Key: FLINK-2495
 URL: https://issues.apache.org/jira/browse/FLINK-2495
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
   Original Estimate: 168h
  Remaining Estimate: 168h

 The API(public DataStreamOUT union(DataStreamOUT... streams)) is a  
 external interface for user.
 The parameter streams maybe null and it will throw NullPointerException 
 error.
 This test below can be intuitive to explain this problem:
 package org.apache.flink.streaming.api;
 import org.apache.flink.api.common.functions.MapFunction;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
 import 
 org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
 import org.junit.Test;
 /**
  * Created by HuangWHWHW on 2015/8/7.
  */
 public class test {
   public static class sourceFunction extends 
 RichParallelSourceFunctionString {
   public sourceFunction() {
   }
   @Override
   public void run(SourceContextString sourceContext) throws 
 Exception {
   sourceContext.collect(a);
   }
   @Override
   public void cancel() {
   }
   }
   @Test
   public void testUnion(){
   StreamExecutionEnvironment env = 
 StreamExecutionEnvironment.getExecutionEnvironment();
   env.setParallelism(1);
   DataStreamString source = env.addSource(new sourceFunction());
   DataStreamString temp1 = null;
   DataStreamString temp2 = source.map(new MapFunctionString, 
 String() {
   @Override
   public String map(String value) throws Exception {
   if (value == a) {
   return This is for test temp2.;
   }
   return null;
   }
   });
   DataStreamString sink = temp2.union(temp1);
   sink.print();
   try {
   env.execute();
   }catch (Exception e){
   e.printStackTrace();
   }
   }
 }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2495) Add a null point check in API DataStream.union

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2495.
-
   Resolution: Won't Fix
Fix Version/s: (was: 0.10)

Decided to not fix (see issue discussion)

 Add a null point check in API DataStream.union
 --

 Key: FLINK-2495
 URL: https://issues.apache.org/jira/browse/FLINK-2495
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
   Original Estimate: 168h
  Remaining Estimate: 168h

 The API(public DataStreamOUT union(DataStreamOUT... streams)) is a  
 external interface for user.
 The parameter streams maybe null and it will throw NullPointerException 
 error.
 This test below can be intuitive to explain this problem:
 package org.apache.flink.streaming.api;
 import org.apache.flink.api.common.functions.MapFunction;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
 import 
 org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
 import org.junit.Test;
 /**
  * Created by HuangWHWHW on 2015/8/7.
  */
 public class test {
   public static class sourceFunction extends 
 RichParallelSourceFunctionString {
   public sourceFunction() {
   }
   @Override
   public void run(SourceContextString sourceContext) throws 
 Exception {
   sourceContext.collect(a);
   }
   @Override
   public void cancel() {
   }
   }
   @Test
   public void testUnion(){
   StreamExecutionEnvironment env = 
 StreamExecutionEnvironment.getExecutionEnvironment();
   env.setParallelism(1);
   DataStreamString source = env.addSource(new sourceFunction());
   DataStreamString temp1 = null;
   DataStreamString temp2 = source.map(new MapFunctionString, 
 String() {
   @Override
   public String map(String value) throws Exception {
   if (value == a) {
   return This is for test temp2.;
   }
   return null;
   }
   });
   DataStreamString sink = temp2.union(temp1);
   sink.print();
   try {
   env.execute();
   }catch (Exception e){
   e.printStackTrace();
   }
   }
 }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2462) Wrong exception reporting in streaming jobs

2015-08-14 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen reassigned FLINK-2462:
---

Assignee: Stephan Ewen

 Wrong exception reporting in streaming jobs
 ---

 Key: FLINK-2462
 URL: https://issues.apache.org/jira/browse/FLINK-2462
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.10


 When streaming tasks are fail and are canceled, they report a plethora of 
 followup exceptions.
 The batch operators have a clear model that makes sure that root causes are 
 reported, and followup exceptions are not reported. That makes debugging much 
 easier.
 A big part of that is to have a single consistent place that logs exceptions, 
 and that has a view of whether the operation is still running, or whether it 
 has been canceled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2482) Document sreaming processing guarantees

2015-08-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697802#comment-14697802
 ] 

Stephan Ewen commented on FLINK-2482:
-

A lot is done as of 21a0c94baafd77297c8eb88367fc8caaac43d8ee 
(https://github.com/apache/flink/commit/21a0c94baafd77297c8eb88367fc8caaac43d8ee)

Is that sufficient?

 Document sreaming processing guarantees
 ---

 Key: FLINK-2482
 URL: https://issues.apache.org/jira/browse/FLINK-2482
 Project: Flink
  Issue Type: Bug
  Components: Streaming, Tests
Affects Versions: 0.10
Reporter: Márton Balassi
 Fix For: 0.10


 Note the latency benefit of at least once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-131246273
  
Big piece of work.

I'd like to have a look at this, but it may take a few days...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697831#comment-14697831
 ] 

ASF GitHub Bot commented on FLINK-2291:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-131246273
  
Big piece of work.

I'd like to have a look at this, but it may take a few days...


 Use ZooKeeper to elect JobManager leader and send information to TaskManagers
 -

 Key: FLINK-2291
 URL: https://issues.apache.org/jira/browse/FLINK-2291
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 0.10


 Use ZooKeeper to determine the leader of a set of {{JobManagers}} which will 
 act as the responsible {{JobManager}} for all {{TaskManager}}. The 
 {{TaskManager}} will get the address of the leader from ZooKeeper.
 Related Wiki: 
 [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697839#comment-14697839
 ] 

ASF GitHub Bot commented on FLINK-2291:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-131249089
  
From a first glance this looks super nice! Very excited to get this in and 
try it out seriously on a cluster!

Good day, today :-)


 Use ZooKeeper to elect JobManager leader and send information to TaskManagers
 -

 Key: FLINK-2291
 URL: https://issues.apache.org/jira/browse/FLINK-2291
 Project: Flink
  Issue Type: Sub-task
  Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 0.10


 Use ZooKeeper to determine the leader of a set of {{JobManagers}} which will 
 act as the responsible {{JobManager}} for all {{TaskManager}}. The 
 {{TaskManager}} will get the address of the leader from ZooKeeper.
 Related Wiki: 
 [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...

2015-08-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-131249089
  
From a first glance this looks super nice! Very excited to get this in and 
try it out seriously on a cluster!

Good day, today :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2462] [streaming] Major cleanup of stre...

2015-08-14 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/1017

[FLINK-2462] [streaming] Major cleanup of streaming task structure

This pull request addresses exception handling, code duplication, and 
missed resource cleanups in the streaming operators.

I mixed multiple issues in this pull request, which would have been better 
separated, but many were recognized in the rework, and it was tricky to pull 
the fixes apart.

**NOTE** I have not managed to adjust all tests, yet, but I wanted to open 
this early for feedback.

## Exception handling

The exceptions are no longer logged by the operators themselves. Operators 
perform only cleanup in reaction to exceptions.

Exceptions are reported only the the root Task object, which knows whether 
this is the first failure-causing exception (root cause), or is a subsequent 
exception, or whether the task was actually canceled already. In the later 
case, exceptions are ignored, because many cancellations lead to meaningless 
exceptions.

Added more exception in signatures, less exception wrapping where not needed

## Unified setup / teardown structure in streaming tasks

Core resource acquisition/release logic is in `StreamTask`, reducing code 
duplication.
Subtasks (e.g., `OneInputStreamTask`, `IterationTailStreamTask`) implement 
slim methods for certain parts of the life cycle. The `OneInputStreamTask` 
becomes as simple as this

```java
public void init() throws Exception {
TypeSerializerIN inSerializer = 
configuration.getTypeSerializerIn1(getUserCodeClassLoader());
InputGate[] inputGates = getEnvironment().getAllInputGates();
inputProcessor = new StreamInputProcessorIN(inputGates, inSerializer,
getCheckpointBarrierListener(), 
configuration.getCheckpointMode(),
getEnvironment().getIOManager(),
getExecutionConfig().areTimestampsEnabled());

// make sure that stream tasks report their I/O statistics
AccumulatorRegistry registry = 
getEnvironment().getAccumulatorRegistry();
AccumulatorRegistry.Reporter reporter = registry.getReadWriteReporter();
inputProcessor.setReporter(reporter);
}

protected void run() throws Exception {
while (running  inputProcessor.processInput(streamOperator));
}

protected void cleanup() throws Exception {
inputProcessor.cleanup();
}

protected void cancelTask() {
running = false;
}
```
Guaranteed cleanup of output buffer and input buffer resources (formerly 
missed when other exceptions where encountered).

Unified `StreamRecordWriter` and `RecordWriter` usage.

## Cleanup in the StreamSource

Fix mixup in instantiation of source contexts in the stream source task

Auto watermark generators correctly shut down their interval scheduler

## General

Improve use of generics, got rid of many raw types

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink stream_cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1017


commit 68efed0a3b4184980de956bd57ba301569adac86
Author: Stephan Ewen se...@apache.org
Date:   2015-08-14T21:32:35Z

[FLINK-2462] [streaming] Major cleanup of operator structure for exception 
handling and code simplication

  - The exceptions are no longer logged by the operators themselves.
Operators perform only cleanup in reaction to exceptions.
Exceptions are reported only the the root Task object, which knows 
whether this is the first
failure-causing exception (root cause), or is a subsequent exception, 
or whether the task was
actually canceled already. In the later case, exceptions are ignored, 
because many
cancellations lead to meaningless exceptions.

  - more exception in signatures, less wrapping where not needed

  - Core resource acquisition/release logic is in one streaming task, 
reducing code duplication

  - Guaranteed cleanup of output buffer and input buffer resources 
(formerly missed when other exceptions where encountered)

  - Fix mixup in instantiation of source contexts in the stream source task

  - Auto watermark generators correctly shut down their interval scheduler

  - Improve use of generics, got rid of many raw types




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes 

[jira] [Commented] (FLINK-2462) Wrong exception reporting in streaming jobs

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697827#comment-14697827
 ] 

ASF GitHub Bot commented on FLINK-2462:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/1017

[FLINK-2462] [streaming] Major cleanup of streaming task structure

This pull request addresses exception handling, code duplication, and 
missed resource cleanups in the streaming operators.

I mixed multiple issues in this pull request, which would have been better 
separated, but many were recognized in the rework, and it was tricky to pull 
the fixes apart.

**NOTE** I have not managed to adjust all tests, yet, but I wanted to open 
this early for feedback.

## Exception handling

The exceptions are no longer logged by the operators themselves. Operators 
perform only cleanup in reaction to exceptions.

Exceptions are reported only the the root Task object, which knows whether 
this is the first failure-causing exception (root cause), or is a subsequent 
exception, or whether the task was actually canceled already. In the later 
case, exceptions are ignored, because many cancellations lead to meaningless 
exceptions.

Added more exception in signatures, less exception wrapping where not needed

## Unified setup / teardown structure in streaming tasks

Core resource acquisition/release logic is in `StreamTask`, reducing code 
duplication.
Subtasks (e.g., `OneInputStreamTask`, `IterationTailStreamTask`) implement 
slim methods for certain parts of the life cycle. The `OneInputStreamTask` 
becomes as simple as this

```java
public void init() throws Exception {
TypeSerializerIN inSerializer = 
configuration.getTypeSerializerIn1(getUserCodeClassLoader());
InputGate[] inputGates = getEnvironment().getAllInputGates();
inputProcessor = new StreamInputProcessorIN(inputGates, inSerializer,
getCheckpointBarrierListener(), 
configuration.getCheckpointMode(),
getEnvironment().getIOManager(),
getExecutionConfig().areTimestampsEnabled());

// make sure that stream tasks report their I/O statistics
AccumulatorRegistry registry = 
getEnvironment().getAccumulatorRegistry();
AccumulatorRegistry.Reporter reporter = registry.getReadWriteReporter();
inputProcessor.setReporter(reporter);
}

protected void run() throws Exception {
while (running  inputProcessor.processInput(streamOperator));
}

protected void cleanup() throws Exception {
inputProcessor.cleanup();
}

protected void cancelTask() {
running = false;
}
```
Guaranteed cleanup of output buffer and input buffer resources (formerly 
missed when other exceptions where encountered).

Unified `StreamRecordWriter` and `RecordWriter` usage.

## Cleanup in the StreamSource

Fix mixup in instantiation of source contexts in the stream source task

Auto watermark generators correctly shut down their interval scheduler

## General

Improve use of generics, got rid of many raw types

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink stream_cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1017


commit 68efed0a3b4184980de956bd57ba301569adac86
Author: Stephan Ewen se...@apache.org
Date:   2015-08-14T21:32:35Z

[FLINK-2462] [streaming] Major cleanup of operator structure for exception 
handling and code simplication

  - The exceptions are no longer logged by the operators themselves.
Operators perform only cleanup in reaction to exceptions.
Exceptions are reported only the the root Task object, which knows 
whether this is the first
failure-causing exception (root cause), or is a subsequent exception, 
or whether the task was
actually canceled already. In the later case, exceptions are ignored, 
because many
cancellations lead to meaningless exceptions.

  - more exception in signatures, less wrapping where not needed

  - Core resource acquisition/release logic is in one streaming task, 
reducing code duplication

  - Guaranteed cleanup of output buffer and input buffer resources 
(formerly missed when other exceptions where encountered)

  - Fix mixup in instantiation of source contexts in the stream source task

  - Auto watermark generators correctly shut down their interval 

[GitHub] flink pull request: [FLINK-2486]Remove unwanted null check in remo...

2015-08-14 Thread ffbin
Github user ffbin closed the pull request at:

https://github.com/apache/flink/pull/989


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2486]Remove unwanted null check in remo...

2015-08-14 Thread ffbin
Github user ffbin commented on the pull request:

https://github.com/apache/flink/pull/989#issuecomment-131276961
  
@StephanEwen Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2486) Remove unwanted null check in removeInstance function

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698042#comment-14698042
 ] 

ASF GitHub Bot commented on FLINK-2486:
---

Github user ffbin closed the pull request at:

https://github.com/apache/flink/pull/989


 Remove unwanted null check in removeInstance function
 -

 Key: FLINK-2486
 URL: https://issues.apache.org/jira/browse/FLINK-2486
 Project: Flink
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 0.8.1
Reporter: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2516) Remove unwanted log.isInfoEnabled check

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698035#comment-14698035
 ] 

ASF GitHub Bot commented on FLINK-2516:
---

Github user ffbin commented on the pull request:

https://github.com/apache/flink/pull/1012#issuecomment-131276733
  
@StephanEwen @chiwanpark  Thank you very much.


 Remove unwanted log.isInfoEnabled check
 ---

 Key: FLINK-2516
 URL: https://issues.apache.org/jira/browse/FLINK-2516
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2516) Remove unwanted log.isInfoEnabled check

2015-08-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698036#comment-14698036
 ] 

ASF GitHub Bot commented on FLINK-2516:
---

Github user ffbin closed the pull request at:

https://github.com/apache/flink/pull/1012


 Remove unwanted log.isInfoEnabled check
 ---

 Key: FLINK-2516
 URL: https://issues.apache.org/jira/browse/FLINK-2516
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.8.1
Reporter: fangfengbin
Assignee: fangfengbin
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2516]Remove unwanted log.isInfoEnabled ...

2015-08-14 Thread ffbin
Github user ffbin closed the pull request at:

https://github.com/apache/flink/pull/1012


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


  1   2   >