[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741325#comment-14741325 ]

ASF GitHub Bot commented on FLINK-2373:
---

Github user akunft commented on the pull request:

https://github.com/apache/flink/pull/1066#issuecomment-139622654
  
* rebased
* replaced collection output with ```collect()``` in RemoteEnvironmentITCase and ExecutionEnvironmentITCase
* added ```createRemoteEnvironment``` method with config parameter in Scala ExecutionEnvironment


> Add configuration parameter to createRemoteEnvironment method
> -
>
> Key: FLINK-2373
> URL: https://issues.apache.org/jira/browse/FLINK-2373
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Reporter: Andreas Kunft
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently there is no way to provide a custom configuration upon creation of 
> a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)).
> This leads to errors when the submitted job exceeds the default value for the 
> max. payload size in Akka, as we can not increase the configuration value 
> (akka.remote.OversizedPayloadException: Discarding oversized payload...)
> Providing an overloaded method with a configuration parameter for the remote 
> environment fixes that.
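
For illustration, a minimal sketch of the requested overload in use (the hostname, port, and framesize value are placeholders; {{akka.framesize}} is Flink's setting for the maximum Akka payload size):

{code}
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Placeholder host/port; raise Akka's maximum payload size so that a large
// job graph no longer triggers the OversizedPayloadException on submission.
val config = new Configuration()
config.setString("akka.framesize", "104857600b")

val env = ExecutionEnvironment.createRemoteEnvironment("jobmanager-host", 6123, config)
{code}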



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2373] Add configuration parameter to cr...

2015-09-11 Thread akunft
Github user akunft commented on the pull request:

https://github.com/apache/flink/pull/1066#issuecomment-139622654
  
* rebased
* replaced collection output with ```collect()``` in RemoteEnvironmentITCase and ExecutionEnvironmentITCase
* added ```createRemoteEnvironment``` method with config parameter in Scala ExecutionEnvironment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2566) FlinkTopologyContext not populated completely

2015-09-11 Thread Matthias J. Sax (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741223#comment-14741223 ]

Matthias J. Sax commented on FLINK-2566:


The PR for this feature is almost ready. It depends on FLINK-2525, so I need 
to wait until that is merged before I can rebase and finish the code.

> FlinkTopologyContext not populated completely
> 
>
> Key: FLINK-2566
> URL: https://issues.apache.org/jira/browse/FLINK-2566
> Project: Flink
>  Issue Type: Improvement
>  Components: Storm Compatibility
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Minor
>
> Currently FlinkTopologyContext is not populated completely. It only contains 
> enough information to make the WordCount example work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [hotfix][streaming]Improve throwing exception ...

2015-09-11 Thread HuangWHWHW
GitHub user HuangWHWHW opened a pull request:

https://github.com/apache/flink/pull/1119

[hotfix][streaming]Improve throwing exception for WriteSinkFunction.java

When the `cleanFile()` method in WriteSinkFunction fails, it only prints 
"File not found", whatever the actual error was.
But a FileNotFoundException does not only mean "File not found"; it can also 
signal other problems (e.g. "Permission denied").
So I changed the code to print the exception itself instead of just the file path.

For example, when a "Permission denied" happened, here is the exception 
information following before change:
`java.lang.RuntimeException: File not found /result.txt
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
at java.io.PrintWriter.<init>(PrintWriter.java:184)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunction.cleanFile(WriteSinkFunction.java:55)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunction.<init>(WriteSinkFunction.java:43)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunctionTest$2.<init>(WriteSinkFunctionTest.java:22)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunctionTest.testFileNotFound(WriteSinkFunctionTest.java:22)`

And here is the info after change:
`java.lang.RuntimeException: An error occured while cleaning the file:java.io.FileNotFoundException: /result.txt (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
at java.io.PrintWriter.<init>(PrintWriter.java:184)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunction.cleanFile(WriteSinkFunction.java:55)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunction.<init>(WriteSinkFunction.java:43)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunctionTest$2.<init>(WriteSinkFunctionTest.java:22)
at org.apache.flink.streaming.api.functions.sink.WriteSinkFunctionTest.testFileNotFound(WriteSinkFunctionTest.java:22)`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HuangWHWHW/flink FLINK-2480-9-11-new

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1119


commit 22a1b10ad6bf04a3c25056a1aec8930373927704
Author: HuangWHWHW <404823...@qq.com>
Date:   2015-09-11T08:28:48Z

[hotfix][streaming]Improve throwing exception for WriteSinkFunction.java




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2373] Add configuration parameter to cr...

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1066#discussion_r39253434
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/RemoteEnvironmentITCase.java
 ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.test.javaApiOperators;
+
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.common.io.GenericInputFormat;
+import org.apache.flink.api.common.operators.util.TestNonRichInputFormat;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.LocalCollectionOutputFormat;
+import org.apache.flink.client.program.ProgramInvocationException;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.io.GenericInputSplit;
+import org.apache.flink.test.util.ForkableFlinkMiniCluster;
+import org.apache.flink.util.Collector;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+@SuppressWarnings("serial")
+public class RemoteEnvironmentITCase {
+
+   private static final int TM_SLOTS = 4;
+
+   private static final int NUM_TM = 1;
+
+   private static final int USER_DOP = 2;
+
+   private static final String INVALID_STARTUP_TIMEOUT = "0.001 ms";
+
+   private static final String VALID_STARTUP_TIMEOUT = "100 s";
+
+   private static ForkableFlinkMiniCluster cluster;
+
+   @BeforeClass
+   public static void setupCluster() {
+   try {
+   Configuration config = new Configuration();
+   config.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, NUM_TM);
+   config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, TM_SLOTS);
+   cluster = new ForkableFlinkMiniCluster(config, false);
+   cluster.start();
+   }
+   catch (Exception e) {
+   e.printStackTrace();
+   fail("Error starting test cluster: " + e.getMessage());
+   }
+   }
+
+   @AfterClass
+   public static void tearDownCluster() {
+   try {
+   cluster.stop();
+   }
+   catch (Throwable t) {
+   t.printStackTrace();
+   fail("Cluster shutdown caused an exception: " + 
t.getMessage());
+   }
+   }
+
+   /**
+* Ensure that Akka configuration parameters can be set.
+*/
+   @Test(expected=IllegalArgumentException.class)
+   public void testInvalidAkkaConfiguration() throws Throwable {
+   Configuration config = new Configuration();
+   config.setString(ConfigConstants.AKKA_STARTUP_TIMEOUT, INVALID_STARTUP_TIMEOUT);
+
+   final ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
+   cluster.hostname(),
+   cluster.getLeaderRPCPort(),
+   config
+   );
+   env.getConfig().disableSysoutLogging();
+
+   DataSet<String> result = env.createInput(new TestNonRichInputFormat());
+   result.output(new LocalCollectionOutputFormat<String>(new ArrayList<String>()));
+   try {
+   env.execute();
+   Assert.fail("Program should not run successfully, cause of invalid akka settings.");
+   } catch (ProgramInvocationException ex) {
+   throw ex.getCause();
+   }
+   }
+
+   /**

[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740450#comment-14740450 ]

ASF GitHub Bot commented on FLINK-2373:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1066#discussion_r39253434
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/RemoteEnvironmentITCase.java
 ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.test.javaApiOperators;
+
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.common.io.GenericInputFormat;
+import org.apache.flink.api.common.operators.util.TestNonRichInputFormat;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.LocalCollectionOutputFormat;
+import org.apache.flink.client.program.ProgramInvocationException;
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.io.GenericInputSplit;
+import org.apache.flink.test.util.ForkableFlinkMiniCluster;
+import org.apache.flink.util.Collector;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+@SuppressWarnings("serial")
+public class RemoteEnvironmentITCase {
+
+   private static final int TM_SLOTS = 4;
+
+   private static final int NUM_TM = 1;
+
+   private static final int USER_DOP = 2;
+
+   private static final String INVALID_STARTUP_TIMEOUT = "0.001 ms";
+
+   private static final String VALID_STARTUP_TIMEOUT = "100 s";
+
+   private static ForkableFlinkMiniCluster cluster;
+
+   @BeforeClass
+   public static void setupCluster() {
+   try {
+   Configuration config = new Configuration();
+   config.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, NUM_TM);
+   config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, TM_SLOTS);
+   cluster = new ForkableFlinkMiniCluster(config, false);
+   cluster.start();
+   }
+   catch (Exception e) {
+   e.printStackTrace();
+   fail("Error starting test cluster: " + e.getMessage());
+   }
+   }
+
+   @AfterClass
+   public static void tearDownCluster() {
+   try {
+   cluster.stop();
+   }
+   catch (Throwable t) {
+   t.printStackTrace();
+   fail("Cluster shutdown caused an exception: " + t.getMessage());
+   }
+   }
+
+   /**
+* Ensure that Akka configuration parameters can be set.
+*/
+   @Test(expected=IllegalArgumentException.class)
+   public void testInvalidAkkaConfiguration() throws Throwable {
+   Configuration config = new Configuration();
+   config.setString(ConfigConstants.AKKA_STARTUP_TIMEOUT, INVALID_STARTUP_TIMEOUT);
+
+   final ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
+   cluster.hostname(),
+   cluster.getLeaderRPCPort(),
+   config
+   );
+   env.getConfig().disableSysoutLogging();
+
+   DataSet<String> result = env.createInput(new TestNonRichInputFormat());
+   result.output(new LocalCollectionOutputFormat<String>(new ArrayList<String>()));
+   try {
+   env.execute();
+   

[GitHub] flink pull request: [FLINK-2373] Add configuration parameter to cr...

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1066#issuecomment-139494208
  
LGTM modulo one minor comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740456#comment-14740456 ]

ASF GitHub Bot commented on FLINK-2622:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139494655
  
The semantics of `NO_OVERWRITE` are correct and pretty clear, IMO. 
If we changed `NO_OVERWRITE` to fail even when there is no file, its 
semantics would change to `NO_WRITE` and it would always fail. 
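
For context, a minimal sketch of the Scala API addition under discussion (the path and elements are illustrative):

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

// NO_OVERWRITE only fails if the target file already exists;
// OVERWRITE silently replaces an existing file.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val words = env.fromElements("to", "be", "or", "not")
words.writeAsText("/tmp/words.txt", WriteMode.NO_OVERWRITE)
env.execute("write-mode example")
```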


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-1737: Kronecker product

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39251962
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala
 ---
@@ -102,6 +102,38 @@ case class DenseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result will be given in [[org.apache.flink.ml.math.SparseMatrix]]
+* representation if `other` is sparse and as [[org.apache.flink.ml.math.DenseMatrix]] otherwise.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.Matrix]] which equals the 
outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): Matrix = {
+val numRows = size
+val numCols = other.size
+
+other match {
+  case SparseVector(size, indices, data_) =>
+val entries: Array[(Int, Int, Double)] = for {
--- End diff --

Why do you need an `Array` here? An `Iterable` should be enough for the 
method `SparseMatrix.fromCOO`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740414#comment-14740414 ]

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39251962
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala
 ---
@@ -102,6 +102,38 @@ case class DenseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result will be given in [[org.apache.flink.ml.math.SparseMatrix]]
+* representation if `other` is sparse and as [[org.apache.flink.ml.math.DenseMatrix]] otherwise.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.Matrix]] which equals the 
outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): Matrix = {
+val numRows = size
+val numCols = other.size
+
+other match {
+  case SparseVector(size, indices, data_) =>
+val entries: Array[(Int, Int, Double)] = for {
--- End diff --

Why do you need an `Array` here? An `Iterable` should be enough for the 
method `SparseMatrix.fromCOO`.


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]
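
For reference, the standard whitening transform from [1], assuming sample mean \mu and covariance \Sigma with eigendecomposition \Sigma = U \Lambda U^\top:

{code}
\hat{x} = \Lambda^{-1/2} U^\top (x - \mu)
\qquad \Rightarrow \qquad \operatorname{Cov}(\hat{x}) = I
{code}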



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740444#comment-14740444 ]

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1078#issuecomment-139493422
  
Thank you very much @daniel-pape for your contribution. Looks really good. I 
had only some minor comments.


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740443#comment-14740443 ]

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39253106
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala
 ---
@@ -102,6 +102,38 @@ case class DenseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result will be given in [[org.apache.flink.ml.math.SparseMatrix]]
+* representation if `other` is sparse and as [[org.apache.flink.ml.math.DenseMatrix]] otherwise.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.Matrix]] which equals the 
outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): Matrix = {
+val numRows = size
+val numCols = other.size
+
+other match {
+  case SparseVector(size, indices, data_) =>
+val entries: Array[(Int, Int, Double)] = for {
+  i <- (0 until numRows).toArray
+  j <- indices
+  value = this(i) * other(j)
--- End diff --

It might make sense to work directly on the `data` array here, because 
every `other(j)` call entails a binary search operation. If you zip `data` with 
`indices`, then you should have all information necessary to access `this(i)` 
and to have the value for `other(j)`.
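
A minimal sketch of that suggestion (assuming the FlinkML vector API shown in the diff, and that `SparseMatrix.fromCOO` accepts an `Iterable` of (row, column, value) entries, as noted in the earlier review comment; the helper name is illustrative):

```scala
import org.apache.flink.ml.math.{DenseVector, SparseMatrix, SparseVector}

// Outer product of a dense vector with a sparse vector without binary
// searches: zip the sparse indices with their values, so every access is a
// direct array lookup.
def outerWithSparse(self: DenseVector, other: SparseVector): SparseMatrix = {
  val entries = for {
    i <- (0 until self.size).toArray
    (j, v) <- other.indices.zip(other.data)
  } yield (i, j, self(i) * v)
  SparseMatrix.fromCOO(self.size, other.size, entries)
}
```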


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-1737: Kronecker product

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1078#issuecomment-139493422
  
Thank you very much @daniel-pape for your contribution. Looks really good. I 
had only some minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740454#comment-14740454 ]

ASF GitHub Bot commented on FLINK-2622:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1098#discussion_r39253553
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-scala/src/test/java/org/apache/flink/streaming/scala/api/CsvOutputFormatITCase.java
 ---
@@ -36,6 +37,10 @@ protected void preSubmit() throws Exception {
@Override
protected void testProgram() throws Exception {
OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath);
+   OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath, 1);
--- End diff --

It is expected that the test fails if you set the overwrite mode to 
`NO_OVERWRITE` because you start multiple programs that all write to the same 
location.

I debugged the issue and found that the `ForkableFlinkMiniCluster` sets the 
overwrite mode to OVERWRITE. That explains the behavior. However, this also 
means that you cannot test the correctness of your methods by setting the 
overwrite mode to OVERWRITE because this is the default behavior.

I would change the `CsvOutputFormatITCase` to extend 
`StreamingMultipleProgramsTestBase` and run each program in a dedicated test 
method. 


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740463#comment-14740463 ]

ASF GitHub Bot commented on FLINK-2622:
---

Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/1098#discussion_r39254440
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-scala/src/test/java/org/apache/flink/streaming/scala/api/CsvOutputFormatITCase.java
 ---
@@ -36,6 +37,10 @@ protected void preSubmit() throws Exception {
@Override
protected void testProgram() throws Exception {
OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath);
+   OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath, 1);
--- End diff --

If it is expected to fail, could you tell me how to make the test pass?
I just have no idea about this.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread HuangWHWHW
Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/1098#discussion_r39254440
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-scala/src/test/java/org/apache/flink/streaming/scala/api/CsvOutputFormatITCase.java
 ---
@@ -36,6 +37,10 @@ protected void preSubmit() throws Exception {
@Override
protected void testProgram() throws Exception {
OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath);
+   OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, 
resultPath, 1);
--- End diff --

If it is expected to fail, could you tell me how to make the test pass?
I just have no idea about this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Reopened] (FLINK-2639) Building Flink for specific HDP versions fails

2015-09-11 Thread Robert Metzger (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reopened FLINK-2639:
---

> Building Flink for specific HDP versions fails
> --
>
> Key: FLINK-2639
> URL: https://issues.apache.org/jira/browse/FLINK-2639
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.10, 0.9.2
>
>
> Building Flink for the latest Hadoop version in HDP 2.2 fails
> {{ mvn clean install -DskipTests -Dhadoop.version=2.6.0.2.2.6.0-2800 
> -Pvendor-repos}}
> fails with
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-include-yarn-tests: 
> Could not resolve dependencies for project 
> org.apache.flink:flink-shaded-include-yarn-tests:jar:0.10-SNAPSHOT: The 
> following artifacts could not be resolved: 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx, 
> org.mortbay.jetty:jetty-util:jar:6.1.26.hwx: Could not find artifact 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx in cloudera-releases 
> (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2639) Building Flink for specific HDP versions fails

2015-09-11 Thread Robert Metzger (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger updated FLINK-2639:
--
Fix Version/s: 0.9.2

> Building Flink for specific HDP versions fails
> --
>
> Key: FLINK-2639
> URL: https://issues.apache.org/jira/browse/FLINK-2639
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.10, 0.9.2
>
>
> Building Flink for the latest Hadoop version in HDP 2.2 fails
> {{ mvn clean install -DskipTests -Dhadoop.version=2.6.0.2.2.6.0-2800 
> -Pvendor-repos}}
> fails with
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-include-yarn-tests: 
> Could not resolve dependencies for project 
> org.apache.flink:flink-shaded-include-yarn-tests:jar:0.10-SNAPSHOT: The 
> following artifacts could not be resolved: 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx, 
> org.mortbay.jetty:jetty-util:jar:6.1.26.hwx: Could not find artifact 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx in cloudera-releases 
> (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2639) Building Flink for specific HDP versions fails

2015-09-11 Thread Robert Metzger (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740508#comment-14740508 ]

Robert Metzger commented on FLINK-2639:
---

Pushed fix to "release-0.9" for 0.9.2: 
http://git-wip-us.apache.org/repos/asf/flink/commit/267376f4

> Building Flink for specific HDP versions fails
> --
>
> Key: FLINK-2639
> URL: https://issues.apache.org/jira/browse/FLINK-2639
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.10, 0.9.2
>
>
> Building Flink for the latest Hadoop version in HDP 2.2 fails
> {{ mvn clean install -DskipTests -Dhadoop.version=2.6.0.2.2.6.0-2800 
> -Pvendor-repos}}
> fails with
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-include-yarn-tests: 
> Could not resolve dependencies for project 
> org.apache.flink:flink-shaded-include-yarn-tests:jar:0.10-SNAPSHOT: The 
> following artifacts could not be resolved: 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx, 
> org.mortbay.jetty:jetty-util:jar:6.1.26.hwx: Could not find artifact 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx in cloudera-releases 
> (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2639) Building Flink for specific HDP versions fails

2015-09-11 Thread Robert Metzger (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger closed FLINK-2639.
-
Resolution: Fixed

> Building Flink for specific HDP versions fails
> --
>
> Key: FLINK-2639
> URL: https://issues.apache.org/jira/browse/FLINK-2639
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 0.10, 0.9.2
>
>
> Building Flink for the latest Hadoop version in HDP 2.2 fails
> {{ mvn clean install -DskipTests -Dhadoop.version=2.6.0.2.2.6.0-2800 
> -Pvendor-repos}}
> fails with
> {code}
> [ERROR] Failed to execute goal on project flink-shaded-include-yarn-tests: 
> Could not resolve dependencies for project 
> org.apache.flink:flink-shaded-include-yarn-tests:jar:0.10-SNAPSHOT: The 
> following artifacts could not be resolved: 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx, 
> org.mortbay.jetty:jetty-util:jar:6.1.26.hwx: Could not find artifact 
> org.mortbay.jetty:jetty:jar:6.1.26.hwx in cloudera-releases 
> (https://repository.cloudera.com/artifactory/cloudera-repos) -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2641) Integrate the off-heap memory configuration with the TaskManager start script

2015-09-11 Thread Ufuk Celebi (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740333#comment-14740333 ]

Ufuk Celebi commented on FLINK-2641:


Sounds good to me :) I think {{taskmanager.jvm.memory}} is better than the 
current {{mb}}.

I would like to just use {{taskmanager.memory.fraction}}.

Another thought: Instead of {{taskmanager.jvm.memory}} we could also go for 
{{taskmanager.memory.jvm}} and have all memory settings under the memory node 
(if you think about the config as a tree). 

> Integrate the off-heap memory configuration with the TaskManager start script
> -
>
> Key: FLINK-2641
> URL: https://issues.apache.org/jira/browse/FLINK-2641
> Project: Flink
>  Issue Type: New Feature
>  Components: Start-Stop Scripts
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> The TaskManager start script needs to adjust the {{-Xmx}}, {{-Xms}}, and 
> {{-XX:MaxDirectMemorySize}} parameters according to the off-heap memory 
> settings.
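
Illustrative only (the helper name, fraction parameter, and flag arithmetic below are assumptions, not the actual start-script logic):

{code}
// Hypothetical sketch: split a total memory budget into heap and direct
// memory and derive the corresponding JVM flags.
def jvmMemoryArgs(totalMb: Long, offHeapFraction: Double): Seq[String] = {
  val directMb = (totalMb * offHeapFraction).toLong
  val heapMb = totalMb - directMb
  Seq(s"-Xms${heapMb}m", s"-Xmx${heapMb}m", s"-XX:MaxDirectMemorySize=${directMb}m")
}
{code}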



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2655] Minimize intermediate merging of ...

2015-09-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1118#discussion_r39250889
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/UnilateralSortMerger.java
 ---
@@ -1525,34 +1525,41 @@ protected final void disposeSortBuffers(boolean 
releaseMemory) {
final List<MemorySegment> allReadBuffers, final List<MemorySegment> writeBuffers)
throws IOException
{
-   final double numMerges = Math.ceil(channelIDs.size() / ((double) this.maxFanIn));
-   final int channelsToMergePerStep = (int) Math.ceil(channelIDs.size() / numMerges);
-   
+   // A channel list with length maxFanIn^i can be merged to maxFanIn files in i-1 rounds where every merge
+   // is a full merge with maxFanIn input channels. A partial round includes merges with fewer than maxFanIn
+   // inputs. It is most efficient to perform the partial round first.
+   final double scale = Math.ceil(Math.log(channelIDs.size()) / Math.log(this.maxFanIn)) - 1;
+
+   final int numStart = channelIDs.size();
+   final int numEnd = (int) Math.pow(this.maxFanIn, scale);
+
+   final int numMerges = (int) Math.ceil((numStart - numEnd) / (double) (this.maxFanIn - 1));
--- End diff --

Yes, `maxFanIn` is a configuration parameter and must be > 1.
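
To make the arithmetic in the diff concrete, a worked example with illustrative numbers (100 spilled channels, fan-in of 10):

```scala
import scala.math._

// One partial round of 10 merges reduces 100 files to 10, which the final
// full merge can then consume directly.
val numStart = 100                                      // channelIDs.size()
val maxFanIn = 10
val scale     = ceil(log(numStart) / log(maxFanIn)) - 1 // = 1.0
val numEnd    = pow(maxFanIn, scale).toInt              // = 10
val numMerges = ceil((numStart - numEnd) / (maxFanIn - 1).toDouble).toInt // = 10
```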


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2655) Minimize intermediate merging of spilled buffers

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740392#comment-14740392 ]

ASF GitHub Bot commented on FLINK-2655:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1118#discussion_r39250889
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/UnilateralSortMerger.java
 ---
@@ -1525,34 +1525,41 @@ protected final void disposeSortBuffers(boolean 
releaseMemory) {
final List<MemorySegment> allReadBuffers, final List<MemorySegment> writeBuffers)
throws IOException
{
-   final double numMerges = Math.ceil(channelIDs.size() / ((double) this.maxFanIn));
-   final int channelsToMergePerStep = (int) Math.ceil(channelIDs.size() / numMerges);
-   
+   // A channel list with length maxFanIn^i can be merged to maxFanIn files in i-1 rounds where every merge
+   // is a full merge with maxFanIn input channels. A partial round includes merges with fewer than maxFanIn
+   // inputs. It is most efficient to perform the partial round first.
+   final double scale = Math.ceil(Math.log(channelIDs.size()) / Math.log(this.maxFanIn)) - 1;
+
+   final int numStart = channelIDs.size();
+   final int numEnd = (int) Math.pow(this.maxFanIn, scale);
+
+   final int numMerges = (int) Math.ceil((numStart - numEnd) / (double) (this.maxFanIn - 1));
--- End diff --

Yes, `maxFanIn` is a configuration parameter and must be > 1.


> Minimize intermediate merging of spilled buffers
> 
>
> Key: FLINK-2655
> URL: https://issues.apache.org/jira/browse/FLINK-2655
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: master
>Reporter: Greg Hogan
>
> If the number of spilled buffers exceeds taskmanager.runtime.max-fan then the 
> number of files must reduced with an intermediate merge by reading, merging, 
> and spilling into a single, larger file.
> The current implementation performs an intermediate merge on all files. An 
> optimal implementation minimizes the amount of merged data by performing 
> partial merges first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2641) Integrate the off-heap memory configuration with the TaskManager start script

2015-09-11 Thread Fabian Hueske (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740352#comment-14740352 ]

Fabian Hueske edited comment on FLINK-2641 at 9/11/15 8:31 AM:
---

+1 for having a consistent hierarchy in the config parameters.
But {{taskmanager.memory.jvm}} does not make a lot of sense to me from a 
hierarchy point of view.


was (Author: fhueske):
+1 for having a consistent hierarchy in the config parameters.
But {{taskmanager.memory.jvm}} does not make a lot of sense to me from a 
hierarchy point of view. How about {{taskmanager.memory.size}} or 
{{taskmanager.memory.mb}}?

> Integrate the off-heap memory configuration with the TaskManager start script
> -
>
> Key: FLINK-2641
> URL: https://issues.apache.org/jira/browse/FLINK-2641
> Project: Flink
>  Issue Type: New Feature
>  Components: Start-Stop Scripts
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> The TaskManager start script needs to adjust the {{-Xmx}}, {{-Xms}}, and 
> {{-XX:MaxDirectMemorySize}} parameters according to the off-heap memory 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2656] Fix behavior of FlinkKafkaConsume...

2015-09-11 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1117#discussion_r39251890
  
--- Diff: 
flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/LegacyFetcher.java
 ---
@@ -356,29 +357,24 @@ public void run() {
// make sure that all partitions have some 
offsets to start with
// those partitions that do not have an offset 
from a checkpoint need to get
// their start offset from ZooKeeper
-   
-   List<FetchPartition> partitionsToGetOffsetsFor = new ArrayList<>();
+   {
--- End diff --

I'm opening a new scope for the `partitionsToGetOffsetsFor` variable.
I'm using another list with the same name later in the code. With a new 
scope for the operation here, I can use the same variable name later again.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Getter for wrapped StreamExecutionEnvironment ...

2015-09-11 Thread lofifnc
Github user lofifnc closed the pull request at:

https://github.com/apache/flink/pull/1061


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2617) ConcurrentModificationException when using HCatRecordReader to access a hive table

2015-09-11 Thread Arnaud Linz (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740440#comment-14740440 ]

Arnaud Linz commented on FLINK-2617:


Well, I've searched the Apache snapshot repository for 0.9-SNAPSHOT and I've looked 
at http://stratosphere-bin.s3.amazonaws.com/flink-0.9-SNAPSHOT-bin-hadoop2.tgz. 
I was trying to avoid building it from source until I can find the time to 
configure my environment.

> ConcurrentModificationException when using HCatRecordReader to access a hive 
> table
> --
>
> Key: FLINK-2617
> URL: https://issues.apache.org/jira/browse/FLINK-2617
> Project: Flink
>  Issue Type: Bug
>  Components: Hadoop Compatibility
>Reporter: Arnaud Linz
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 0.9, 0.10
>
>
> I don't know if it's a hcat or a flink problem, but when reading a hive table 
> in a cluster with many slots (20 threads per container), I systematically run 
> into a {{ConcurrentModificationException}} in a copy method of a 
> {{Configuration}} object that changes during the copy.
> From what I understand, this object comes from 
> {{TaskAttemptContext.getConfiguration()}} created by 
> {{HadoopUtils.instantiateTaskAttemptContext(configuration, new 
> TaskAttemptID());}} 
> Maybe the {{job.Configuration}} object passed to the constructor of 
> {{HadoopInputFormatBase}} should be cloned somewhere?
> Stack trace is :
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
> org.apache.flink.client.program.Client.run(Client.java:413)
> org.apache.flink.client.program.Client.run(Client.java:356)
> org.apache.flink.client.program.Client.run(Client.java:349)
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
> com.bouygtel.kuberasdk.main.ApplicationBatch.execCluster(ApplicationBatch.java:73)
> com.bouygtel.kubera.main.segment.ApplicationGeoSegment.batchExec(ApplicationGeoSegment.java:69)
> com.bouygtel.kuberasdk.main.ApplicationBatch.exec(ApplicationBatch.java:50)
> com.bouygtel.kuberasdk.main.Application.mainMethod(Application.java:88)
> com.bouygtel.kubera.main.segment.MainGeoSmooth.main(MainGeoSmooth.java:44)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> org.apache.flink.client.program.Client.run(Client.java:315)
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
> org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
> org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100)
> scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> akka.actor.ActorCell.invoke(ActorCell.scala:487)
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> akka.dispatch.Mailbox.run(Mailbox.scala:221)
> akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 
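
A minimal sketch of the cloning idea from the description (the class and field names are illustrative, not the actual fix):

{code}
import org.apache.hadoop.conf.Configuration

// Give each input format instance its own copy of the Hadoop Configuration,
// so concurrent mutation of the shared instance cannot corrupt it mid-copy;
// Hadoop's Configuration(Configuration) copy constructor does the copying.
class ClonedConfigHolder(shared: Configuration) {
  private val privateConf: Configuration = new Configuration(shared)
}
{code}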

[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740439#comment-14740439 ]

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39253000
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseVector.scala
 ---
@@ -85,6 +85,34 @@ case class SparseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result is given in 
[[org.apache.flink.ml.math.SparseMatrix]]
+* representation.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.SparseMatrix]] which equals 
the outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): SparseMatrix = {
+val numRows = size
+val numCols = other.size
+
+val otherIndices = other match {
+  case sv @ SparseVector(_, _, _) => sv.indices
+  case dv @ DenseVector(_) => (0 until dv.size).toArray
+}
+
+val entries = for {
+  i <- indices
+  j <- otherIndices
+  value = this(i) * other(j)
--- End diff --

It might make sense to directly operate on the `SparseVector`'s data array 
because every `apply` call entails a binary search and thus has a 
complexity of `O(log n)`. The same holds true for the `other` vector if it is a 
`SparseVector`.


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transformer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-1737: Kronecker product

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39253000
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseVector.scala
 ---
@@ -85,6 +85,34 @@ case class SparseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result is given in 
[[org.apache.flink.ml.math.SparseMatrix]]
+* representation.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.SparseMatrix]] which equals 
the outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): SparseMatrix = {
+val numRows = size
+val numCols = other.size
+
+val otherIndices = other match {
+  case sv @ SparseVector(_, _, _) => sv.indices
+  case dv @ DenseVector(_) => (0 until dv.size).toArray
+}
+
+val entries = for {
+  i <- indices
+  j <- otherIndices
+  value = this(i) * other(j)
--- End diff --

It might make sense to directly operate on the `SparseVector`'s data array 
because every `apply` call entails a binary search and thus has a 
complexity of `O(log n)`. The same holds true for the `other` vector if it is a 
`SparseVector`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-1737: Kronecker product

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39253106
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/DenseVector.scala
 ---
@@ -102,6 +102,38 @@ case class DenseVector(
 }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+* with `other`. The result will be given in [[org.apache.flink.ml.math.SparseMatrix]]
+* representation if `other` is sparse and as [[org.apache.flink.ml.math.DenseMatrix]] otherwise.
+*
+* @param other a Vector
+* @return the [[org.apache.flink.ml.math.Matrix]] which equals the 
outer product of `this`
+* with `other.`
+*/
+  override def outer(other: Vector): Matrix = {
+val numRows = size
+val numCols = other.size
+
+other match {
+  case SparseVector(size, indices, data_) =>
+val entries: Array[(Int, Int, Double)] = for {
+  i <- (0 until numRows).toArray
+  j <- indices
+  value = this(i) * other(j)
--- End diff --

It might make sense to work directly on the `data` array here, because 
every `other(j)` call entails a binary search operation. If you zip `data` with 
`indices`, then you should have all information necessary to access `this(i)` 
and to have the value for `other(j)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139494655
  
The semantics of `NO_OVERWRITE` are correct and pretty clear, IMO. 
If we changed `NO_OVERWRITE` to fail even when there is no file, its 
semantics would change to `NO_WRITE` and it would always fail. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1725]- New Partitioner for better load ...

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1069#issuecomment-13947
  
Will do, once I've cleared the rest of my backlog.

On Thu, Sep 10, 2015 at 4:26 PM, Fabian Hueske 
wrote:

> @tillrohrmann , @mbalassi
>  can you have another look at this PR and
> check if it is good to merge? Thanks!
>
> —
> Reply to this email directly or view it on GitHub
> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1725) New Partitioner for better load balancing for skewed data

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740376#comment-14740376 ]

ASF GitHub Bot commented on FLINK-1725:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1069#issuecomment-13947
  
Will do, once I've cleared the rest of my backlog.

On Thu, Sep 10, 2015 at 4:26 PM, Fabian Hueske 
wrote:

> @tillrohrmann , @mbalassi
>  can you have another look at this PR and
> check if it is good to merge? Thanks!
>
> —
> Reply to this email directly or view it on GitHub
> .
>



> New Partitioner for better load balancing for skewed data
> -
>
> Key: FLINK-1725
> URL: https://issues.apache.org/jira/browse/FLINK-1725
> Project: Flink
>  Issue Type: Improvement
>  Components: New Components
>Affects Versions: 0.8.1
>Reporter: Anis Nasir
>Assignee: Anis Nasir
>  Labels: LoadBalancing, Partitioner
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Hi,
> We have recently studied the problem of load balancing in Storm [1].
> In particular, we focused on key distribution of the stream for skewed data.
> We developed a new stream partitioning scheme (which we call Partial Key 
> Grouping). It achieves better load balancing than key grouping while being 
> more scalable than shuffle grouping in terms of memory.
> In the paper we show a number of mining algorithms that are easy to implement 
> with partial key grouping, and whose performance can benefit from it. We 
> think that it might also be useful for a larger class of algorithms.
> Partial key grouping is very easy to implement: it requires just a few lines 
> of code in Java when implemented as a custom grouping in Storm [2].
> For all these reasons, we believe it will be a nice addition to the standard 
> Partitioners available in Flink. If the community thinks it's a good idea, we 
> will be happy to offer support in the porting.
> References:
> [1]. 
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> [2]. https://github.com/gdfm/partial-key-grouping
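
To make the idea concrete, a hedged sketch of partial key grouping as a custom Flink `Partitioner`; the class name, second hash, and local load tracking are illustrative, not the authors' code:

```java
import org.apache.flink.api.common.functions.Partitioner;

/**
 * Partial key grouping ("power of both choices"): every key has two
 * candidate channels, and each sender locally routes to whichever of
 * the two it has loaded less so far.
 */
public class PartialKeyGrouper<K> implements Partitioner<K> {

    private long[] sent; // per-channel message counts, local to this sender

    @Override
    public int partition(K key, int numPartitions) {
        if (sent == null || sent.length != numPartitions) {
            sent = new long[numPartitions];
        }
        int h = key.hashCode();
        int c1 = (h & 0x7fffffff) % numPartitions;                // first candidate
        int c2 = ((h * 0x9E3779B1) & 0x7fffffff) % numPartitions; // second candidate
        int target = sent[c1] <= sent[c2] ? c1 : c2;
        sent[target]++;
        return target;
    }
}
```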



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2617) ConcurrentModificationException when using HCatRecordReader to access a hive table

2015-09-11 Thread Arnaud Linz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740386#comment-14740386
 ] 

Arnaud Linz commented on FLINK-2617:


Hi, today (11-09) I tested the 0.10 nightly build with success, thank you 
very much.
However, I did not find the patch in the 0.9 snapshot; is that expected?

> ConcurrentModificationException when using HCatRecordReader to access a hive 
> table
> --
>
> Key: FLINK-2617
> URL: https://issues.apache.org/jira/browse/FLINK-2617
> Project: Flink
>  Issue Type: Bug
>  Components: Hadoop Compatibility
>Reporter: Arnaud Linz
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 0.9, 0.10
>
>
> I don't know if it's a hcat or a flink problem, but when reading a hive table 
> in a cluster with many slots (20 threads per container), I systematically run 
> into a {{ConcurrentModificationException}} in a copy method of a 
> {{Configuration}} object that changes during the copy.
> From what I understand, this object comes from 
> {{TaskAttemptContext.getConfiguration()}} created by 
> {{HadoopUtils.instantiateTaskAttemptContext(configuration, new 
> TaskAttemptID());}} 
> Maybe the {{job.Configuration}} object passed to the constructor of 
> {{HadoopInputFormatBase}} should be cloned somewhere?
> Stack trace is :
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
> org.apache.flink.client.program.Client.run(Client.java:413)
> org.apache.flink.client.program.Client.run(Client.java:356)
> org.apache.flink.client.program.Client.run(Client.java:349)
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
> com.bouygtel.kuberasdk.main.ApplicationBatch.execCluster(ApplicationBatch.java:73)
> com.bouygtel.kubera.main.segment.ApplicationGeoSegment.batchExec(ApplicationGeoSegment.java:69)
> com.bouygtel.kuberasdk.main.ApplicationBatch.exec(ApplicationBatch.java:50)
> com.bouygtel.kuberasdk.main.Application.mainMethod(Application.java:88)
> com.bouygtel.kubera.main.segment.MainGeoSmooth.main(MainGeoSmooth.java:44)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> org.apache.flink.client.program.Client.run(Client.java:315)
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
> org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
> org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100)
> scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> akka.actor.ActorCell.invoke(ActorCell.scala:487)
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> akka.dispatch.Mailbox.run(Mailbox.scala:221)
> akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.util.ConcurrentModificationException
> 

[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740424#comment-14740424
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139491178
  
Sorry Henry. I've updated the description.


> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that controls how 
> to handle these situations (the high-level consumer doesn't fail on that exception).
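
For context, the parameter alluded to is presumably Kafka's `auto.offset.reset`. A hedged sketch of passing it through to the 0.9-era consumer; topic, hosts, and group id are placeholders:

```java
import java.util.Properties;

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaOffsetResetDemo {
    public static FlinkKafkaConsumer082<String> createConsumer() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // placeholder
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "demo-group");              // placeholder
        // With the 0.8 high-level consumer, "smallest"/"largest" pick where to
        // resume when the requested offset is out of range, instead of failing.
        props.setProperty("auto.offset.reset", "smallest");
        return new FlinkKafkaConsumer082<>("demo-topic", new SimpleStringSchema(), props);
    }
}
```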



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2655) Minimize intermediate merging of spilled buffers

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740428#comment-14740428
 ] 

ASF GitHub Bot commented on FLINK-2655:
---

Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/1118#discussion_r39252417
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/UnilateralSortMerger.java ---
@@ -1525,34 +1525,41 @@ protected final void disposeSortBuffers(boolean releaseMemory) {
			final List<MemorySegment> allReadBuffers, final List<MemorySegment> writeBuffers)
		throws IOException
		{
-			final double numMerges = Math.ceil(channelIDs.size() / ((double) this.maxFanIn));
-			final int channelsToMergePerStep = (int) Math.ceil(channelIDs.size() / numMerges);
-
+			// A channel list with length maxFanIn^i can be merged to maxFanIn files in i-1 rounds where every merge
+			// is a full merge with maxFanIn input channels. A partial round includes merges with fewer than maxFanIn
+			// inputs. It is most efficient to perform the partial round first.
+			final double scale = Math.ceil(Math.log(channelIDs.size()) / Math.log(this.maxFanIn)) - 1;
+
+			final int numStart = channelIDs.size();
+			final int numEnd = (int) Math.pow(this.maxFanIn, scale);
+
+			final int numMerges = (int) Math.ceil((numStart - numEnd) / (double) (this.maxFanIn - 1));
--- End diff --

Ah, thanks for this info :)
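
To make the arithmetic in the diff concrete, a small standalone sketch with made-up numbers:

```java
public class MergeScheduleDemo {
    public static void main(String[] args) {
        int maxFanIn = 4;
        int numStart = 22; // spilled channels

        // largest power of maxFanIn strictly smaller than numStart
        double scale = Math.ceil(Math.log(numStart) / Math.log(maxFanIn)) - 1;
        int numEnd = (int) Math.pow(maxFanIn, scale); // 16

        // each merge replaces maxFanIn channels by one, removing (maxFanIn - 1)
        int numMerges = (int) Math.ceil((numStart - numEnd) / (double) (maxFanIn - 1)); // 2

        System.out.printf("%d channels -> %d after %d partial merges%n",
                numStart, numEnd, numMerges);
    }
}
```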


> Minimize intermediate merging of spilled buffers
> 
>
> Key: FLINK-2655
> URL: https://issues.apache.org/jira/browse/FLINK-2655
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: master
>Reporter: Greg Hogan
>
> If the number of spilled buffers exceeds taskmanager.runtime.max-fan then the 
> number of files must reduced with an intermediate merge by reading, merging, 
> and spilling into a single, larger file.
> The current implementation performs an intermediate merge on all files. An 
> optimal implementation minimizes the amount of merged data by performing 
> partial merges first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2655] Minimize intermediate merging of ...

2015-09-11 Thread HuangWHWHW
Github user HuangWHWHW commented on a diff in the pull request:

https://github.com/apache/flink/pull/1118#discussion_r39252417
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/UnilateralSortMerger.java ---
@@ -1525,34 +1525,41 @@ protected final void disposeSortBuffers(boolean releaseMemory) {
			final List<MemorySegment> allReadBuffers, final List<MemorySegment> writeBuffers)
		throws IOException
		{
-			final double numMerges = Math.ceil(channelIDs.size() / ((double) this.maxFanIn));
-			final int channelsToMergePerStep = (int) Math.ceil(channelIDs.size() / numMerges);
-
+			// A channel list with length maxFanIn^i can be merged to maxFanIn files in i-1 rounds where every merge
+			// is a full merge with maxFanIn input channels. A partial round includes merges with fewer than maxFanIn
+			// inputs. It is most efficient to perform the partial round first.
+			final double scale = Math.ceil(Math.log(channelIDs.size()) / Math.log(this.maxFanIn)) - 1;
+
+			final int numStart = channelIDs.size();
+			final int numEnd = (int) Math.pow(this.maxFanIn, scale);
+
+			final int numMerges = (int) Math.ceil((numStart - numEnd) / (double) (this.maxFanIn - 1));
--- End diff --

Ah, thanks for this info :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1737) Add statistical whitening transformation to machine learning library

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740427#comment-14740427
 ] 

ASF GitHub Bot commented on FLINK-1737:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39252383
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseVector.scala ---
@@ -85,6 +85,34 @@ case class SparseVector(
     }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+    * with `other`. The result is given in [[org.apache.flink.ml.math.SparseMatrix]]
+    * representation.
+    *
+    * @param other a Vector
+    * @return the [[org.apache.flink.ml.math.SparseMatrix]] which equals the outer product of `this`
+    *         with `other`.
+    */
+  override def outer(other: Vector): SparseMatrix = {
+    val numRows = size
+    val numCols = other.size
+
+    val otherIndices = other match {
--- End diff --

It should be enough to write `otherIndices: Iterable[Int]` and then remove 
the `toArray` call from the `Range` object.
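
Background for the change under review: the outer (Kronecker) product a b^T is nonzero only where both factors are, so it suffices to loop over the stored indices. A plain-Java sketch of that idea (not the Scala code from the PR):

```java
public class SparseOuterDemo {
    public static void main(String[] args) {
        int[] aIdx = {0, 3};            // sparse a of size 4: stored indices
        double[] aVal = {2.0, -1.0};    // ... and their values
        double[] b = {1.0, 0.0, 5.0};   // dense b of size 3

        // nonzero entries of the 4x3 outer product a b^T
        for (int k = 0; k < aIdx.length; k++) {
            for (int j = 0; j < b.length; j++) {
                if (b[j] != 0.0) {
                    System.out.printf("(%d,%d) -> %f%n", aIdx[k], j, aVal[k] * b[j]);
                }
            }
        }
    }
}
```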


> Add statistical whitening transformation to machine learning library
> 
>
> Key: FLINK-1737
> URL: https://issues.apache.org/jira/browse/FLINK-1737
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Pape
>  Labels: ML, Starter
>
> The statistical whitening transformation [1] is a preprocessing step for 
> different ML algorithms. It decorrelates the individual dimensions and sets 
> their variances to 1.
> Statistical whitening should be implemented as a {{Transfomer}}.
> Resources:
> [1] [http://en.wikipedia.org/wiki/Whitening_transformation]
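
For reference, the textbook form of the transformation (standard definition, not taken from the issue):

```latex
% x has mean \mu and covariance \Sigma = U \Lambda U^{\top} (eigendecomposition);
% the whitened sample has identity covariance:
\hat{x} = \Lambda^{-1/2} U^{\top} (x - \mu), \qquad \mathrm{Cov}(\hat{x}) = I
```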



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Getter for wrapped StreamExecutionEnvironment ...

2015-09-11 Thread lofifnc
GitHub user lofifnc reopened a pull request:

https://github.com/apache/flink/pull/1061

Getter for wrapped StreamExecutionEnvironment by scala api

Currently it is not possible to pass an ExecutionEnvironment defined with 
the Scala API to a method written in Java that works with 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.
Added a getter to the Scala API StreamExecutionEnvironment for Java/Scala 
interoperability.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lofifnc/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1061.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1061


commit dda4ad1784e5600800e7d0f9a05383bba988
Author: Alexander Kolb 
Date:   2015-08-26T15:02:46Z

added getter for wrapped StreamExecutionEnvironment by scala api

commit c3532186d44f6a7f3c2086faf8ae5f47b82e354d
Author: Alexander Kolb 
Date:   2015-09-02T11:24:42Z

Merge branch 'master' of https://github.com/apache/flink

commit 9ad598e45708cf36c1356e176545f21109724c90
Author: Alexander Kolb 
Date:   2015-09-09T07:38:50Z

Merge https://github.com/apache/flink




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2617) ConcurrentModificationException when using HCatRecordReader to access a hive table

2015-09-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740401#comment-14740401
 ] 

Fabian Hueske commented on FLINK-2617:
--

Hi Arnaud,

Thanks for verifying the fix! Where did you look for the 0.9 patch? 
We are pushing fixes for minor version 0.9 to the 
[release-0.9|https://github.com/apache/flink/tree/release-0.9] branch and the 
patch for this issue is in.

> ConcurrentModificationException when using HCatRecordReader to access a hive 
> table
> --
>
> Key: FLINK-2617
> URL: https://issues.apache.org/jira/browse/FLINK-2617
> Project: Flink
>  Issue Type: Bug
>  Components: Hadoop Compatibility
>Reporter: Arnaud Linz
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 0.9, 0.10
>
>
> I don't know if it's a hcat or a flink problem, but when reading a hive table 
> in a cluster with many slots (20 threads per container), I systematically run 
> into a {{ConcurrentModificationException}} in a copy method of a 
> {{Configuration}} object that changes during the copy.
> From what I understand, this object comes from 
> {{TaskAttemptContext.getConfiguration()}} created by 
> {{HadoopUtils.instantiateTaskAttemptContext(configuration, new 
> TaskAttemptID());}} 
> Maybe the {{job.Configuration}} object passed to the constructor of 
> {{HadoopInputFormatBase}} should be cloned somewhere?
> Stack trace is :
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
> org.apache.flink.client.program.Client.run(Client.java:413)
> org.apache.flink.client.program.Client.run(Client.java:356)
> org.apache.flink.client.program.Client.run(Client.java:349)
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
> com.bouygtel.kuberasdk.main.ApplicationBatch.execCluster(ApplicationBatch.java:73)
> com.bouygtel.kubera.main.segment.ApplicationGeoSegment.batchExec(ApplicationGeoSegment.java:69)
> com.bouygtel.kuberasdk.main.ApplicationBatch.exec(ApplicationBatch.java:50)
> com.bouygtel.kuberasdk.main.Application.mainMethod(Application.java:88)
> com.bouygtel.kubera.main.segment.MainGeoSmooth.main(MainGeoSmooth.java:44)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> org.apache.flink.client.program.Client.run(Client.java:315)
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
> org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
> org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100)
> scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> akka.actor.ActorCell.invoke(ActorCell.scala:487)
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> akka.dispatch.Mailbox.run(Mailbox.scala:221)
> akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by:  

[jira] [Commented] (FLINK-1934) Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library

2015-09-11 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740399#comment-14740399
 ] 

Till Rohrmann commented on FLINK-1934:
--

Hi [~danielblazevski], great to hear that you want to contribute to Flink :-) I 
think [~raghav.chalapa...@gmail.com] is no longer working on this issue since 
he hasn't shown any activity for a while. If you want, I can assign the issue to 
you. What are your plans for the approximative kNN implementation? Are you going 
to implement the H-zkNNJ?
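
For readers unfamiliar with H-zkNNJ (reference [1] below): it approximates kNN by sorting shifted copies of the points along z-order (Morton) curves and scanning candidates in one dimension. A minimal 2-D Morton interleaving for non-negative 16-bit coordinates, purely illustrative:

```java
public class MortonDemo {

    static long zValue(int x, int y) {
        long z = 0;
        for (int i = 0; i < 16; i++) {
            z |= (long) ((x >> i) & 1) << (2 * i);     // x bits on even positions
            z |= (long) ((y >> i) & 1) << (2 * i + 1); // y bits on odd positions
        }
        return z;
    }

    public static void main(String[] args) {
        // nearby points tend to get nearby z-values
        System.out.println(zValue(3, 5)); // 39
        System.out.println(zValue(3, 6)); // 45
    }
}
```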

> Add approximative k-nearest-neighbours (kNN) algorithm to machine learning 
> library
> --
>
> Key: FLINK-1934
> URL: https://issues.apache.org/jira/browse/FLINK-1934
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Raghav Chalapathy
>  Labels: ML
>
> kNN is still a widely used algorithm for classification and regression. 
> However, due to the computational costs of an exact implementation, it does 
> not scale well to large amounts of data. Therefore, it is worthwhile to also 
> add an approximative kNN implementation as proposed in [1,2].
> Resources:
> [1] https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf
> [2] http://www.computer.org/csdl/proceedings/wacv/2007/2794/00/27940028.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740412#comment-14740412
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/1117#discussion_r39251890
  
--- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/LegacyFetcher.java ---
@@ -356,29 +357,24 @@ public void run() {
			// make sure that all partitions have some offsets to start with
			// those partitions that do not have an offset from a checkpoint need to get
			// their start offset from ZooKeeper
-
-			List partitionsToGetOffsetsFor = new ArrayList<>();
+			{
--- End diff --

I'm opening a new scope for the `partitionsToGetOffsetsFor` variable.
I'm using another list with the same name later in the code. WIth a new 
scope for the operation here, I can use the same variable name later again.
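
A tiny standalone illustration of that scoping trick (not the connector's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeDemo {
    public static void main(String[] args) {
        {
            List<String> partitionsToGetOffsetsFor = new ArrayList<>();
            partitionsToGetOffsetsFor.add("partition-0");
            // ... handle partitions without checkpointed offsets ...
        } // 'partitionsToGetOffsetsFor' goes out of scope here

        // the braces ended the first variable's lifetime, so the
        // same name can be declared again further down the method
        List<String> partitionsToGetOffsetsFor = new ArrayList<>();
        partitionsToGetOffsetsFor.add("partition-1");
        System.out.println(partitionsToGetOffsetsFor);
    }
}
```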



> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that controls how 
> to handle these situations (the high-level consumer doesn't fail on that exception).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2656] Fix behavior of FlinkKafkaConsume...

2015-09-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139491178
  
Sorry Henry. I've updated the description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-1737: Kronecker product

2015-09-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1078#discussion_r39252383
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/SparseVector.scala ---
@@ -85,6 +85,34 @@ case class SparseVector(
     }
   }
 
+  /** Returns the outer product (a.k.a. Kronecker product) of `this`
+    * with `other`. The result is given in [[org.apache.flink.ml.math.SparseMatrix]]
+    * representation.
+    *
+    * @param other a Vector
+    * @return the [[org.apache.flink.ml.math.SparseMatrix]] which equals the outer product of `this`
+    *         with `other`.
+    */
+  override def outer(other: Vector): SparseMatrix = {
+    val numRows = size
+    val numCols = other.size
+
+    val otherIndices = other match {
--- End diff --

It should be enough to write `otherIndices: Iterable[Int]` and then remove 
the `toArray` call from the `Range` object.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740475#comment-14740475
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139499573
  
What's the purpose of adding `javaSet` to the Table API? 
This looks like a separate issue to me and should not be part of this PR if 
that is the case.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139499573
  
What's the purpose of adding `javaSet` to the Table API? 
This looks like a separate issue to me and should not be part of this PR if 
that is the case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740473#comment-14740473
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139499368
  
There is no append. `OVERWRITE` simply overwrites existing files and 
`NO_OVERWRITE` fails if the output file exists.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139499368
  
There is no append. `OVERWRITE` simply overwrites existing files and 
`NO_OVERWRITE` fails if the output file exists.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740489#comment-14740489
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139500923
  
`What's the purpose of adding javaSet to the Table API? 
This looks like a separate issue to me and should not be part of this PR if 
that is the case.`

It is because of the following:

![image](https://cloud.githubusercontent.com/assets/13193847/9811803/11922a28-58ac-11e5-9f3f-4451f7db3b02.png)



> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139500923
  
`What's the purpose of adding javaSet to the Table API? 
This looks like a separate issue to me and should not be part of this PR if 
that is the case.`

It is because of the following:

![image](https://cloud.githubusercontent.com/assets/13193847/9811803/11922a28-58ac-11e5-9f3f-4451f7db3b02.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2617) ConcurrentModificationException when using HCatRecordReader to access a hive table

2015-09-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740495#comment-14740495
 ] 

Fabian Hueske commented on FLINK-2617:
--

I see. 
SNAPSHOT builds are only uploaded if all tests pass. The last build failed due 
to some flaky tests. 
I will push an empty commit to trigger another build. *fingersCrossed* that 
this build passes.

> ConcurrentModificationException when using HCatRecordReader to access a hive 
> table
> --
>
> Key: FLINK-2617
> URL: https://issues.apache.org/jira/browse/FLINK-2617
> Project: Flink
>  Issue Type: Bug
>  Components: Hadoop Compatibility
>Reporter: Arnaud Linz
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 0.9, 0.10
>
>
> I don't know if it's a hcat or a flink problem, but when reading a hive table 
> in a cluster with many slots (20 threads per container), I systematically run 
> into a {{ConcurrentModificationException}} in a copy method of a 
> {{Configuration}} object that changes during the copy.
> From what I understand, this object comes from 
> {{TaskAttemptContext.getConfiguration()}} created by 
> {{HadoopUtils.instantiateTaskAttemptContext(configuration, new 
> TaskAttemptID());}} 
> Maybe the {{job.Configuration}} object passed to the constructor of 
> {{HadoopInputFormatBase}} should be cloned somewhere?
> Stack trace is :
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
> org.apache.flink.client.program.Client.run(Client.java:413)
> org.apache.flink.client.program.Client.run(Client.java:356)
> org.apache.flink.client.program.Client.run(Client.java:349)
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
> com.bouygtel.kuberasdk.main.ApplicationBatch.execCluster(ApplicationBatch.java:73)
> com.bouygtel.kubera.main.segment.ApplicationGeoSegment.batchExec(ApplicationGeoSegment.java:69)
> com.bouygtel.kuberasdk.main.ApplicationBatch.exec(ApplicationBatch.java:50)
> com.bouygtel.kuberasdk.main.Application.mainMethod(Application.java:88)
> com.bouygtel.kubera.main.segment.MainGeoSmooth.main(MainGeoSmooth.java:44)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> org.apache.flink.client.program.Client.run(Client.java:315)
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
> org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
> org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100)
> scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92)
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> akka.actor.ActorCell.invoke(ActorCell.scala:487)
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> akka.dispatch.Mailbox.run(Mailbox.scala:221)
> akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by:  

[jira] [Commented] (FLINK-2641) Integrate the off-heap memory configuration with the TaskManager start script

2015-09-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740352#comment-14740352
 ] 

Fabian Hueske commented on FLINK-2641:
--

+1 for having a consistent hierarchy in the config parameters.
But {{taskmanager.memory.jvm}} does not make a lot of sense to me from a 
hierarchy point of view. How about {{taskmanager.memory.size}} or 
{{taskmanager.memory.mb}}?

> Integrate the off-heap memory configuration with the TaskManager start script
> -
>
> Key: FLINK-2641
> URL: https://issues.apache.org/jira/browse/FLINK-2641
> Project: Flink
>  Issue Type: New Feature
>  Components: Start-Stop Scripts
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> The TaskManager start script needs to adjust the {{-Xmx}}, {{-Xms}}, and 
> {{-XX:MaxDirectMemorySize}} parameters according to the off-heap memory 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2015-09-11 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740404#comment-14740404
 ] 

Till Rohrmann commented on FLINK-1745:
--

The status of the exact kNN is that I have to review the PR. From a quick 
glance over the PR I think it's in a good shape and should soon be merged. We 
could use this implementation then to compare against the approximate kNN 
implementations. Concerning the R-trees, I think there is nothing planned yet. 
You could give it a try if you want to.

> Add exact k-nearest-neighbours algorithm to machine learning library
> 
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial 
> it is still used as a mean to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2641) Integrate the off-heap memory configuration with the TaskManager start script

2015-09-11 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740421#comment-14740421
 ] 

Maximilian Michels commented on FLINK-2641:
---

{quote}
Just a thought: Can we actually work without the 
taskmanager.memory.off-heap-ratio parameter? Simply use the 
taskmanager.memory.fraction as well for the off-heap case? We could multiply 
the specified heap size with the fraction, take that as direct memory, and then 
take the remainder as the actual heap size.
{quote}

+1 That way, we keep the config lean.
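
A standalone sketch of the arithmetic in the quote above, with made-up numbers:

```java
public class MemorySplitDemo {
    public static void main(String[] args) {
        long totalMb = 4096;   // e.g. the TaskManager's configured total memory
        double fraction = 0.7; // e.g. taskmanager.memory.fraction

        long directMb = (long) (totalMb * fraction); // -XX:MaxDirectMemorySize
        long heapMb = totalMb - directMb;            // remainder becomes -Xms/-Xmx

        System.out.printf("-Xms%dm -Xmx%dm -XX:MaxDirectMemorySize=%dm%n",
                heapMb, heapMb, directMb);
    }
}
```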

{quote}
How about taskmanager.memory.size or taskmanager.memory.mb?
{quote}

:) {{taskmanager.memory.size}} is the memory size of the managed memory. 
{{taskmanager.memory.mb}} could be confused with the latter one as well. How 
about we rename that one to {{taskmanager.memory.managed.size}} and the 
fraction to {{taskmanager.memory.managed.fraction}}? That would also make sense 
from a hierarchical point of view. 

We could then name the combined memory {{taskmanager.memory.size}}. This may 
seem problematic because it breaks backwards-compatibility. We could solve that 
by detecting old configuration parameters and throwing an error. On the other 
hand, it's not that problematic because {{taskmanager.memory.size}} was not 
included in the configuration and probably few people know about it. 

---

Here's my proposal:

|| old || new || description ||
| {{taskmanager.memory.size}} | {{taskmanager.memory.managed.size}} | |
| {{taskmanager.memory.fraction}} | {{taskmanager.memory.managed.fraction}} | (and let it work independent of heap and off-heap memory) |
| {{taskmanager.heap.mb}} | {{taskmanager.memory.size}} | (and change its meaning to combined JVM heap + off-heap memory) |
| {{jobmanager.heap.mb}} | {{jobmanager.memory.size}} | |

> Integrate the off-heap memory configuration with the TaskManager start script
> -
>
> Key: FLINK-2641
> URL: https://issues.apache.org/jira/browse/FLINK-2641
> Project: Flink
>  Issue Type: New Feature
>  Components: Start-Stop Scripts
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> The TaskManager start script needs to adjust the {{-Xmx}}, {{-Xms}}, and 
> {{-XX:MaxDirectMemorySize}} parameters according to the off-heap memory 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2597) Add a test for Avro-serialized Kafka messages

2015-09-11 Thread Vimal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740436#comment-14740436
 ] 

Vimal commented on FLINK-2597:
--

Hi,
is anyone working on this?

Can you provide more details or a pointer to learn more about this feature?

Thanks

> Add a test for Avro-serialized Kafka messages 
> --
>
> Key: FLINK-2597
> URL: https://issues.apache.org/jira/browse/FLINK-2597
> Project: Flink
>  Issue Type: Bug
>  Components: Avro Support, Kafka Connector
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Priority: Minor
>
> A user has asked for serializing Avro messages from Kafka.
> I think it's a legitimate use-case that we should cover with a test case.
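
For anyone picking this up, the heart of the use-case is decoding an Avro-encoded Kafka payload. A hedged sketch with standard Avro APIs; wiring it into a Flink DeserializationSchema and obtaining the Schema are left open:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class AvroKafkaDecode {
    public static GenericRecord decode(byte[] kafkaPayload, Schema schema) throws Exception {
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        // binaryDecoder(bytes, reuse) builds a decoder over the raw payload
        return reader.read(null, DecoderFactory.get().binaryDecoder(kafkaPayload, null));
    }
}
```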



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2373) Add configuration parameter to createRemoteEnvironment method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740452#comment-14740452
 ] 

ASF GitHub Bot commented on FLINK-2373:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1066#issuecomment-139494208
  
LGTM modulo one minor comment.


> Add configuration parameter to createRemoteEnvironment method
> -
>
> Key: FLINK-2373
> URL: https://issues.apache.org/jira/browse/FLINK-2373
> Project: Flink
>  Issue Type: Bug
>  Components: other
>Reporter: Andreas Kunft
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently there is no way to provide a custom configuration upon creation of 
> a remote environment (via ExecutionEnvironment.createRemoteEnvironment(...)).
> This leads to errors when the submitted job exceeds the default value for the 
> max. payload size in Akka, as we can not increase the configuration value 
> (akka.remote.OversizedPayloadException: Discarding oversized payload...)
> Providing an overloaded method with a configuration parameter for the remote 
> environment fixes that.
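
A hedged sketch of how such an overload would be used; the parameter order is an assumption based on this description, and `akka.framesize` is the Flink key for the Akka payload limit. Host, port, and jar path are placeholders:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class RemoteEnvDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setString("akka.framesize", "104857600b"); // raise the 10 MB default

        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 6123, config, "/path/to/program.jar");

        env.fromElements(1, 2, 3).print(); // define and run the program remotely
    }
}
```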



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1098#discussion_r39253553
  
--- Diff: flink-staging/flink-streaming/flink-streaming-scala/src/test/java/org/apache/flink/streaming/scala/api/CsvOutputFormatITCase.java ---
@@ -36,6 +37,10 @@ protected void preSubmit() throws Exception {
	@Override
	protected void testProgram() throws Exception {
		OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, resultPath);
+		OutputFormatTestPrograms.wordCountToCsv(WordCountData.TEXT, resultPath, 1);
--- End diff --

It is expected that the test fails if you set the overwrite mode to 
`NO_OVERWRITE` because you start multiple programs that all write to the same 
location.

I debugged the issue and found that the `ForkableFlinkMiniCluster` sets the 
overwrite mode to OVERWRITE. That explains the behavior. However, this also 
means that you cannot test the correctness of your methods by setting the 
overwrite mode to OVERWRITE because this is the default behavior.

I would change the `CsvOutputFormatITCase` to extend 
`StreamingMultipleProgramsTestBase` and run each program in a dedicated test 
method. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740460#comment-14740460
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139496123
  
Doesn't `NO_OVERWRITE` mean "append"?


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread HuangWHWHW
Github user HuangWHWHW commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139496123
  
Doesn't `NO_OVERWRITE` mean "append"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Getter for wrapped StreamExecutionEnvironment ...

2015-09-11 Thread lofifnc
GitHub user lofifnc opened a pull request:

https://github.com/apache/flink/pull/1120

Getter for wrapped StreamExecutionEnvironment by scala api

Currently it is not possible to pass an ExecutionEnvironment defined with 
the Scala API to a method working with 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.
Added a getter to the Scala API StreamExecutionEnvironment for Java/Scala 
interoperability.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lofifnc/flink getter-for-stream-environment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1120


commit 6d2f16a1ce34f720f4a26ce99deb6480e29afe80
Author: Alexander Kolb 
Date:   2015-09-11T09:23:27Z

added getter for wrapped environment

commit a11249c66b3995083a0c91af8ba4bd168fcf0ac5
Author: Alexander Kolb 
Date:   2015-09-11T09:31:48Z

renamed method




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2641) Integrate the off-heap memory configuration with the TaskManager start script

2015-09-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740500#comment-14740500
 ] 

Fabian Hueske commented on FLINK-2641:
--

+1 It will break the configuration, but I think that's acceptable for the 0.10 
release and we can detect the invalid configs.

> Integrate the off-heap memory configuration with the TaskManager start script
> -
>
> Key: FLINK-2641
> URL: https://issues.apache.org/jira/browse/FLINK-2641
> Project: Flink
>  Issue Type: New Feature
>  Components: Start-Stop Scripts
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Maximilian Michels
> Fix For: 0.10
>
>
> The TaskManager start script needs to adjust the {{-Xmx}}, {{-Xms}}, and 
> {{-XX:MaxDirectMemorySize}} parameters according to the off-heap memory 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2445) Add tests for HadoopOutputFormats

2015-09-11 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740601#comment-14740601
 ] 

Martin Liesenberg commented on FLINK-2445:
--

Cool. If no one else is working on this, I'd give it a shot. 

> Add tests for HadoopOutputFormats
> -
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.10, 0.9.1
>Reporter: Fabian Hueske
>  Labels: starter
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1934) Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library

2015-09-11 Thread Daniel Blazevski (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740627#comment-14740627
 ] 

Daniel Blazevski commented on FLINK-1934:
-

Hi [~till.rohrmann], Flink seems like a neat project, looking forward to 
contributing!

I had indeed looked into H-zkNNJ in Ref [1]. As I just mentioned on the 
exact kNN issue, I think it will be wiser for me to try an R-tree implementation 
of the exact kNN first as my first Flink contribution -- so in the meantime, 
how about we don't assign this to me quite yet. 

> Add approximative k-nearest-neighbours (kNN) algorithm to machine learning 
> library
> --
>
> Key: FLINK-1934
> URL: https://issues.apache.org/jira/browse/FLINK-1934
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Raghav Chalapathy
>  Labels: ML
>
> kNN is still a widely used algorithm for classification and regression. 
> However, due to the computational costs of an exact implementation, it does 
> not scale well to large amounts of data. Therefore, it is worthwhile to also 
> add an approximative kNN implementation as proposed in [1,2].
> Resources:
> [1] https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf
> [2] http://www.computer.org/csdl/proceedings/wacv/2007/2794/00/27940028.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2634] [gelly] Added a vertex-centric Tr...

2015-09-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1105#discussion_r39271273
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/TriangleCount.java
 ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.GroupReduceFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.functions.ReduceFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.NeighborsFunctionWithVertexValue;
+import org.apache.flink.graph.utils.VertexToTuple2Map;
+import org.apache.flink.types.NullValue;
+import org.apache.flink.util.Collector;
+
+import java.util.Iterator;
+import java.util.TreeMap;
+
+/**
+ * Triangle Count Algorithm.
+ *
+ * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs
+ * and send messages to them. Each received message is then propagated to neighbors with higher id.
+ * Finally, if a node encounters the target id in the list of received messages, it increments the number
+ * of triangles found.
+ *
+ * For skewed graphs, we recommend calling the GSATriangleCount library method as it uses the more restrictive
+ * `reduceOnNeighbors` function which internally makes use of combiners to speed up computation.
+ *
+ * This implementation is non-iterative.
+ *
+ * The algorithm takes an undirected, unweighted graph as input and outputs a DataSet of
+ * Tuple1 which contains a single integer representing the number of triangles.
+ */
+public class TriangleCount implements
+	GraphAlgorithm> {
--- End diff --

Why not return `Integer` instead of `Tuple1`?
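
For intuition only, a hedged single-machine analogue of the three phases from the javadoc (not the Gelly code under review):

```java
import java.util.*;

public class TriangleCountDemo {
    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}, {3, 4}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        int triangles = 0;
        for (int u : adj.keySet()) {
            for (int v : adj.get(u)) {
                if (v <= u) continue;             // phase 1: "send" only to higher ids
                for (int w : adj.get(v)) {
                    if (w <= v) continue;         // phase 2: forward to even higher ids
                    if (adj.get(u).contains(w)) { // phase 3: closing edge found?
                        triangles++;
                    }
                }
            }
        }
        System.out.println(triangles); // 1 (the 1-2-3 triangle)
    }
}
```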


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2634) Add a Vertex-centric Version of the Triangle Count Library Method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740808#comment-14740808
 ] 

ASF GitHub Bot commented on FLINK-2634:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1105#discussion_r39271273
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/TriangleCount.java
 ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.GroupReduceFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.functions.ReduceFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.NeighborsFunctionWithVertexValue;
+import org.apache.flink.graph.utils.VertexToTuple2Map;
+import org.apache.flink.types.NullValue;
+import org.apache.flink.util.Collector;
+
+import java.util.Iterator;
+import java.util.TreeMap;
+
+/**
+ * Triangle Count Algorithm.
+ *
+ * This algorithm operates in three phases. First, vertices select 
neighbors with id greater than theirs
+ * and send messages to them. Each received message is then propagated to 
neighbors with higher id.
+ * Finally, if a node encounters the target id in the list of received 
messages, it increments the number
+ * of triangles found.
+ *
+ * For skewed graphs, we recommend calling the GSATriangleCount library 
method as it uses the more restrictive
+ * `reduceOnNeighbors` function which internally makes use of combiners to 
speed up computation.
+ *
+ * This implementation is non - iterative.
+ *
+ * The algorithm takes an undirected, unweighted graph as input and
+ * outputs a DataSet of Tuple1&lt;Integer&gt; which contains a single integer
+ * representing the number of triangles.
+ */
+public class TriangleCount implements
+   GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> {
--- End diff --

Why not return `Integer` instead of `Tuple1`?


> Add a Vertex-centric Version of the Triangle Count Library Method
> 
>
> Key: FLINK-2634
> URL: https://issues.apache.org/jira/browse/FLINK-2634
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> The vertex-centric version of this algorithm receives an undirected graph as 
> input and outputs the total number of triangles formed by the graph's edges.
> The implementation consists of three phases:
> 1). Select neighbours with id greater than the current vertex id.
> 2). Propagate each received value to neighbours with higher id. 
> 3). Compute the number of Triangles by verifying if the final vertex contains 
> the sender's id in its list.
> As opposed to the GAS version, all these three steps will be performed via 
> message passing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2230) Add Support for Null-Values in TupleSerializer

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740803#comment-14740803
 ] 

ASF GitHub Bot commented on FLINK-2230:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-139547674
  
I agree with @StephanEwen that the TupleSerializer should not support null 
values. However, it might be good to offer a null-aware alternative.
Can we close this PR for now until we agree on a solution for this issue?


> Add Support for Null-Values in TupleSerializer
> --
>
> Key: FLINK-2230
> URL: https://issues.apache.org/jira/browse/FLINK-2230
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Shiti Saxena
>Assignee: Shiti Saxena
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2634] [gelly] Added a vertex-centric Tr...

2015-09-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1105#discussion_r39271796
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/TriangleCount.java
 ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.GroupReduceFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.functions.ReduceFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.NeighborsFunctionWithVertexValue;
+import org.apache.flink.graph.utils.VertexToTuple2Map;
+import org.apache.flink.types.NullValue;
+import org.apache.flink.util.Collector;
+
+import java.util.Iterator;
+import java.util.TreeMap;
+
+/**
+ * Triangle Count Algorithm.
+ *
+ * This algorithm operates in three phases. First, vertices select 
neighbors with id greater than theirs
+ * and send messages to them. Each received message is then propagated to 
neighbors with higher id.
+ * Finally, if a node encounters the target id in the list of received 
messages, it increments the number
+ * of triangles found.
+ *
+ * For skewed graphs, we recommend calling the GSATriangleCount library 
method as it uses the more restrictive
+ * `reduceOnNeighbors` function which internally makes use of combiners to 
speed up computation.
+ *
+ * This implementation is non-iterative.
+ *
+ * The algorithm takes an undirected, unweighted graph as input and
+ * outputs a DataSet of Tuple1&lt;Integer&gt; which contains a single integer
+ * representing the number of triangles.
+ */
+public class TriangleCount implements
+   GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> {
--- End diff --

Why is the algorithm typed to `Long` keys? Shouldn't it work with any key 
type?
Couldn't we also allow any vertex and edge value type and replace them 
internally with `NullValue`?
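
For reference, a minimal sketch of the generalized signature this suggests, assuming Gelly's `GraphAlgorithm<K, VV, EV, T>` interface and the `Graph#mapVertices`/`#mapEdges` methods; the bound on `K` and the class name are illustrative:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.GraphAlgorithm;
import org.apache.flink.graph.Vertex;
import org.apache.flink.types.NullValue;

public class GenericTriangleCount<K extends Comparable<K>, VV, EV>
        implements GraphAlgorithm<K, VV, EV, DataSet<Integer>> {

    @Override
    public DataSet<Integer> run(Graph<K, VV, EV> input) throws Exception {
        // Strip the user's vertex and edge values up front; the algorithm
        // only needs the structure of the graph.
        Graph<K, NullValue, NullValue> graph = input
            .mapVertices(new MapFunction<Vertex<K, VV>, NullValue>() {
                public NullValue map(Vertex<K, VV> vertex) {
                    return NullValue.getInstance();
                }
            })
            .mapEdges(new MapFunction<Edge<K, EV>, NullValue>() {
                public NullValue map(Edge<K, EV> edge) {
                    return NullValue.getInstance();
                }
            });
        // ... the three message-passing phases, keyed on K instead of Long ...
        throw new UnsupportedOperationException("sketch only");
    }
}
```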


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2634) Add a Vertex-centric Version of the Triangle Count Library Method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740840#comment-14740840
 ] 

ASF GitHub Bot commented on FLINK-2634:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1105#issuecomment-139553246
  
I did not go through the code in detail. Maybe I am missing something, but 
isn't this algorithm doing pretty much the same thing as the example we 
already have in the DataSet API examples (Java + Scala)? The existing example 
(EnumTrianglesOpt) uses 3 reduces and 1 join to enumerate all triangles. We 
would need one more reduce to compute the number of triangles, which makes 4 
reduces and 1 join.
In your code you are using 5 reduces and 2 joins, if I counted correctly. 

Are those two algorithms doing different things or could you basically port 
the existing code to Gelly?


> Add a Vertex-centric Version of the Triangle Count Library Method
> 
>
> Key: FLINK-2634
> URL: https://issues.apache.org/jira/browse/FLINK-2634
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> The vertex-centric version of this algorithm receives an undirected graph as 
> input and outputs the total number of triangles formed by the graph's edges.
> The implementation consists of three phases:
> 1). Select neighbours with id greater than the current vertex id.
> 2). Propagate each received value to neighbours with higher id. 
> 3). Compute the number of Triangles by verifying if the final vertex contains 
> the sender's id in its list.
> As opposed to the GAS version, all these three steps will be performed via 
> message passing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2659) Object reuse in UnionWithTempOperator

2015-09-11 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-2659:
-

 Summary: Object reuse in UnionWithTempOperator
 Key: FLINK-2659
 URL: https://issues.apache.org/jira/browse/FLINK-2659
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Greg Hogan


The first loop in UnionWithTempOperator.run() runs until next() returns null, 
and the second loop then attempts to use this null value as its reuse object. 
[~StephanEwen], would you like me to submit a pull request?
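
A minimal, self-contained sketch of the pattern (the Reader class is a stand-in for the actual record readers, not Flink's API); enabling the commented-out loop reproduces the NPE shown in the stack trace below:

{code:java}
import java.util.Iterator;

public class UnionReuseBugSketch {

    /** Mimics a mutable record reader: fills 'reuse' and returns it, or
     *  returns null when exhausted. Like the reusing deserializer, it
     *  dereferences 'reuse' unconditionally. */
    static class Reader {
        private final Iterator<int[]> data;
        Reader(Iterator<int[]> data) { this.data = data; }
        int[] next(int[] reuse) {
            if (!data.hasNext()) return null;
            reuse[0] = data.next()[0];  // NPE here if reuse == null
            return reuse;
        }
    }

    public static void main(String[] args) {
        Reader cached = new Reader(java.util.Arrays.asList(new int[]{1}, new int[]{2}).iterator());
        Reader streamed = new Reader(java.util.Arrays.asList(new int[]{3}).iterator());

        int[] reuse = new int[1];
        // First input: the loop ends by assigning null to 'reuse'.
        while ((reuse = cached.next(reuse)) != null) {
            System.out.println("cached: " + reuse[0]);
        }
        // Passing the now-null 'reuse' straight on reproduces the NPE:
        // while ((reuse = streamed.next(reuse)) != null) { ... }

        // Fix: re-create the reuse object before draining the second input.
        reuse = new int[1];
        while ((reuse = streamed.next(reuse)) != null) {
            System.out.println("streamed: " + reuse[0]);
        }
    }
}
{code}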

Stack trace:

{noformat}
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Job execution failed.
at org.apache.flink.client.program.Client.run(Client.java:381)
at org.apache.flink.client.program.Client.run(Client.java:319)
at org.apache.flink.client.program.Client.run(Client.java:312)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
at 
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:790)
at Driver.main(Driver.java:376)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:278)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:630)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:318)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:953)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1003)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution 
failed.
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:418)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:40)
at 
org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:104)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
at 
org.apache.flink.runtime.plugable.ReusingDeserializationDelegate.read(ReusingDeserializationDelegate.java:57)
at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:125)
at 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:65)
at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
   

[GitHub] flink pull request: [FLINK-2230] handling null values for TupleSer...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/867#issuecomment-139547674
  
I agree with @StephanEwen that the TupleSerializer should not support null 
values. However, it might be good to offer a null-aware alternative.
Can we close this PR for now until we agree on a solution for this issue?
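
One possible shape for such a null-aware alternative, as a self-contained sketch (the `SimpleSerializer` interface is illustrative, not Flink's `TypeSerializer` API): prefix every record with a null marker and delegate to an inner serializer otherwise.

```java
import java.io.*;

interface SimpleSerializer<T> {
    void serialize(T value, DataOutput out) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

class NullAwareSerializer<T> implements SimpleSerializer<T> {
    private final SimpleSerializer<T> inner;

    NullAwareSerializer(SimpleSerializer<T> inner) { this.inner = inner; }

    @Override
    public void serialize(T value, DataOutput out) throws IOException {
        out.writeBoolean(value == null);   // one extra byte per record
        if (value != null) {
            inner.serialize(value, out);
        }
    }

    @Override
    public T deserialize(DataInput in) throws IOException {
        return in.readBoolean() ? null : inner.deserialize(in);
    }
}
```

The cost is one extra byte per record, which is an argument for keeping this opt-in rather than changing TupleSerializer itself.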


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2634) Add a Vertex-centric Version of the Triangle Count Library Method

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740825#comment-14740825
 ] 

ASF GitHub Bot commented on FLINK-2634:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1105#discussion_r39271796
  
--- Diff: 
flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/TriangleCount.java
 ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.functions.FlatJoinFunction;
+import org.apache.flink.api.common.functions.GroupReduceFunction;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.functions.ReduceFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.NeighborsFunctionWithVertexValue;
+import org.apache.flink.graph.utils.VertexToTuple2Map;
+import org.apache.flink.types.NullValue;
+import org.apache.flink.util.Collector;
+
+import java.util.Iterator;
+import java.util.TreeMap;
+
+/**
+ * Triangle Count Algorithm.
+ *
+ * This algorithm operates in three phases. First, vertices select 
neighbors with id greater than theirs
+ * and send messages to them. Each received message is then propagated to 
neighbors with higher id.
+ * Finally, if a node encounters the target id in the list of received 
messages, it increments the number
+ * of triangles found.
+ *
+ * For skewed graphs, we recommend calling the GSATriangleCount library 
method as it uses the more restrictive
+ * `reduceOnNeighbors` function which internally makes use of combiners to 
speed up computation.
+ *
+ * This implementation is non-iterative.
+ *
+ * The algorithm takes an undirected, unweighted graph as input and
+ * outputs a DataSet of Tuple1&lt;Integer&gt; which contains a single integer
+ * representing the number of triangles.
+ */
+public class TriangleCount implements
+   GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> {
--- End diff --

Why is the algorithm typed to `Long` keys? Shouldn't it work with any key 
type?
Couldn't we also allow any vertex and edge value type and replace them 
internally with `NullValue`?


> Add a Vertex-centric Version of the Triangle Count Library Method
> 
>
> Key: FLINK-2634
> URL: https://issues.apache.org/jira/browse/FLINK-2634
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> The vertex-centric version of this algorithm receives an undirected graph as 
> input and outputs the total number of triangles formed by the graph's edges.
> The implementation consists of three phases:
> 1). Select neighbours with id greater than the current vertex id.
> 2). Propagate each received value to neighbours with higher id. 
> 3). Compute the number of Triangles by verifying if the final vertex contains 
> the sender's id in its list.
> As opposed to the GAS version, all these three steps will be performed via 
> message passing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2634] [gelly] Added a vertex-centric Tr...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1105#issuecomment-139553246
  
I did not go through the code in detail. Maybe I am missing something, but 
isn't this algorithm doing pretty much the same thing as the example we 
already have in the DataSet API examples (Java + Scala)? The existing example 
(EnumTrianglesOpt) uses 3 reduces and 1 join to enumerate all triangles. We 
would need one more reduce to compute the number of triangles, which makes 4 
reduces and 1 join.
In your code you are using 5 reduces and 2 joins, if I counted correctly. 

Are those two algorithms doing different things or could you basically port 
the existing code to Gelly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-2597) Add a test for Avro-serialized Kafka messages

2015-09-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2597:
--
Assignee: Vimal

> Add a test for Avro-serialized Kafka messages 
> --
>
> Key: FLINK-2597
> URL: https://issues.apache.org/jira/browse/FLINK-2597
> Project: Flink
>  Issue Type: Bug
>  Components: Avro Support, Kafka Connector
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Vimal
>Priority: Minor
>
> A user has asked about reading Avro-serialized messages from Kafka.
> I think it's a legitimate use case that we should cover with a test case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2654) Add JavaDoc to ParameterTool class

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740650#comment-14740650
 ] 

ASF GitHub Bot commented on FLINK-2654:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1116#issuecomment-139528695
  
Thanks for the update. Good to merge.


> Add JavaDoc to ParameterTool class
> --
>
> Key: FLINK-2654
> URL: https://issues.apache.org/jira/browse/FLINK-2654
> Project: Flink
>  Issue Type: Improvement
>Reporter: Behrouz Derakhshan
>Priority: Minor
>
> ParameterTool class is missing JavaDocs.
> The tool is already being used, and the plan is to use it in all of the 
> example codes. We should add JavaDocs before we start using it. 
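
A short usage sketch of the entry points the JavaDocs should cover, assuming the current `fromArgs`/`getRequired`/`getInt` accessors of ParameterTool:

{code:java}
import org.apache.flink.api.java.utils.ParameterTool;

public class ParameterToolExample {
    public static void main(String[] args) throws Exception {
        // e.g. --input hdfs:///wc/in --parallelism 4
        ParameterTool params = ParameterTool.fromArgs(args);

        String input = params.getRequired("input");          // fails fast if missing
        int parallelism = params.getInt("parallelism", 1);   // default when absent

        System.out.println("input=" + input + ", parallelism=" + parallelism);
    }
}
{code}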



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2654][flink-java] Add JavaDoc to Parame...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1116#issuecomment-139528695
  
Thanks for the update. Good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2591]Add configuration parameter for de...

2015-09-11 Thread willmiao
Github user willmiao closed the pull request at:

https://github.com/apache/flink/pull/1107


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139507621
  
Basically, we should avoid modifying the current Table API tests. This PR 
covers the streaming API only.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740525#comment-14740525
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139507227
  
I still don't understand the purpose of adding `javaSet` to the 
Table API. We can get the `DataSet` by using the `toDataSet` method.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.
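
For context, a minimal sketch of the Java-side call that the Scala API should mirror, assuming the 0.10-era Java DataStream API:

{code:java}
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WriteModeExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c");

        // The Java variant the Scala API lacks: overwrite an existing
        // file instead of failing.
        stream.writeAsText("/tmp/out", WriteMode.OVERWRITE);

        env.execute("WriteMode example");
    }
}
{code}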



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139507227
  
I still don't understand the purpose of adding `javaSet` to the 
Table API. We can get the `DataSet` by using the `toDataSet` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2658) fieldsGrouping for multiple output streams fails

2015-09-11 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2658:
--

 Summary: fieldsGrouping for multiple output streams fails
 Key: FLINK-2658
 URL: https://issues.apache.org/jira/browse/FLINK-2658
 Project: Flink
  Issue Type: Bug
  Components: Storm Compatibility
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax


If a Spout or Bolt declares multiple output streams and another Bolt connects 
to one of those streams via "fieldsGrouping",  the call to 
{{FlinkTopologyBuilder.createTopology()}} fails with the following exception:

{noformat}
org.apache.flink.api.common.InvalidProgramException: Specifying keys via field 
positions is only valid for tuple data types. Type: 
PojoType
at 
org.apache.flink.api.java.operators.Keys$ExpressionKeys.(Keys.java:209)
at 
org.apache.flink.api.java.operators.Keys$ExpressionKeys.(Keys.java:203)
at 
org.apache.flink.streaming.api.datastream.DataStream.groupBy(DataStream.java:285)
at 
org.apache.flink.stormcompatibility.api.FlinkTopologyBuilder.createTopology(FlinkTopologyBuilder.java:200)
at 
org.apache.flink.stormcompatibility.api.FlinkTopologyBuilderTest.testFieldsGroupingOnMultipleBoltOutputStreams(FlinkTopologyBuilderTest.java:73)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{noformat}

Fix: either introduce a mapper that "flattens" the {{SplitStreamType}} into the 
regular tuple type that is nested inside, or provide a custom {{KeySelector}}.
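
A sketch of the KeySelector option; the package path and the wrapper's public {{value}} field are assumptions about the storm-compatibility module:

{code:java}
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple;
// Package path and the public 'value' field are assumptions:
import org.apache.flink.stormcompatibility.util.SplitStreamType;

public class SplitStreamKeySelector implements KeySelector<SplitStreamType<Tuple>, Object> {

    private final int fieldIndex;

    public SplitStreamKeySelector(int fieldIndex) {
        this.fieldIndex = fieldIndex;
    }

    @Override
    public Object getKey(SplitStreamType<Tuple> wrapper) {
        // Reach through the wrapper into the nested tuple; groupBy(int)
        // cannot do this on its own, hence the exception above.
        return wrapper.value.getField(fieldIndex);
    }
}
{code}

The grouping in {{createTopology()}} would then call {{groupBy(new SplitStreamKeySelector(...))}} instead of {{groupBy(int)}}.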



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2597) Add a test for Avro-serialized Kafka messages

2015-09-11 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740605#comment-14740605
 ] 

Robert Metzger commented on FLINK-2597:
---

I don't think anybody is working on this.
I've assigned the issue to you.

The purpose of this issue is to make sure that Flink's Kafka Consumer is able 
to read messages from Kafka serialized using Avro.
I think that's a common use case among industry users that should work properly 
and out of the box with Flink.

Maybe adding a short paragraph into the documentation on how to use Avro with 
Kafka would also be great (I would put it into the kafka.md file).

To test this, I would extend the KafkaITCase (consumer tests).
First, write Avro-serialized messages into a Kafka topic (maybe this is 
helpful: 
http://docs.confluent.io/1.0/schema-registry/docs/serializer-formatter.html) 
without using any Flink code.

Then, read the data from the topic, using the FlinkKafkaConsumer, the 
TypeInformationSerializationSchema and the AvroTypeInformation.
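
Putting those steps together, a rough sketch of the consuming side; {{MyAvroRecord}} is a placeholder for an Avro-generated SpecificRecord class, and the constructor signatures are assumptions based on the classes named above:

{code:java}
import java.util.Properties;

import org.apache.flink.api.java.typeutils.AvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.TypeInformationSerializationSchema;

public class AvroKafkaReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "avro-test");

        // MyAvroRecord stands in for an Avro-generated SpecificRecord class.
        TypeInformationSerializationSchema<MyAvroRecord> schema =
            new TypeInformationSerializationSchema<>(
                new AvroTypeInfo<>(MyAvroRecord.class), env.getConfig());

        DataStream<MyAvroRecord> stream = env.addSource(
            new FlinkKafkaConsumer082<>("avro-topic", schema, props));

        stream.print();
        env.execute("Avro-from-Kafka sketch");
    }
}
{code}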


> Add a test for Avro-serialized Kafka messages 
> --
>
> Key: FLINK-2597
> URL: https://issues.apache.org/jira/browse/FLINK-2597
> Project: Flink
>  Issue Type: Bug
>  Components: Avro Support, Kafka Connector
>Affects Versions: 0.10
>Reporter: Robert Metzger
>Assignee: Vimal
>Priority: Minor
>
> A user has asked about reading Avro-serialized messages from Kafka.
> I think it's a legitimate use case that we should cover with a test case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-2647) Stream operators need to differentiate between close() and dispose()

2015-09-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2647.
-
Resolution: Fixed

Fixed via 407d74fd004bbd11febfa553c381e828115d8bfb

> Stream operators need to differentiate between close() and dispose()
> 
>
> Key: FLINK-2647
> URL: https://issues.apache.org/jira/browse/FLINK-2647
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Currently, operators make no distinction between closing (which needs to emit 
> remaining buffered data) and disposing (releasing resources).
> In case of a failure, we want to only release the resources, and not attempt 
> to send pending buffered data.
> Effectively, streaming operators then need to implement these methods:
>   - open()
>   - processRecord()
>   - processWatermark()
>   - close()
>   - dispose()
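
An illustrative sketch of why the two shutdown paths must differ (plain Java, not the actual StreamOperator interface):

{code:java}
class BufferingOperatorSketch implements AutoCloseable {
    private final java.util.List<String> buffer = new java.util.ArrayList<>();
    private java.io.Writer out;

    void open(java.io.Writer out) { this.out = out; }

    void processRecord(String record) { buffer.add(record); }

    /** Clean shutdown: emit what is still buffered, then release. */
    @Override
    public void close() throws java.io.IOException {
        for (String r : buffer) {
            out.write(r);
        }
        dispose();
    }

    /** Failure path: release resources only; never flush the buffer,
     *  since the data may be inconsistent or the target unreachable. */
    void dispose() throws java.io.IOException {
        buffer.clear();
        if (out != null) {
            out.close();
        }
    }
}
{code}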



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2656] Fix behavior of FlinkKafkaConsume...

2015-09-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139529822
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740665#comment-14740665
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139529822
  
LGTM


> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that defines how 
> to handle these situations (so the high-level consumer doesn't fail on that 
> exception).
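
For reference, the high-level consumer's switch for this situation is most likely Kafka 0.8's "auto.offset.reset" property; a minimal sketch of the consumer properties, assuming that is the parameter the fix wires through:

{code:java}
import java.util.Properties;

public class OffsetResetConfig {
    public static Properties kafkaProps() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "my-group");
        // The high-level consumer's switch for the out-of-range case:
        // "smallest" jumps to the earliest valid offset, "largest" to
        // the latest (Kafka 0.8 naming).
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }
}
{code}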



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2591) Add configuration parameter for default number of yarn containers

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740690#comment-14740690
 ] 

ASF GitHub Bot commented on FLINK-2591:
---

Github user willmiao closed the pull request at:

https://github.com/apache/flink/pull/1107


> Add configuration parameter for default number of yarn containers
> -
>
> Key: FLINK-2591
> URL: https://issues.apache.org/jira/browse/FLINK-2591
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Robert Metzger
>Assignee: Will Miao
>Priority: Minor
>  Labels: starter
>
> A user complained about the requirement to always specify the number of yarn 
> containers (-n) when starting a job.
> Adding a configuration value with a default value will allow users to set a 
> default ;)
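
A minimal sketch of the lookup this suggests; the key name "yarn.containers.default" is hypothetical, and -1 preserves today's behavior of requiring the flag:

{code:java}
import org.apache.flink.configuration.Configuration;

public class DefaultContainerCount {
    // Hypothetical key name for this sketch.
    static final String YARN_CONTAINERS_DEFAULT = "yarn.containers.default";

    /** Resolve the container count: the -n flag wins, the config default second. */
    public static int resolve(Integer cliValue, Configuration flinkConf) {
        return cliValue != null
                ? cliValue
                : flinkConf.getInteger(YARN_CONTAINERS_DEFAULT, -1);
    }
}
{code}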



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2591]Add configuration parameter for de...

2015-09-11 Thread willmiao
Github user willmiao commented on the pull request:

https://github.com/apache/flink/pull/1107#issuecomment-139531959
  
I'm sorry, something went wrong with my original repository, so I opened 
a new PR, and this one will be closed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2591) Add configuration parameter for default number of yarn containers

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740689#comment-14740689
 ] 

ASF GitHub Bot commented on FLINK-2591:
---

Github user willmiao commented on the pull request:

https://github.com/apache/flink/pull/1107#issuecomment-139531959
  
I'm sorry, something went wrong with my original repository, so I opened 
a new PR, and this one will be closed.


> Add configuration parameter for default number of yarn containers
> -
>
> Key: FLINK-2591
> URL: https://issues.apache.org/jira/browse/FLINK-2591
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Robert Metzger
>Assignee: Will Miao
>Priority: Minor
>  Labels: starter
>
> A user complained about the requirement to always specify the number of yarn 
> containers (-n) when starting a job.
> Adding a configuration value with a default value will allow users to set a 
> default ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740557#comment-14740557
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139514082
  
Good and critical fix. Should go to
  - `master`
  - `release-0.10.0-milestone1`
  - `0.9.2`


> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that defines how 
> to handle these situations (so the high-level consumer doesn't fail on that 
> exception).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2445) Add tests for HadoopOutputFormats

2015-09-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-2445:
--
Assignee: Martin Liesenberg

> Add tests for HadoopOutputFormats
> -
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.10, 0.9.1
>Reporter: Fabian Hueske
>Assignee: Martin Liesenberg
>  Labels: starter
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2647) Stream operators need to differentiate between close() and dispose()

2015-09-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2647.
---

> Stream operators need to differentiate between close() and dispose()
> 
>
> Key: FLINK-2647
> URL: https://issues.apache.org/jira/browse/FLINK-2647
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Currently, operators make no distinction between closing (which needs to emit 
> remaining buffered data) and disposing (releasing resources).
> In case of a failure, we want to only release the resources, and not attempt 
> to send pending buffered data.
> Effectively, streaming operators then need to implement these methods:
>   - open()
>   - processRecord()
>   - processWatermark()
>   - close()
>   - dispose()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2656] Fix behavior of FlinkKafkaConsume...

2015-09-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139529893
  
Thank you for the review. I'll merge it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740666#comment-14740666
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1117#issuecomment-139529893
  
Thank you for the review. I'll merge it


> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that defines how 
> to handle these situations (so the high-level consumer doesn't fail on that 
> exception).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-2656:
-

Assignee: Robert Metzger

> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that defines how 
> to handle these situations (so the high-level consumer doesn't fail on that 
> exception).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2656] Fix behavior of FlinkKafkaConsume...

2015-09-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1117


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2656) FlinkKafkaConsumer is failing with OutOfRangeException

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740673#comment-14740673
 ] 

ASF GitHub Bot commented on FLINK-2656:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1117


> FlinkKafkaConsumer is failing with OutOfRangeException
> --
>
> Key: FLINK-2656
> URL: https://issues.apache.org/jira/browse/FLINK-2656
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 0.10, 0.9.1
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>
> FlinkKafkaConsumer is failing with an OutOfRangeException. There is actually 
> a configuration parameter for the high-level Kafka consumer that defines how 
> to handle these situations (so the high-level consumer doesn't fail on that 
> exception).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2591) Add configuration parameter for default number of yarn containers

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740682#comment-14740682
 ] 

ASF GitHub Bot commented on FLINK-2591:
---

GitHub user willmiao opened a pull request:

https://github.com/apache/flink/pull/1121

[FLINK-2591] Add configuration parameter for default number of yarn 
containers



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/willmiao/flink FLINK-2591

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1121


commit 6c6005d6ea92177b4a99183f2509a480718dabb2
Author: Will Miao 
Date:   2015-09-10T23:34:35Z

[FLINK-2591]Add configuration parameter for default number of yarn 
containers

commit 154e486620469ffadb9fa9caed0f27f69fbe04ad
Author: Will Miao 
Date:   2015-09-11T12:09:09Z

[FLINK-2591] Add configuration parameter for default number of yarn 
containers




> Add configuration parameter for default number of yarn containers
> -
>
> Key: FLINK-2591
> URL: https://issues.apache.org/jira/browse/FLINK-2591
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Robert Metzger
>Assignee: Will Miao
>Priority: Minor
>  Labels: starter
>
> A user complained about the requirement to always specify the number of yarn 
> containers (-n) when starting a job.
> Adding a configuration value with a default value will allow users to set a 
> default ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2591] Add configuration parameter for d...

2015-09-11 Thread willmiao
GitHub user willmiao opened a pull request:

https://github.com/apache/flink/pull/1121

[FLINK-2591] Add configuration parameter for default number of yarn 
containers



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/willmiao/flink FLINK-2591

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1121


commit 6c6005d6ea92177b4a99183f2509a480718dabb2
Author: Will Miao 
Date:   2015-09-10T23:34:35Z

[FLINK-2591]Add configuration parameter for default number of yarn 
containers

commit 154e486620469ffadb9fa9caed0f27f69fbe04ad
Author: Will Miao 
Date:   2015-09-11T12:09:09Z

[FLINK-2591] Add configuration parameter for default number of yarn 
containers




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740527#comment-14740527
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139507621
  
Basically, we should avoid modifying the current Table API tests. This PR 
covers the streaming API only.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2622) Scala DataStream API does not have writeAsText method which supports WriteMode

2015-09-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740551#comment-14740551
 ] 

ASF GitHub Bot commented on FLINK-2622:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139513146
  
As you know, it seems to be caused by the ambiguity between a `Table` created 
from a `DataSet` and one created from a `DataStream`. We need to refactor the 
Table API.


> Scala DataStream API does not have writeAsText method which supports WriteMode
> --
>
> Key: FLINK-2622
> URL: https://issues.apache.org/jira/browse/FLINK-2622
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API, Streaming
>Reporter: Till Rohrmann
>
> The Scala DataStream API, unlike the Java DataStream API, does not support a 
> {{writeAsText}} method which takes the {{WriteMode}} as a parameter. In order 
> to make the two APIs consistent, it should be added to the Scala DataStream 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2622][streaming]add WriteMode for write...

2015-09-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1098#issuecomment-139513146
  
As you know, it seems to be caused by the ambiguity between a `Table` created 
from a `DataSet` and one created from a `DataStream`. We need to refactor the 
Table API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

