[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/861#discussion_r37138063
  
--- Diff: docs/libs/ml/statistics.md ---
@@ -0,0 +1,108 @@
+---
+mathjax: include
+htmlTitle: FlinkML - Statistics
+title: <a href="../ml">FlinkML</a> - Statistics
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Description
+
+ The statistics utility provides features such as building histograms over data, determining
+ mean, variance, gini impurity, entropy etc. of data.
+
+## Methods
+
+ The Statistics utility provides two major functions: `createHistogram` and `dataStats`.
+
+### Creating a histogram
+
+ There are two types of histograms:
+ <ul>
+  <li>
+   <strong>Continuous Histograms</strong>: These histograms are formed on a data set `X: DataSet[Double]`
+   when the values in `X` are from a continuous range. These histograms support
+   `quantile` and `sum` operations. Here `quantile(q)` refers to a value $x_q$ such that
+   $|x: x \leq x_q| = q * |X|$. Further, `sum(s)` refers to the number of elements $x \leq s$, which can
+be construed as a cumulative probability value at $s$ [Of course, <i>scaled</i> probability].
+   <br>
+A continuous histogram can be formed by calling `X.createHistogram(b)` where `b` is the
+number of bins.
+  </li>
+  <li>
+<strong>Categorical Histograms</strong>: These histograms are formed on a data set `X: DataSet[Double]`
+when the values in `X` are from a discrete distribution. These histograms
+support the `count(c)` operation which returns the number of elements associated with cateogry `c`.
+<br>
+A categorical histogram can be formed by calling `X.createHistogram(0)`.
+  </li>
+ </ul>
+
+### Data Statistics
+
+ The `dataStats` function operates on a data set `X: DataSet[Vector]` and returns column-wise
+ statistics for `X`. Every field of `X` is allowed to be defined as either <i>discrete</i> or
+ <i>continuous</i>.
+ <br>
+ Statistics can be evaluated by calling `DataStats.dataStats(X)` or
+ `DataStats.dataStats(X, discreteFields)`. The latter is used when some fields need to be
+ declared discrete-valued, and is provided as an array of indices of fields which are discrete.
+ <br>
+ The following information is available as part of `DataStats`:
+ <ul>
+<li>Number of elements in `X`</li>
+<li>Dimension of `X`</li>
+<li>Column-wise statistics where for discrete fields, we report counts for each category, and
+ the Gini impurity and Entropy of the field, while for continuous fields, we report the
+ minimum, maximum, mean and variance.
+</li>
+ </ul>
+
+## Examples
+
+{% highlight scala %}
+
+import org.apache.flink.ml.statistics._
+import org.apache.flink.ml._
+
+val X: DataSet[Double] = ...
+// Create continuous histogram
+val histogram = X.createHistogram(5) // creates a histogram with five bins
+histogram.quantile(0.3)              // returns the 30th quantile
+histogram.sum(4)                     // returns number of elements less than 4
+
+// Create categorical histogram
+val histogram = X.createHistogram(0) // creates a categorical histogram
+histogram.count(3)                   // number of elements with cateogory value 3
--- End diff --

`cateogory` -> `category`




[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/861#discussion_r37138093
  
--- Diff: docs/libs/ml/statistics.md ---
@@ -0,0 +1,108 @@
+## Examples
+
+{% highlight scala %}
+
+import org.apache.flink.ml.statistics._
+import org.apache.flink.ml._
+
+val X: DataSet[Double] = ...
+// Create continuous histogram
+val histogram = X.createHistogram(5) // creates a histogram with five bins
+histogram.quantile(0.3)              // returns the 30th quantile
+histogram.sum(4)                     // returns number of elements less than 4
--- End diff --

Is this example valid? `RichDoubleDataSet.createHistogram` returns
`DataSet[OnlineHistogram]` and it doesn't have methods such as `quantile`, `sum`, etc.
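
To illustrate the point (assuming the PR's `OnlineHistogram` type and the `X` from the quoted example), the per-histogram operations would have to run inside a transformation:

```
// hypothetical usage, not the PR's documented API: the histogram lives inside a
// DataSet, so quantile() could only be called within an operation such as map()
import org.apache.flink.api.scala._
import org.apache.flink.ml.statistics._

val histogramDS: DataSet[OnlineHistogram] = X.createHistogram(5)
val quantiles: DataSet[Double] = histogramDS.map(h => h.quantile(0.3))
```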




[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-15 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-131425060
  
Hi @sachingoel0101, thanks for your contribution. I reviewed this PR and
commented on the source code.

There are some problems which I haven't commented on. In the documentation, there are
many lines including `<a>`, `<ul>`, `<li>`, `<br>`, `<strong>` and `<i>` tags.
These can be replaced with markdown syntax. And some code
(`MLUtils.createHistogram`, `ColumnStatistics`) is not formatted.
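
For example, the histogram list could be written as plain markdown, roughly (a sketch, not a full rewrite):

```
* **Continuous Histograms**: formed on a data set `X: DataSet[Double]` when the
  values in `X` are from a continuous range. These histograms support `quantile`
  and `sum` operations.
* **Categorical Histograms**: formed on a data set `X: DataSet[Double]` when the
  values in `X` are from a discrete distribution. These histograms support the
  `count(c)` operation.
```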

I haven't finished reviewing the histogram-building process. After reading the full
source code and the given paper, I'll add comments about the process.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-08-14 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-131086738
  
@kno10 Thanks for the comment. In this case we don't need to parallelize the
R-Tree, because the R-Tree is only used in the reducer for matching records of the given
block pair.

But I agree that the exact k-NN implementation doesn't fit large-scale data. We
can discuss that in the other [JIRA
issue](https://issues.apache.org/jira/browse/FLINK-1934).

I'm inclined to close this PR and FLINK-1745 because many people say exact
k-NN is not good, and I think so too. May I close them? @thvasilo @tillrohrmann




[GitHub] flink pull request: [FLINK-2507]Rename the function tansformAndEmi...

2015-08-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1007#issuecomment-129850783
  
Why are the file permissions changed from 644 to 755? The other changes
seem good.




[GitHub] flink pull request: [Flink-2030][ml]Data Set Statistics and Histog...

2015-08-10 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/861#issuecomment-129336972
  
Hi, I just discovered the review request. I'll review this PR soon. Because
I'm busy working on my graduation thesis, I can probably start reviewing on the
weekend.




[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API

2015-08-07 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/808#issuecomment-128638172
  
[Your last
commit](https://github.com/PieterJanVanAeken/flink/commit/463a1f30b3b0785f46c76f9c290da3deec26)
includes all updates from the master branch. Please remove the last commit (`git reset
--hard HEAD~1` on your master branch), rebase your branch on the master branch
(`git rebase apache/master`), and push to your branch with a force update (`git
push origin +master`).
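
Put together, assuming `apache` is the remote pointing at apache/flink (with a `git fetch` added so the rebase sees the current master):

```
git checkout master
git reset --hard HEAD~1      # drop the last commit, which only mirrors master
git fetch apache
git rebase apache/master     # replay your work on the current master
git push origin +master      # force-update the branch backing the PR
```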

Additionally, working on a non-master branch is better than working on master.
If you work on a non-master branch, you can keep your master branch in sync with the
apache master branch while you are changing your branch.




[GitHub] flink pull request: [FLINK-2314]

2015-08-06 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r36478648
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -119,13 +124,20 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 		while (isRunning) {
 			OUT nextElement = serializer.createInstance();
 			nextElement =  format.nextRecord(nextElement);
-			if (nextElement == null && splitIterator.hasNext()) {
+			if (nextElement == null && splitIterator.hasNext() ) {
--- End diff --

unnecessary space




[GitHub] flink pull request: FLINK-2166. Add fromCsvFile() method to TableE...

2015-07-31 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/939#issuecomment-126621482
  
Just my opinion: `TableEnvironment` is located under
`org.apache.flink.api.java.table` because of the unification of the Table API
implementation. But the Table API is implemented in Scala, so I think using the Scala
API is more appropriate for this.

@aljoscha @fhueske What do you think about this?




[GitHub] flink pull request: [FLINK-2446]Fix SocketTextStreamFunction has m...

2015-07-31 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/965#issuecomment-12669
  
Nice catch! Looks good to merge.




[GitHub] flink pull request: [FLINK-2408] Define all maven properties outsi...

2015-07-27 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/941#issuecomment-125185305
  
Hi, the pom.xml file of the flink-scala module also has a profile-based dependency
setting. Is it okay without modification?




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-27 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-125271970
  
There is a wrong description in the build documentation. Because Scala 2.11
profile activation is determined by the property `scala-2.11`, the Scala 2.10 profile
is activated when we execute `mvn -Pscala-2.11`. The following command is right:

```
mvn clean install -DskipTests -Dscala-2.11
```
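
For reference, the profile's activation section looks roughly like this (a sketch, not the exact pom.xml contents):

```
<profile>
  <id>scala-2.11</id>
  <activation>
    <property>
      <!-- activated by passing -Dscala-2.11, not by -Pscala-2.11 -->
      <name>scala-2.11</name>
    </property>
  </activation>
</profile>
```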

About the shading artifacts, your guess is right. Because Hadoop packages
don't need Scala dependencies, I didn't add the suffix to them. But if we need the
suffix for them to maintain uniformity, we can add it. What do you think?

I just found another problem. I opened the dependency-reduced pom in my
maven repository, and there are some property expressions in the artifact ids. For
example, the following is the head of the flink-runtime module's pom.xml:

```
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <parent>
    <artifactId>flink-parent${scala.suffix}</artifactId>
    <groupId>org.apache.flink</groupId>
    <version>0.10-SNAPSHOT</version>
    <relativePath>../../pom.xml</relativePath>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <artifactId>flink-runtime${scala.suffix}</artifactId>
  <name>flink-runtime</name>
  <build>
    <plugins>
      <plugin>
```

As you see, there are property expressions (`${scala.suffix}`) in the artifactId.
I think that can be a problem. How can I solve this?




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-27 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-125319617
  
When we upgrade maven-shade-plugin to version 2.4.1, the property in the
current project's artifactId is interpolated properly, but the property in the parent
artifactId is not. I think it is a bug in maven-shade-plugin.
I posted an [issue](https://issues.apache.org/jira/browse/MSHADE-200) to the
maven-shade-plugin JIRA.




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-24 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-124521433
  
Could you post your command to compile Flink with Scala 2.11? The current
setting works well in my environment. Maven module definitions use the directory
name, not the artifact id, so we should keep the current setting.

I'm adding the suffix into the poms except quickstart.




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-22 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-123642852
  
Hi, currently this PR is not ready to merge, because it doesn't
contain the changes for #677. I'll update it soon. Unfortunately I'm out right now;
I can probably update this PR in 4-5 hours.




[GitHub] flink pull request: [FLINK-2377] Add reader.close() to readAllResu...

2015-07-20 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/924#issuecomment-122893574
  
Great catch! Looks good to merge.




[GitHub] flink pull request: [FLINK-2105] Implement Sort-Merge Outer Join a...

2015-07-17 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/907#issuecomment-122376238
  
Hi, I am reviewing these changes. I'm not done yet, but I found some points
which can be improved.

First, there are some duplicated classes such as `SimpleFlatJoinFunction`,
`MatchRemovingMatcher`, `Match`, `CollectionIterator`. I think we can move these
classes under the `org.apache.flink.runtime.operators.testutils` package.
After moving them, they can be shared with the test cases for the hash-based outer join.

Second, this is just my opinion, but how about creating iterator classes for
each outer join type, such as `AbstractMergeLeftOuterJoinIterator`,
`AbstractMergeRightOuterJoinIterator` and `AbstractMergeFullOuterJoinIterator`, with
derived classes reusing the shared logic? I'm concerned about the time consumed by
comparing the outer join type for every record in the `callWithNextKey` method. The
outer join type is already decided before the join operation runs. But I'm not
sure that there is an obvious performance decrease from this comparison. If the
performance decrease is negligible, the second suggestion can be ignored.
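
To sketch the idea (hypothetical names in plain Scala, not the actual runtime interfaces):

```
// a minimal sketch: the join type is fixed by choosing a subclass, so the
// callWithNextKey-style code needs no per-record comparison of the join type
abstract class MergeOuterJoinIterator[L, R, O](val join: (Option[L], Option[R]) => O) {
  def joinNextKey(lefts: Seq[L], rights: Seq[R]): Seq[O]
}

final class LeftOuterJoinIterator[L, R, O](join: (Option[L], Option[R]) => O)
    extends MergeOuterJoinIterator[L, R, O](join) {
  override def joinNextKey(lefts: Seq[L], rights: Seq[R]): Seq[O] =
    if (rights.isEmpty) lefts.map(l => join(Some(l), None)) // pad unmatched left records
    else for (l <- lefts; r <- rights) yield join(Some(l), Some(r))
}
```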




[GitHub] flink pull request: Update README.md

2015-07-15 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/912#issuecomment-121515483
  
Hi, thanks for your contribution. But I think this PR is not necessary
because the change is not specific to Flink but about git in general.




[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...

2015-07-14 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/909#issuecomment-121246257
  
Looks good. :)
+1 for merging.




[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-14 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121283389
  
@pp86 Looks good to merge. :)
If another committer gives this PR an LGTM, I'll merge it.




[GitHub] flink pull request: [FLINK-297] [web frontend] First part of JobMa...

2015-07-14 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/677#issuecomment-121441621
  
Looks good to merge. After merging this PR, we need to modify PR #885.




[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-120913343
  
Hi @pp86, thanks for your contribution.

But I think that using `AutoSelector` is not the best approach to improve the
distinct transformation. In Flink, a `KeySelector` converts a `DataSet<O>` to a
`DataSet<Tuple2<K, O>>` and uses the first element of the tuple as the key. For
atomic types, `AutoSelector` creates a `DataSet<Tuple2<V, V>>`, which
unnecessarily duplicates data.

I recommend `Keys.ExpressionKeys` when the user calls the `distinct()` method on
atomic data types.

And it would be better to add test cases for these changes.
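
A rough illustration of the duplication with plain Scala collections (not the Flink API):

```
// a KeySelector-style distinct wraps every atomic value v as (v, v),
// so each value is shipped twice before deduplication
val values = Seq(3, 1, 3, 2)
val viaSelector = values.map(v => (v, v)).groupBy(_._1).map(_._2.head._2).toSeq
val direct = values.distinct // keying on the value itself avoids the wrapping
```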




[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-121119196
  
@pp86 It seems okay, but we need to check this change with some test cases.
Could you add some test cases to `DistinctITCase` in the `flink-tests` module?




[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

2015-07-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/905#issuecomment-120922707
  
@pp86 Hi, you can modify this pull request by adding commits to your branch
(pp86:master).
I think reopening this pull request and adding commits is better than
opening a new pull request. :)




[GitHub] flink pull request: [FLINK-2218] Web client cannot distinguesh bet...

2015-07-12 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/904#issuecomment-120707231
  
+1 for renaming. I was confused by the difference between the options and
arguments.




[GitHub] flink pull request: [FLINK-2337] Multiple SLF4J bindings using Sto...

2015-07-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/903#issuecomment-120603967
  
+1 for merging :)




[GitHub] flink pull request: [doc] Fix wordcount example in YARN setup

2015-07-10 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/897#issuecomment-120384704
  
Merging...




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-10 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-120554092
  
If I get 1 more LGTM, I'll merge this.




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-10 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-120411775
  
Hi, I updated PR :)




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-07 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/885#discussion_r34067611
  
--- Diff: docs/apis/programming_guide.md ---
@@ -187,7 +187,17 @@ that creates the type information for Flink operations.
 </div>
 </div>
 
+ Scala Dependency Versions
 
+Because Scala 2.10 binary is not compatible with Scala 2.11 binary, we provide multiple artifacts
+to support both Scala versions. If you want to run your program on Flink with Scala 2.11, you need
+to add a suffix `_2.11` to all Flink artifact ids in your dependencies. You should be careful with
+this difference of artifact id. All modules with Scala 2.11 have a suffix `_2.11` in artifact id.
+For example, `flink-java` should be changed to `flink-java_2.11` and `flink-clients` should be
+changed to `flink-clients_2.11`.
--- End diff --

@aalexandrov I'm not sure it is good to use the keyword "Scala-dependent".
We cross-build all modules, including Java-based modules that are linked with Scala modules.
I think that omitting "Scala-dependent" is much better.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-07-07 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/696#discussion_r34017778
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/KNN.scala ---
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.DataSetUtils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.Vector
+import org.apache.flink.ml.metrics.distances.{DistanceMetric, EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * This algorithm calculates `k` nearest neighbor points in training set for each points of
+  * testing set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.classification.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. (Default value: '''None''')
+  *
+  * - [[org.apache.flink.ml.classification.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. This number should be set
+  * at least to the degree of parallelism. If no value is specified, then the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: '''None''')
+  *
+  * - [[org.apache.flink.ml.classification.KNN.DistanceMetric]]
+  * Sets the distance metric to calculate distance between two points. If no metric is specified,
+  * then [[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] is used. (Default value:
+  * '''EuclideanDistanceMetric()''')
+  *
+  */
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[Vector]]] = None
+
+  /** Sets K
+    * @param k the number of selected points as neighbors
+    */
+  def setK(k: Int): KNN = {
+    require(k > 1, "K must be positive.")
+    parameters.add(K, k)
+    this
+  }
+
+  /** Sets the distance metric
+    * @param metric the distance metric to calculate distance between two points
+    */
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+    parameters.add(DistanceMetric, metric)
+    this
+  }
+
+  /** Sets the number of data blocks/partitions
+    * @param n the number of data blocks
+    */
+  def setBlocks(n: Int): KNN = {
+    require(n > 1, "Number of blocks must be positive.")
+    parameters.add(Blocks, n)
+    this
+  }
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+    val defaultValue: Option[Int] = None
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+    val defaultValue: Option[DistanceMetric] = Some(EuclideanDistanceMetric())
+  }
+
+  case object Blocks extends Parameter[Int] {
+    val defaultValue: Option[Int] = None
+  }
+
+  def apply(): KNN = {
+    new KNN()
+  }
+
+  /** [[FitOperation]] which trains a KNN based on the given training data set.
+    * @tparam T Subtype of [[Vector]]
+    */
+  implicit def fitKNN[T <: Vector : TypeInformation] = new FitOperation[KNN, T
[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-07-07 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-119163944
  
@thvasilo Yeah, exact k-NN is not scalable to gigabyte- or terabyte-sized
data. If I add an R-Tree to this algorithm, the algorithm would be
better. But I agree that we need a vote or discussion about this.

I am also interested in approximate k-NN, but we should check progress with
the assigned contributor. :)




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-07-06 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-118792663
  
@thvasilo Thanks :) I'll update this pull request soon.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-07-06 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/696#discussion_r33921518
  
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/KNN.scala ---
@@ -0,0 +1,204 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.classification
+
+import org.apache.flink.api.common.operators.Order
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.DataSetUtils._
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common._
+import org.apache.flink.ml.math.Vector
+import org.apache.flink.ml.metrics.distances.{DistanceMetric, 
EuclideanDistanceMetric}
+import org.apache.flink.ml.pipeline.{FitOperation, 
PredictDataSetOperation, Predictor}
+import org.apache.flink.util.Collector
+
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.ClassTag
+
+/** Implements a k-nearest neighbor join.
+  *
+  * This algorithm calculates `k` nearest neighbor points in training set 
for each points of
+  * testing set.
+  *
+  * @example
+  * {{{
+  * val trainingDS: DataSet[Vector] = ...
+  * val testingDS: DataSet[Vector] = ...
+  *
+  * val knn = KNN()
+  *   .setK(10)
+  *   .setBlocks(5)
+  *   .setDistanceMetric(EuclideanDistanceMetric())
+  *
+  * knn.fit(trainingDS)
+  *
+  * val predictionDS: DataSet[(Vector, Array[Vector])] = 
knn.predict(testingDS)
+  * }}}
+  *
+  * =Parameters=
+  *
+  * - [[org.apache.flink.ml.classification.KNN.K]]
+  * Sets the K which is the number of selected points as neighbors. 
(Default value: '''None''')
+  *
+  * - [[org.apache.flink.ml.classification.KNN.Blocks]]
+  * Sets the number of blocks into which the input data will be split. 
This number should be set
+  * at least to the degree of parallelism. If no value is specified, then 
the parallelism of the
+  * input [[DataSet]] is used as the number of blocks. (Default value: 
'''None''')
+  *
+  * - [[org.apache.flink.ml.classification.KNN.DistanceMetric]]
+  * Sets the distance metric to calculate distance between two points. If 
no metric is specified,
+  * then [[org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric]] 
is used. (Default value:
+  * '''EuclideanDistanceMetric()''')
+  *
+  */
+class KNN extends Predictor[KNN] {
+
+  import KNN._
+
+  var trainingSet: Option[DataSet[Block[Vector]]] = None
+
+  /** Sets K
+* @param k the number of selected points as neighbors
+*/
+  def setK(k: Int): KNN = {
+require(k  1, K must be positive.)
+parameters.add(K, k)
+this
+  }
+
+  /** Sets the distance metric
+* @param metric the distance metric to calculate distance between two 
points
+*/
+  def setDistanceMetric(metric: DistanceMetric): KNN = {
+parameters.add(DistanceMetric, metric)
+this
+  }
+
+  /** Sets the number of data blocks/partitions
+* @param n the number of data blocks
+*/
+  def setBlocks(n: Int): KNN = {
+require(n  1, Number of blocks must be positive.)
+parameters.add(Blocks, n)
+this
+  }
+}
+
+object KNN {
+
+  case object K extends Parameter[Int] {
+val defaultValue: Option[Int] = None
+  }
+
+  case object DistanceMetric extends Parameter[DistanceMetric] {
+val defaultValue: Option[DistanceMetric] = 
Some(EuclideanDistanceMetric())
+  }
+
+  case object Blocks extends Parameter[Int] {
+val defaultValue: Option[Int] = None
+  }
+
+  def apply(): KNN = {
+new KNN()
+  }
+
+  /** [[FitOperation]] which trains a KNN based on the given training data 
set.
+* @tparam T Subtype of [[Vector]]
+*/
+  implicit def fitKNN[T : Vector : TypeInformation] = new 
FitOperation[KNN, T

[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-05 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-118611879
  
@aalexandrov Thank you for review. :) I will applying your suggestion and 
update this PR. But in this changes, all modules require the suffix if the 
module is linked with Scala 2.11. So we don't need list of modules.




[GitHub] flink pull request: Collect(): Fixing the akka.framesize size limi...

2015-07-05 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/887#issuecomment-118644497
  
FLINK-2319.
I left a commit with the JIRA ticket to track the changes of this PR. :)




[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-03 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/885

[FLINK-2200] Add Flink with Scala 2.11 in Maven Repository

Hi, this PR contains the following changes:

* Add a suffix `_2.11` to all maven modules by profile setting (`-Pscala-2.11`).
* Add documentation about using Flink with Scala 2.11.

I defined rules to add the suffix `_2.11` to maven modules (from
suggestion and discussion in the mailing list, thanks @rmetzger):

1. All modules with Scala 2.10 don't have any suffix (`flink-java`,
`flink-core`, `flink-scala`, ..., etc.).
2. All modules with Scala 2.11, except those packaged as pom, have the suffix
`_2.11` (`flink-java_2.11`, `flink-core_2.11`, `flink-scala_2.11`, ..., etc.).
  * `flink-parent`, `flink-quickstart`, `flink-examples`, ..., etc.
don't have any suffix although they are built with Scala 2.11, because these
modules only structure the project.
3. Currently, `flink-scala-shell` doesn't have any suffix, but later we
have to add the suffix to it as well.

I think that providing quickstart with Scala version variations such as
`flink-quickstart-java`, `flink-quickstart-java_2.11` is good. But I cannot
find a method to change the archetype result file (`pom.xml` in
`src/main/resources/archetype-resources`) by maven profile. So I added an
explanation of the default Scala version in quickstart to the documentation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2200

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #885


commit e15ad6749c24a7bc6e2d37370c926981066a52e1
Author: Chiwan Park chiwanp...@apache.org
Date:   2015-07-02T17:28:51Z

[FLINK-2200] [build system] Add scala version suffix to artifact id for 
Scala 2.11

commit 3bfa184af168d538cbafb527b6238ff945997d95
Author: Chiwan Park chiwanp...@apache.org
Date:   2015-07-03T09:16:16Z

[FLINK-2200] [docs] Add documentation for Flink with Scala 2.11






[GitHub] flink pull request: [FLINK-2272] [ml] [docs] Move vision and roadm...

2015-07-01 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/864#issuecomment-117860517
  
Good. +1 for merging.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-06-30 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-117059705
  
Hi, I updated this PR. I reimplemented the kNN join with `zipWithIndex` and
fitted it to the changed pipeline architecture.




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/832#discussion_r33419462
  
--- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/util/DataSetUtilsITCase.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.util
+
+import org.apache.flink.api.scala._
+import org.apache.flink.test.util.{MultipleProgramsTestBase, TestBaseUtils}
+import org.junit.rules.TemporaryFolder
+import org.junit.runner.RunWith
+import org.junit.runners.Parameterized
+import org.junit.{After, Before, Rule, Test}
+import org.apache.flink.api.scala.DataSetUtils.utilsToDataSet
+
+@RunWith(classOf[Parameterized])
+class DataSetUtilsITCase (mode: MultipleProgramsTestBase.TestExecutionMode) extends
+MultipleProgramsTestBase(mode){
+
+  private var resultPath: String = null
+  private var expectedResult: String = null
+
+  var tempFolder: TemporaryFolder = new TemporaryFolder()
+
+  @Rule
+  def getFolder(): TemporaryFolder = {
+    tempFolder;
+  }
+
+  @Before
+  @throws(classOf[Exception])
+  def before {
+    resultPath = tempFolder.newFile.toURI.toString
+  }
+
+  @Test
+  @throws(classOf[Exception])
+  def testZipWithIndex {
+    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
+    env.setParallelism(1)
+
+    val input: DataSet[String] = env.fromElements("A", "B", "C", "D", "E", "F")
+    val result: DataSet[(Long, String)] = input.zipWithIndex
+
+    result.writeAsCsv(resultPath, "\n", ",")
+    env.execute()
+
+    expectedResult = "0,A\n" + "1,B\n" + "2,C\n" + "3,D\n" + "4,E\n" + "5,F"
+  }
+
+  @After
+  @throws(classOf[Exception])
+  def after {
--- End diff --

Add a parenthesis and return type.




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-116208070
  
Hi, I added some minor comments about coding style in the Scala test case. The
rest is okay.
I think we can merge this after fixing the style.




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/832#discussion_r33419461
  
--- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/util/DataSetUtilsITCase.scala ---
@@ -0,0 +1,69 @@
+  @Test
+  @throws(classOf[Exception])
+  def testZipWithIndex {
--- End diff --

Add a parenthesis and return type.




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/832#discussion_r33419459
  
--- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/util/DataSetUtilsITCase.scala ---
@@ -0,0 +1,69 @@
+  @Before
+  @throws(classOf[Exception])
+  def before {
--- End diff --

It would be better to insert an empty parenthesis and declare the return type as
`Unit` to match the coding style of the other test cases.

I think `def before(): Unit = {` is better than the current form.




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-116252016
  
Looks good :) merging




[GitHub] flink pull request: [FLINK-2152] Added zipWithIndex

2015-06-28 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/832#issuecomment-116252346
  
Oops! I forgot to add "This closes #832" to the commit message. I made this mistake because this is my first commit pushed to the Apache repository. Sorry. How can I fix it?




[GitHub] flink pull request: [FLINK-2278] [ml] changed Vector fromBreeze

2015-06-26 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/869#issuecomment-115636110
  
Nice catch and looks good to merge. :)




[GitHub] flink pull request: [FLINK-1962] Add Gelly Scala API

2015-06-18 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/808#issuecomment-113147981
  
Hi. I'm very excited about Gelly's Scala API. I'm reading the changes in this PR, and I found a problem with re-formatting. Unlike Flink's Java code, Flink's Scala code uses 2 spaces for indentation, but it looks like the IDE re-formatted all of the code to use 4 spaces.

We should keep the previous indentation setting to preserve the modification history.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-06-17 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-112745989
  
@thvasilo Good! Thank you. I'll update the implementation after #801 is 
merged.




[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-06-17 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/696#issuecomment-112733734
  
Hi. I updated the implementation of kNN to use the pipeline architecture.




[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-06-01 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107478713
  
I fixed the bug related to the non-null memory segment. It was caused by the `writeBehindBuffersAvailable` variable not being cleared in the `close()` method of the `MutableHashTable` class. I added a test case for this bug.
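
A minimal sketch of the idea behind the fix (the real code is Java inside `MutableHashTable`; the names and surrounding logic are simplified here):

    class HashTableSketch {
      // Counts write-behind buffers returned by the writer thread.
      private var writeBehindBuffersAvailable: Int = 0

      // Sketch of the fix: clear the counter when the table is closed, so a
      // re-opened table does not start from stale availability state.
      def close(): Unit = {
        writeBehindBuffersAvailable = 0
        // ... releasing partitions and memory segments is elided here ...
      }
    }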

I have a problem creating a test case for the bug related to the null memory segment. I think it is also related to parallelism. If I run the ConnectedComponents example with parallelism 1, the bug does not occur, so I cannot reproduce the state without the ConnectedComponents example.




[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-05-31 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/751#issuecomment-107287836
  
I was debugging with the ConnectedComponents example and found the bug caused by a null memory segment. A moment ago I tried adding a test case that exercises the re-openable hash table with little memory, but I found another memory leak that is not related to the null memory segment (all memory segments are non-null).

I think more investigation is needed. I'll update this PR when I fix the bug.




[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...

2015-05-30 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/751

[FLINK-2076] [runtime] Fix memory leakage in MutableHashTable

Hi. This PR contains a bug fix for [FLINK-2076](https://issues.apache.org/jira/browse/FLINK-2076).

When the `prepareNextPartition` method runs with some pending partitions, a memory leak can occur at the `memory.add(getNextBuffer())` statements (lines 515 and 516). In the common case, there is spare memory for the hash table, so the first and second calls of `getNextBuffer()` both return a memory segment.

But in an extreme case, such as the one described in JIRA, the second call of `getNextBuffer()` can return null, and this null causes the memory leak.
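
A hedged sketch of the failure mode and the shape of the guard (the actual code is Java in `MutableHashTable`; this is not the literal patch):

    import scala.collection.mutable.ListBuffer

    class NullBufferSketch {
      private val memory = ListBuffer[AnyRef]()

      // Stand-in for MutableHashTable.getNextBuffer(), which may return null
      // in the extreme case described above.
      private def getNextBuffer(): AnyRef = null

      def prepareNextPartitionSketch(): Unit = {
        // Adding the results unconditionally leaks when a call yields null;
        // guarding each add keeps the free-memory bookkeeping consistent.
        val first = getNextBuffer()
        if (first != null) memory += first
        val second = getNextBuffer()
        if (second != null) memory += second
      }
    }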

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2076

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/751.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #751


commit e9aa4d9a1ca36178f3f860b1997d5f92ada51c3f
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-05-30T18:27:17Z

[FLINK-2076] [runtime] Fix memory leakage in MutableHashTable






[GitHub] flink pull request: [FLINK-2061] CSVReader: quotedStringParsing an...

2015-05-28 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/734#issuecomment-106313923
  
Okay. :)
Because Stephen's email address was in the test code, I modified the test code.




[GitHub] flink pull request: [FLINK-2061] CSVReader: quotedStringParsing an...

2015-05-27 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/734

[FLINK-2061] CSVReader: quotedStringParsing and includeFields yields 
ParseException

Fixes a bug in `GenericCsvInputFormat` that occurred when a skipped field was a quoted string. I also added a unit test for this case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2061

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #734


commit 99fb79beda88c73d80e630aa5e22e9ee401538ed
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-05-27T13:24:59Z

[FLINK-2061] [java api] Fix GenericCsvInputFormat skipping fields error 
with quoted string






[GitHub] flink pull request: [FLINK-1745] [ml] [WIP] Add exact k-nearest-ne...

2015-05-19 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/696

[FLINK-1745] [ml] [WIP] Add exact k-nearest-neighbours algorithm to machine 
learning library

This PR is not final but a work in progress. You can find a detailed description in [JIRA](https://issues.apache.org/jira/browse/FLINK-1745).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/696.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #696


commit 804abe12c9503265171291a2841332341fe972be
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-05-15T05:45:50Z

kNN join first draft






[GitHub] flink pull request: [FLINK-2001] [ml] Fix DistanceMetric serializa...

2015-05-12 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/668

[FLINK-2001] [ml] Fix DistanceMetric serialization error

* `DistanceMetric` extends Serializable
* Add simple serialization test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2001

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #668


commit 53726c12c8af09dfbe17903df3efcc1308da0540
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-05-12T06:11:25Z

[FLINK-2001] [ml] Fix DistanceMetric serialization error






[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...

2015-05-07 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/629#discussion_r29837576
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/metrics/distances/CosineDistanceMeasure.scala
 ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math.metrics.distances
+
+import org.apache.flink.ml.math.Vector
+
+/** This class implements a cosine distance metric. It calculates the distance between the
+  * given vectors by dividing their dot product by the product of their magnitudes, and then
+  * converts the resulting similarity into a distance, so 1 - cos(angle) is actually returned.
+  *
+  * @see http://en.wikipedia.org/wiki/Cosine_similarity
+  */
+class CosineDistanceMeasure extends DistanceMeasure {
+  override def distance(a: Vector, b: Vector): Double = {
+    checkValidArguments(a, b)
+
+    val dotProd: Double = a.dot(b)
+    val denominator: Double = a.magnitude * b.magnitude
+    if (dotProd == 0 && denominator == 0) {
--- End diff --

@tillrohrmann That is for the zero-vector case. Without the code handling the zero-vector case, we get a division-by-zero (0/0, i.e. NaN) result there. I followed [the result of Wolfram Alpha](http://goo.gl/NXGLgo) and [the implementation of Mahout](https://github.com/apache/mahout/blob/master/mr/src/main/java/org/apache/mahout/common/distance/CosineDistanceMeasure.java#L66) to resolve this case.
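
For clarity, a self-contained sketch of the guard being discussed (returning 0 for two zero vectors follows the Mahout implementation referenced above; the exact value chosen in Flink may differ):

    // Sketch of a cosine distance with the zero-vector guard.
    def cosineDistance(a: Array[Double], b: Array[Double]): Double = {
      require(a.length == b.length, "vectors must have the same size")
      val dotProd = a.zip(b).map { case (x, y) => x * y }.sum
      val denominator = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
      if (dotProd == 0 && denominator == 0) 0.0 // both vectors are zero: avoid 0/0 (NaN)
      else 1 - dotProd / denominator
    }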




[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...

2015-05-07 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/629#issuecomment-99803730
  
@tillrohrmann I just renamed the classes. :)




[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...

2015-05-07 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/629#issuecomment-99803806
  
Oh, the commit logs still contain "distance measure". I will fix them.




[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...

2015-05-06 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/629#issuecomment-99531887
  
I added an overview documentation page. Please review and comment if there are any errors.




[GitHub] flink pull request: [FLINK-1855] SocketTextStreamWordCount example...

2015-05-03 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/644

[FLINK-1855] SocketTextStreamWordCount example cannot be run from the 
webclient

This PR fixes [FLINK-1855](https://issues.apache.org/jira/browse/FLINK-1855). Tested on a local cluster with Oracle JDK 7 and JDK 8.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1855

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #644


commit 273c1707fa808a8405e3887dfa91a746485b3283
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-05-04T04:45:02Z

[FLINK-1855] [streaming] Add WordCount class into WindowWordCount and 
SocketTextStreamWordCount example jars






[GitHub] flink pull request: [FLINK-1937] [ml] Fixes sparse vector/matrix c...

2015-04-28 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/636#issuecomment-97100649
  
Looks good to merge. :)




[GitHub] flink pull request: [FLINK-1933] Add distance measure interface an...

2015-04-27 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/629

[FLINK-1933] Add distance measure interface and basic implementation to 
machine learning library

This PR contains the following changes:

* Add `dot` and `magnitude` methods to `Vector` (see the usage sketch below).
* Add the `DistanceMeasure` trait.
* Add 7 basic implementations of `DistanceMeasure`.
* Add tests for the above changes.
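
A short usage sketch of the new vector operations (assuming the `DenseVector` from `org.apache.flink.ml.math`):

    import org.apache.flink.ml.math.DenseVector

    val a = DenseVector(1.0, 2.0, 2.0)
    val b = DenseVector(2.0, 0.0, 1.0)

    val dot = a.dot(b)    // 1*2 + 2*0 + 2*1 = 4.0
    val len = a.magnitude // sqrt(1 + 4 + 4) = 3.0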

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1933

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #629


commit 3fac0ff339de93ab1c3b3582924af75a1e6057ea
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-04-24T03:50:28Z

[FLINK-1933] [ml] Add dot product and magnitude into Vector

commit c8f940c2439f754ef0e640b5440507bce4b859d2
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-04-27T09:48:56Z

[FLINK-1933] [ml] Add distance measure interface and basic implementation 
to machine learning library






[GitHub] flink pull request: [FLINK-703] Use complete element as join key

2015-04-20 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/572#issuecomment-94512484
  
I updated this PR. Thanks for the advice. :)

* Remove `setParallelism(1)` in test code.
* Simplify `testGroupReduceWithAtomicValue` in `GroupReduceITCase`.
* Add a `null` check for `expressionsIn` in the constructor of `ExpressionKeys`.




[GitHub] flink pull request: [FLINK-703] Use complete element as join key

2015-04-18 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/572#issuecomment-94188129
  
Hi. I updated this PR. The changes are the following.

* Re-implement this feature by generalizing `ExpressionKeys`.
* Modify `CoGroupOperatorBase`, `GroupCombineOperatorBase` and `GroupReduceOperatorBase` to allow the use of an atomic type as a key.
* Add unit tests for invalid usage.

Because a mention of the wildcard expression with atomic types already exists in the documentation, I didn't modify the documentation.
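
A hedged sketch of what the change enables, using the wildcard expression mentioned above on an atomic type (exact syntax assumed):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val left: DataSet[Int] = env.fromElements(1, 2, 3)
    val right: DataSet[Int] = env.fromElements(2, 3, 4)

    // Use the complete element as the join key via the wildcard expression.
    val joined = left.join(right).where("*").equalTo("*")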




[GitHub] flink pull request: [FLINK-1906] [docs] Add tip to work around pla...

2015-04-18 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/611

[FLINK-1906] [docs] Add tip to work around plain Tuple return type of 
project operator

Adds a tip about the project transformation with type hinting to the documentation. The related JIRA is [here](https://issues.apache.org/jira/browse/FLINK-1906).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1906

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/611.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #611


commit fd784be28c8f6bd85019e653a131975c36e7f2d0
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-04-18T18:24:16Z

[FLINK-1906] [docs] Add tip to work around plain Tuple return type of 
project operator






[GitHub] flink pull request: [FLINK-703] Use complete element as join key

2015-04-05 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/572

[FLINK-703] Use complete element as join key

Hello. I'm opening a pull request for FLINK-703. You can find a more detailed description in [JIRA](https://issues.apache.org/jira/browse/FLINK-703). This PR contains the following changes.

* Add a `BasicKeySelector` class to use the complete element as the key.
* Add a `checkForAtomicType` method to the `Keys` class to check this condition.
* Modify `CoGroupOperator`, `JoinOperator` and `DataSet` in the Java and Scala APIs.
* Add some unit tests and integration tests for this modification.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-703

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #572


commit e6804bbf4d5ec345a723d692da26b277a1cabf04
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-04-05T18:18:23Z

[FLINK-703] [java api] Use complete element as join key

commit 862c5a5608cd7fd8af22c5a781e87ef0ece79e85
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-04-05T20:07:11Z

[FLINK-703] [scala api] Use complete element as join key






[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-25 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-86011099
  
Oops, I pushed an intermediate commit (a8a5c37). I will fix it.




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-25 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-86011782
  
@fhueske You can check it now :)




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-24 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-85830275
  
Hi, I updated this PR.

* Remove the `pojoType(Class<?> targetType)` method from `CsvReader` to force the user to explicitly specify the field order (see the sketch below).
* Add a check to the `readCsvFile` method that a field order was given.
* Add two integration tests for the above two modifications.
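
A hedged usage sketch of the explicit field order in the Scala API (the `pojoFields` parameter name is an assumption):

    import org.apache.flink.api.scala._

    class Person {
      var name: String = _
      var age: Int = _
    }

    val env = ExecutionEnvironment.getExecutionEnvironment
    // The field order is given explicitly and maps CSV columns to POJO fields.
    val persons = env.readCsvFile[Person]("/path/to/persons.csv", pojoFields = Array("name", "age"))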

By the way, I cannot find out why Travis fails. On my machine, `mvn clean install -DskipTests` and `mvn verify` succeed. From the [travis log](https://travis-ci.org/chiwanpark/flink/jobs/55747686#L7074), it seems that the problem is related to Gelly. Although I read some of the Gelly code, I cannot find what exactly the problem is.

Could anyone help me with this problem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-23 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-85285193
  
@fhueske Hi, thanks for your kind advice! I will fix these soon.

About the order of the POJO fields, I also think that option 3 is good. However, [FLINK-1665](https://issues.apache.org/jira/browse/FLINK-1665) is not implemented yet, so I will implement options 1 and 2 now. After FLINK-1665 is completed, we can implement option 3.

About the inheritance of `ScalaCsvInputFormat`, I didn't think about the Record API. Your opinion looks good. I will revert the changes and refactor `GenericCsvInputFormat` to contain the duplicated methods.




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-18 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-82885499
  
I updated this PR.

* Change the way the `Field` objects are obtained: instead of using `PojoTypeInfo`, the field names are now saved. (Thanks @fhueske for the advice!)
* `ScalaCsvInputFormat` now extends `CsvInputFormat`, because there was a lot of duplicated code between the two classes.
* Add integration tests for `CsvReader` (Java API) and `ExecutionEnvironment.readCsvFile` (Scala API).

Any feedback is welcome! (especially on the error messages, because of my poor English)




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-78876571
  
Hello. I have a question about object reuse in the `readRecord` method of `ScalaCsvInputFormat`. In the Java implementation, `CsvInputFormat` reuses the result object, but in `ScalaCsvInputFormat` we don't reuse the object and instead create a new instance for each record. Why doesn't `ScalaCsvInputFormat` reuse the object?
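
For illustration, a simplified contrast of the two strategies (hypothetical signatures, not the actual InputFormat API); the likely reason is that Scala tuples and case classes are immutable, so a result object cannot be filled in place:

    // Java-style reuse: mutate and return the instance that was passed in.
    def readRecordWithReuse(reuse: Array[Any], parsed: Array[Any]): Array[Any] = {
      System.arraycopy(parsed, 0, reuse, 0, parsed.length)
      reuse
    }

    // Scala-style: build a fresh (immutable) object for every record.
    def readRecordFresh(parsed: Array[Any]): (Any, Any) = (parsed(0), parsed(1))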


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-13 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-79066240
  
@aljoscha Thanks! I understand about it.




[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78228121
  
As I wrote in JIRA, the fields of a Scala class (not a case class) are private. ([Reference](http://stackoverflow.com/questions/1589603/scala-set-a-field-value-reflectively-from-field-name)) Because fields declared with the `val` keyword don't have setters, Flink's TypeExtractor fails to extract the information for the POJO example in the documentation. I attached the result of a sample type extraction in [JIRA](https://issues.apache.org/jira/browse/FLINK-1654?focusedCommentId=14348124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14348124).

Flink's TypeExtractor deals with Scala case classes differently: case classes are treated like Scala tuples.
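
A short illustration of the difference (the class names are made up):

    // Not usable as a Flink POJO: `val` fields have no setters, so the
    // TypeExtractor cannot analyze this class as a POJO type.
    class WordCountClass(val word: String, val frequency: Int)

    // Handled differently: case classes are analyzed like Scala tuples.
    case class WordCountCase(word: String, frequency: Int)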




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-78401359
  
@fhueske Thanks for your kind advice. I will fix it as soon as possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-10 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/478

[FLINK-1654] Wrong scala example of POJO type in documentation

A more detailed description and discussion can be found in [JIRA](https://issues.apache.org/jira/browse/FLINK-1654).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1654

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #478


commit 398f77facd5aeb357a3e5e7825da60eca65e9435
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-03-11T01:40:17Z

[FLINK-1654] [docs] Fix scala example in programming guide






[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-04 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-77293135
  
@fhueske Oh, you are right. Currently, users cannot decide the order of the fields. I will add a parameter to set the field order.




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-04 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r25760453
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -235,8 +252,21 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws

		if (parseRecord(parsedValues, bytes, offset, numBytes)) {
			// valid parse, map values into pact record
-			for (int i = 0; i < parsedValues.length; i++) {
-				reuse.setField(parsedValues[i], i);
+			if (pojoTypeInfo == null) {
+				Tuple result = (Tuple) reuse;
+				for (int i = 0; i < parsedValues.length; i++) {
+					result.setField(parsedValues[i], i);
+				}
+			} else {
+				for (int i = 0; i < parsedValues.length; i++) {
+					try {
+						pojoTypeInfo.getPojoFieldAt(i).field.set(reuse, parsedValues[i]);
--- End diff --

@rmetzger Thanks! I modified my implementation to make the fields accessible in `CsvInputFormat` and `ScalaCsvInputFormat`, and added a test case with private fields.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-02-19 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/426

[FLINK-1512] Add CsvReader for reading into POJOs.

This PR contains the following changes.

* `CsvInputFormat` and `ScalaCsvInputFormat` can receive a POJO type as the generic parameter
* Add `pojoType(Class<T> targetType)` to `CsvReader` (Java API)
* Modify the `readCsvFile` method in `ExecutionEnvironment` (Scala API)
* Add unit tests for `CsvInputFormat` and `ScalaCsvInputFormat`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1512

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/426.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #426


commit 2463d6b7d244528e1625288e1b780335769f14ee
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-02-18T18:27:59Z

[FLINK-1512] [java api] Add CsvReader for reading into POJOs

commit 8fe5f8d1bd402382e6fa93014c5b2fec8e22cbd0
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-02-19T17:23:56Z

[FLINK-1512] [scala api] Add CsvReader for reading into POJOs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-09 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73509674
  
@StephanEwen Thanks for your advice! I fixed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-08 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73410413
  
@tillrohrmann Thanks for your advice. I will fix it!




[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-07 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/374

[FLINK-1179] Add button to JobManager web interface to request stack trace 
of a TaskManager

This PR contains the following changes:
* Add public constructors to `org.apache.flink.runtime.instance.InstanceID` for sending the instance ID from the web interface to the job manager
* Add a helper method called `getRegisteredInstanceById(InstanceID)` to `org.apache.flink.runtime.instance.InstanceManager` for finding the Akka actor for an instance ID
* Add Akka messages called `RequestStackTrace`, `SendStackTrace` and `StackTrace` (sketched below)
* Modify the task manager page in the job manager's web interface to request and show the stack trace of a task manager
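
A hedged sketch of the message shapes (the exact definitions in the PR may differ):

    import org.apache.flink.runtime.instance.InstanceID

    // Web interface -> job manager: request the stack trace of one task manager.
    case class RequestStackTrace(instanceId: InstanceID)

    // Job manager -> task manager: trigger dumping the current stack trace.
    case object SendStackTrace

    // Task manager -> job manager: the formatted stack trace to display.
    case class StackTrace(instanceId: InstanceID, stackTrace: String)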

The following image is a screenshot of the job manager's web interface.

![screen shot 2015-02-08 at 3 49 51 pm](https://cloud.githubusercontent.com/assets/1941681/6095765/9293e996-afaf-11e4-9e8e-4dcd69ce595b.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-1179

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #374


commit 83ab236ce52dd8c3b0aa0b94ee8644a4de28e152
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-02-08T06:36:19Z

[FLINK-1179] [runtime] Add helper method for InstanceID

commit a2a0a0f8261851c93330417f3c60a16c5f1d2dd5
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-02-08T07:05:29Z

[FLINK-1179] Add internal API for obtaining StackTrace

commit 423e64ca4cc6ad4a9396e4418eab95ce5b81b219
Author: Chiwan Park chiwanp...@icloud.com
Date:   2015-02-08T07:10:03Z

[FLINK-1179] [jobmanager] Add button to JobManager web interface to request 
stack trace of a TaskManager





