[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951429#comment-15951429
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user skonto commented on the issue:

https://github.com/apache/flink/pull/3659
  
@p4nna Although Scala and Java certainly interoperate, could you first try 
adding the Imputer to the Scala API?
I will add some comments on the current Java implementation shortly.


> Add an Imputer for preparing data
> -
>
> Key: FLINK-5785
> URL: https://issues.apache.org/jira/browse/FLINK-5785
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values, 
> either using the mean, the median or the most frequent value of the row or 
> column in which the missing values are located. This class also allows for 
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. 
> http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py
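
The three strategies quoted above can be sketched in plain Java as follows. This is only an illustration, not the code from any of the pull requests: the class name `ImputerSketch`, the `Strategy` enum and the convention that `Double.NaN` marks a missing entry are all assumptions made for the example, and it works on a single `double[]` rather than on a Flink DataSet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the class from the pull request. */
final class ImputerSketch {

    enum Strategy { MEAN, MEDIAN, MOST_FREQUENT }

    /** Returns a copy of values with every NaN (assumed "missing") replaced. */
    static double[] impute(double[] values, Strategy strategy) {
        // Collect the observed (non-missing) entries.
        double[] present = Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
        if (present.length == 0) {
            return values.clone(); // nothing to estimate a fill value from
        }
        double fill = statistic(present, strategy);
        double[] result = values.clone();
        for (int i = 0; i < result.length; i++) {
            if (Double.isNaN(result[i])) {
                result[i] = fill;
            }
        }
        return result;
    }

    private static double statistic(double[] present, Strategy strategy) {
        switch (strategy) {
            case MEAN:
                return Arrays.stream(present).average().getAsDouble();
            case MEDIAN: {
                double[] sorted = present.clone();
                Arrays.sort(sorted);
                int mid = sorted.length / 2;
                // Even length: average the two middle elements.
                return sorted.length % 2 == 1
                        ? sorted[mid]
                        : (sorted[mid - 1] + sorted[mid]) / 2.0;
            }
            case MOST_FREQUENT: {
                Map<Double, Integer> counts = new HashMap<>();
                double best = present[0];
                for (double v : present) {
                    int c = counts.merge(v, 1, Integer::sum);
                    // Ties are broken in favour of the smaller value.
                    if (c > counts.get(best) || (c == counts.get(best) && v < best)) {
                        best = v;
                    }
                }
                return best;
            }
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        double[] row = {2.0, Double.NaN, 2.0, 4.0, 8.0};
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + Arrays.toString(impute(row, s)));
        }
    }
}
```

The input array is never modified, so the same data can be imputed with several strategies and the results compared.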



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951183#comment-15951183
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

GitHub user p4nna opened a pull request:

https://github.com/apache/flink/pull/3659

[FLINK-5785] Add an Imputer for preparing data

Adds an Imputer class, including tests, that imputes missing values in 
sparse DataSets of Vectors. The inserted value can be the mean, the median or 
the most frequent value of a vector or row.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3659


commit 88514a98642763c5ad962efecc44bef887b84110
Author: p4nna 
Date:   2017-03-30T08:00:33Z

Added an imputer class with Strategy class

The imputer imputes missing values into a sparse DataSet of Vectors with 
different strategies which can be chosen out of the existing ones in the 
strategy enum class (mean, median or most frequent value) in a row or column

commit d17c6de2ad9456a58d24ac4cda44b5ef5ce5c216
Author: p4nna 
Date:   2017-03-30T08:01:47Z

deleted class in false destination

commit e4b336fdbf93084c30a8ee0067efcd7a4729c0e1
Author: p4nna 
Date:   2017-03-30T08:02:07Z

deleted class in false destination

commit ee6d57cfa669876f983cbf10eb6ffdd02b5c3052
Author: p4nna 
Date:   2017-03-30T08:04:04Z

added imputer class with strategy class

the imputer imputes values into a sparse DataSet of vectors with different 
strategies (mean, median or most frequent value as listed in the strategy class)

commit 57524586cbd63e2f0dfdc70cb34df82e6451c3be
Author: p4nna 
Date:   2017-03-30T08:04:47Z

added a test class for the new imputer class

commit 72ebd5e210f583cd7e8df21ea8d73c06e835e198
Author: p4nna 
Date:   2017-03-30T08:08:49Z

[FLINK-5785] Add an Imputer for preparing data, 

removed unnecessary things and comments, added license

commit 31dbfc704247b0c4723d6d3091a16759fbe18041
Author: p4nna 
Date:   2017-03-30T08:09:26Z

[FLINK-5785] Add an Imputer for preparing data

added license

commit d0f7b816bea49090633b4bc85762bbf70b192b27
Author: p4nna 
Date:   2017-03-30T08:10:03Z

[FLINK-5785] Add an Imputer for preparing data

added license

commit 76f996e2ddc5d912c947f20e2109bd53973c8091
Author: p4nna 
Date:   2017-03-30T08:10:33Z

[FLINK-5785] Add an Imputer for preparing data

added license

commit d533805c7b37888632238ce87e73e6ef9d081d02
Author: p4nna 
Date:   2017-03-31T15:54:37Z

[FLINK-5785] Add an Imputer for preparing data 

should work now.

commit 10dcdfab0ea27e6191cf6d0efad05a563f389ba4
Author: p4nna 
Date:   2017-03-31T15:56:04Z

[FLINK-5785] Add an Imputer for preparing data

was in wrong place

commit 8e67f01ba1fb707b808473f4961902542aaca369
Author: p4nna 
Date:   2017-03-31T15:56:21Z

[FLINK-5785] Add an Imputer for preparing data

was in wrong place

commit c3fdc87e0e9fc07785b4b4b8dc2b1fde4c756d35
Author: p4nna 
Date:   2017-03-31T15:56:59Z

[FLINK-5785] Add an Imputer for preparing data

should work now

commit 07507b5ca0f1cfebc38f96bb8db32c10f2186bbf
Author: p4nna 
Date:   2017-03-31T15:57:37Z

[FLINK-5785] Add an Imputer for preparing data

tests should work now






[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948597#comment-15948597
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user p4nna closed the pull request at:

https://github.com/apache/flink/pull/3631




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948598#comment-15948598
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user p4nna closed the pull request at:

https://github.com/apache/flink/pull/3625




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944851#comment-15944851
 ] 

Stavros Kontopoulos commented on FLINK-5785:


[~beera] Thanks, I will have a look ASAP.



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-28 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944817#comment-15944817
 ] 

Chesnay Schepler commented on FLINK-5785:
-

Please remove all files that shouldn't be part of the pull request.



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-28 Thread Anna Beer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944702#comment-15944702
 ] 

Anna Beer commented on FLINK-5785:
--

1: Thanks, I added it to all 4 files.
2: Thank you, I didn't know that.
3: That was a mistake; I accidentally committed a folder too high. Should I try to 
remove the files I didn't change from my commit, or doesn't it matter?



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943728#comment-15943728
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3625
  
Regarding the license: Every (non-binary) file in the Flink repository must 
have the Apache license at the very top of the file. Simply take a look at an 
existing Scala class and you'll see what I mean.

Second: It is not required to open a new PR when making changes; you can 
add commits to the branch of the PR (note that force-pushes should only be 
done if necessary).

Third: The file count in this PR is dramatically higher than in the last 
one (84 vs 4); is this intended or a mistake?




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943528#comment-15943528
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

GitHub user p4nna opened a pull request:

https://github.com/apache/flink/pull/3625

[FLINK-5785] Add an Imputer for preparing data

Provides an Imputer for sparse DataSets of Vectors. 
Fills in missing values with the mean, median or most frequent value of each 
vector or each dimension, respectively.
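
To make the difference between imputing per vector and per dimension concrete, here is a small sketch that uses only the mean strategy. It is an illustration under assumptions, not the PR's code: `AxisImputeSketch` and its method names are hypothetical, `Double.NaN` is assumed to mark a missing entry, all vectors are assumed to have the same length, and plain arrays stand in for a DataSet of Vectors.

```java
import java.util.Arrays;

/** Hypothetical sketch of per-vector vs. per-dimension mean imputation. */
final class AxisImputeSketch {

    /** Mean of the non-NaN entries, or NaN if there are none. */
    static double meanOfPresent(double[] values) {
        return Arrays.stream(values)
                .filter(v -> !Double.isNaN(v))
                .average()
                .orElse(Double.NaN);
    }

    /** Fills each NaN with the mean of its own vector (row). */
    static double[][] imputePerVector(double[][] data) {
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            double fill = meanOfPresent(data[r]);
            out[r] = data[r].clone();
            for (int c = 0; c < out[r].length; c++) {
                if (Double.isNaN(out[r][c])) {
                    out[r][c] = fill;
                }
            }
        }
        return out;
    }

    /** Fills each NaN with the mean of its dimension (column) across all vectors. */
    static double[][] imputePerDimension(double[][] data) {
        int dims = data.length == 0 ? 0 : data[0].length;
        double[] fills = new double[dims];
        for (int c = 0; c < dims; c++) {
            final int col = c;
            // Gather column c from every vector, then take the mean of what is present.
            fills[c] = meanOfPresent(Arrays.stream(data).mapToDouble(row -> row[col]).toArray());
        }
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            out[r] = data[r].clone();
            for (int c = 0; c < dims; c++) {
                if (Double.isNaN(out[r][c])) {
                    out[r][c] = fills[c];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, Double.NaN, 5.0}, {3.0, 4.0, Double.NaN}};
        System.out.println(Arrays.deepToString(imputePerVector(data)));
        System.out.println(Arrays.deepToString(imputePerDimension(data)));
    }
}
```

In a distributed setting the per-dimension variant needs a pass over the whole data set to compute the fill values, whereas the per-vector variant can treat each vector independently.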

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/p4nna/flink ml-Imputer-edits

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3625


commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna 
Date:   2017-03-27T09:47:39Z

Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna 
Date:   2017-03-27T09:49:59Z

Test for Imputer class

Two testclasses which test the functions implemented in the new imputer 
class. One for the rowwise imputing over all vectors and one for the vectorwise 
imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna 
Date:   2017-03-27T09:56:51Z

removed unused imports and methods

commit 9136607e84a0297bb4fb24a53bad9950b86bf116
Author: p4nna 
Date:   2017-03-27T15:58:37Z

Imputer was added

adds missing values in sparse DataSets of Vectors






[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943507#comment-15943507
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user p4nna closed the pull request at:

https://github.com/apache/flink/pull/3620




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943439#comment-15943439
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user p4nna commented on the issue:

https://github.com/apache/flink/pull/3620
  
What does that mean and how could I fix it?




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943399#comment-15943399
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3620
  
It appears that all files are lacking the Apache license:

`Too many files with unapproved license: 4 See RAT report in: 
/home/travis/build/apache/flink/target/rat.txt`
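
For anyone hitting the same RAT failure: the check should pass once every new source file starts with the standard ASF license header. A sketch of how the top of a Java file would then look is below; the class name `LicensedPlaceholder` is a hypothetical stand-in, and the same comment block can be used unchanged at the top of a Scala file.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// The header comes first; package declaration, imports and code follow it.
// Hypothetical placeholder class, only here to show where the code starts.
final class LicensedPlaceholder {
}
```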




[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread Anna Beer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943222#comment-15943222
 ] 

Anna Beer commented on FLINK-5785:
--

[~Zentol] Thank you for the detailed description; I hope I've done it right this 
time:
https://github.com/apache/flink/pull/3620



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943219#comment-15943219
 ] 

ASF GitHub Bot commented on FLINK-5785:
---

GitHub user p4nna opened a pull request:

https://github.com/apache/flink/pull/3620

[FLINK-5785]  Add an Imputer for preparing data

Provides an imputer method which fills in missing values in a sparse DataSet of 
vectors. They can be filled with the mean, the median or the most frequent 
value of each row or, optionally, each column. That way, incomplete data doesn't 
have to be thrown away but can instead be used to train a machine learning 
algorithm.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/p4nna/flink ml-Imputer-edits

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3620


commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna 
Date:   2017-03-27T09:47:39Z

Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna 
Date:   2017-03-27T09:49:59Z

Test for Imputer class

Two testclasses which test the functions implemented in the new imputer 
class. One for the rowwise imputing over all vectors and one for the vectorwise 
imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna 
Date:   2017-03-27T09:56:51Z

removed unused imports and methods






[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943184#comment-15943184
 ] 

Chesnay Schepler commented on FLINK-5785:
-

[~beera] You've opened the PR against your own fork of Flink, and not the 
original Apache one. Please close the PR and follow the steps below:

Go to https://github.com/p4nna/flink, where there is a drop-down list in which 
you can select the branch you want to merge. Select the ml-Imputer-edits 
branch; this should lead you to 
https://github.com/p4nna/flink/tree/ml-Imputer-edits.

Directly next to the drop-down list you should see a "New pull request" button. 
Push that thing.

In the next page, which should be titled "Comparing changes", make sure that 
"base fork" = "apache/flink", "base" = "master", "head fork" = "p4nna/flink" 
and "compare" = "ml-Imputer-edits".
The page should then look like this: 
https://github.com/apache/flink/compare/master...p4nna:ml-Imputer-edits?expand=1

From here on out you should know the way; please ping me if anything doesn't 
work as described.



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread Anna Beer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943118#comment-15943118
 ] 

Anna Beer commented on FLINK-5785:
--

[~Zentol] https://github.com/p4nna/flink/pull/1



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943111#comment-15943111
 ] 

Chesnay Schepler commented on FLINK-5785:
-

[~beera] Could you provide a link to the pull request?



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-27 Thread Anna Beer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943106#comment-15943106
 ] 

Anna Beer commented on FLINK-5785:
--

[~skonto] I made a pull request. The imputer now works for a DataSet of vectors, 
but I'm not sure if I uploaded it correctly; I'm new to GitHub and all :S



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-08 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901830#comment-15901830
 ] 

Stavros Kontopoulos commented on FLINK-5785:


[~beera] If you do that please follow my approach here:
https://github.com/skonto/flink/blob/6736a66ae1bd2c0efbaa29cf170cabd18b281a8a/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/Normalizer.scala#L127
I will finish that PR ASAP.



[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data

2017-03-08 Thread Anna Beer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901539#comment-15901539
 ] 

Anna Beer commented on FLINK-5785:
--

I just started to try it
