[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951429#comment-15951429 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user skonto commented on the issue:

    https://github.com/apache/flink/pull/3659

@p4nna Although Scala and Java are certainly interoperable, could you first try adding the Imputer to the Scala API? I will add some comments on the current Java implementation shortly.

> Add an Imputer for preparing data
> ---------------------------------
>
> Key: FLINK-5785
> URL: https://issues.apache.org/jira/browse/FLINK-5785
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Stavros Kontopoulos
> Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values,
> either using the mean, the median or the most frequent value of the row or
> column in which the missing values are located. This class also allows for
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951183#comment-15951183 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3659

    [FLINK-5785] Add an Imputer for preparing data

    Adds an Imputer class, including tests, which is able to impute values into sparse DataSets of Vectors. One can choose whether the median, the mean or the most frequent value of a vector or row should be inserted.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3659

commit 88514a98642763c5ad962efecc44bef887b84110
Author: p4nna
Date:   2017-03-30T08:00:33Z

    Added an imputer class with Strategy class

    The imputer imputes missing values into a sparse DataSet of Vectors with different strategies which can be chosen out of the existing ones in the strategy enum class (mean, median or most frequent value) in a row or column

commit d17c6de2ad9456a58d24ac4cda44b5ef5ce5c216
Author: p4nna
Date:   2017-03-30T08:01:47Z

    deleted class in false destination

commit e4b336fdbf93084c30a8ee0067efcd7a4729c0e1
Author: p4nna
Date:   2017-03-30T08:02:07Z

    deleted class in false destination

commit ee6d57cfa669876f983cbf10eb6ffdd02b5c3052
Author: p4nna
Date:   2017-03-30T08:04:04Z

    added imputer class with strategy class

    the imputer impustes values into a sparse DataSet of vectors with different strategies (mean, median or most frequent value as listed in the strategy class)

commit 57524586cbd63e2f0dfdc70cb34df82e6451c3be
Author: p4nna
Date:   2017-03-30T08:04:47Z

    added a test class for the new imputer class

commit 72ebd5e210f583cd7e8df21ea8d73c06e835e198
Author: p4nna
Date:   2017-03-30T08:08:49Z

    [FLINK-5785] Add an Imputer for preparing data, removed unnecessary things and comments, added license

commit 31dbfc704247b0c4723d6d3091a16759fbe18041
Author: p4nna
Date:   2017-03-30T08:09:26Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit d0f7b816bea49090633b4bc85762bbf70b192b27
Author: p4nna
Date:   2017-03-30T08:10:03Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit 76f996e2ddc5d912c947f20e2109bd53973c8091
Author: p4nna
Date:   2017-03-30T08:10:33Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit d533805c7b37888632238ce87e73e6ef9d081d02
Author: p4nna
Date:   2017-03-31T15:54:37Z

    [FLINK-5785] Add an Imputer for preparing data

    should work now.

commit 10dcdfab0ea27e6191cf6d0efad05a563f389ba4
Author: p4nna
Date:   2017-03-31T15:56:04Z

    [FLINK-5785] Add an Imputer for preparing data

    was in wrong place

commit 8e67f01ba1fb707b808473f4961902542aaca369
Author: p4nna
Date:   2017-03-31T15:56:21Z

    [FLINK-5785] Add an Imputer for preparing data

    was in wrong place

commit c3fdc87e0e9fc07785b4b4b8dc2b1fde4c756d35
Author: p4nna
Date:   2017-03-31T15:56:59Z

    [FLINK-5785] Add an Imputer for preparing data

    should work now

commit 07507b5ca0f1cfebc38f96bb8db32c10f2186bbf
Author: p4nna
Date:   2017-03-31T15:57:37Z

    [FLINK-5785] Add an Imputer for preparing data

    tests should work now
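
For readers following the thread, the three strategies named in the pull request description (mean, median, most frequent value, applied per row or per column) can be illustrated with a minimal Python sketch. This is not the code from the pull request: it works on plain lists instead of Flink DataSets, it assumes `None` marks a missing value, and the names `impute`, `most_frequent` and `STRATEGIES` are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def most_frequent(values):
    """Return the most common value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical strategy table; the keys mirror the strategies named in the PR text.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": most_frequent}


def impute(rows, strategy="mean", axis="column"):
    """Return a copy of rows with every None replaced by a fill value.

    axis="row" computes the fill value from the other entries of the same row;
    axis="column" computes it from the other entries of the same column.
    A row or column with no observed values raises an error, because there
    is nothing to compute the fill value from.
    """
    fill = STRATEGIES[strategy]
    if axis == "row":
        result = []
        for row in rows:
            present = [v for v in row if v is not None]
            if any(v is None for v in row):
                value = fill(present)
                row = [value if v is None else v for v in row]
            result.append(list(row))
        return result
    # Column-wise: transpose, impute each column as if it were a row, transpose back.
    columns = [list(col) for col in zip(*rows)]
    filled = impute(columns, strategy, axis="row")
    return [list(row) for row in zip(*filled)]


if __name__ == "__main__":
    data = [[1.0, None, 3.0], [4.0, 2.0, None], [7.0, 2.0, 9.0]]
    print(impute(data, "mean", "column"))
    print(impute(data, "median", "row"))
```

The column-wise case reuses the row-wise code through a transpose, which keeps the sketch short; a distributed implementation would more likely aggregate per-dimension statistics first and then map over the vectors.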
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948597#comment-15948597 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user p4nna closed the pull request at:

    https://github.com/apache/flink/pull/3631
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948598#comment-15948598 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user p4nna closed the pull request at:

    https://github.com/apache/flink/pull/3625
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944851#comment-15944851 ]

Stavros Kontopoulos commented on FLINK-5785:
--------------------------------------------

[~beera] Thanks, I will have a look ASAP.
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944817#comment-15944817 ]

Chesnay Schepler commented on FLINK-5785:
-----------------------------------------

Please remove all files that shouldn't be part of the pull request.
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944702#comment-15944702 ]

Anna Beer commented on FLINK-5785:
----------------------------------

1: Thanks, I added it to all 4 files.
2: Thank you, I didn't know that.
3: That was a mistake; I accidentally committed one folder too high. Should I try to remove the files I didn't change from my commit, or doesn't it matter?
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943728#comment-15943728 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/3625

Regarding the license: every (non-binary) file in the Flink repository must have the Apache license at the very top of the file. Simply take a look at an existing Scala class and you'll see what I mean.

Second: it is not required to open a new PR when making changes; you can add commits to the branch of the PR. (Note that force-pushes should only be done if necessary.)

Third: the file count in this PR is dramatically higher than in the last one (84 vs. 4). Is this intended or a mistake?
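
The license requirement described above can be checked locally before pushing. The following is a rough Python sketch, not the RAT check that the Flink build actually runs: it only looks for the opening words of the standard ASF source header near the top of each file. The function names, the five-line window and the default file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Opening words of the standard ASF source header.
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_license_header(text, max_lines=5):
    """Return True if the ASF marker appears within the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return LICENSE_MARKER in head


def files_missing_license(root, suffixes=(".java", ".scala")):
    """List source files under root whose first lines lack the marker.

    The suffix filter is an assumption for this sketch; the real build
    decides which files to check through its own configuration.
    """
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_license_header(text):
                missing.append(path)
    return missing


if __name__ == "__main__":
    for path in files_missing_license("."):
        print(path)
```

Run from the root of a checkout, this prints every `.java` or `.scala` file that would likely be flagged; an empty output does not guarantee that the real license check passes.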
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943528#comment-15943528 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3625

    [FLINK-5785] Add an Imputer for preparing data

    Provides an Imputer for sparse DataSets of Vectors. Fills in missing values with the mean, median or most frequent value of each vector or of each dimension.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink ml-Imputer-edits

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3625

commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna
Date:   2017-03-27T09:47:39Z

    Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna
Date:   2017-03-27T09:49:59Z

    Test for Imputer class

    Two testclasses which test the functions implemented in the new imputer class. One for the rowwise imputing over all vectors and one for the vectorwise imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna
Date:   2017-03-27T09:56:51Z

    removed unused imports and methods

commit 9136607e84a0297bb4fb24a53bad9950b86bf116
Author: p4nna
Date:   2017-03-27T15:58:37Z

    Imputer was added

    adds missing values in sparse DataSets of Vectors
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943507#comment-15943507 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user p4nna closed the pull request at:

    https://github.com/apache/flink/pull/3620
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943439#comment-15943439 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user p4nna commented on the issue:

    https://github.com/apache/flink/pull/3620

What does that mean and how could I fix it?
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943399#comment-15943399 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/3620

It appears that all files are lacking the Apache license:

`Too many files with unapproved license: 4 See RAT report in: /home/travis/build/apache/flink/target/rat.txt`
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943222#comment-15943222 ]

Anna Beer commented on FLINK-5785:
----------------------------------

[~Zentol] Thank you for the detailed description. I hope I've done it right this time: https://github.com/apache/flink/pull/3620
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943219#comment-15943219 ]

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3620

    [FLINK-5785] Add an Imputer for preparing data

    Provides an imputer method which fills in missing values in a sparse DataSet of vectors. The missing values can be filled with the mean, the median or the most frequent value of each row or, optionally, each column. That way, incomplete data does not have to be thrown away and can instead be used to train a machine learning algorithm.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink ml-Imputer-edits

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3620

commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna
Date:   2017-03-27T09:47:39Z

    Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna
Date:   2017-03-27T09:49:59Z

    Test for Imputer class

    Two testclasses which test the functions implemented in the new imputer class. One for the rowwise imputing over all vectors and one for the vectorwise imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna
Date:   2017-03-27T09:56:51Z

    removed unused imports and methods
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943184#comment-15943184 ]

Chesnay Schepler commented on FLINK-5785:
-----------------------------------------

[~beera] You've opened the PR against your own fork of Flink, and not the original Apache one. Please close the PR and follow the steps below:

Go to https://github.com/p4nna/flink , there is a drop-down list where you can select the branch you want to merge.
Select the ml-Imputer-edits branch, this should lead you to https://github.com/p4nna/flink/tree/ml-Imputer-edits.
Directly next to the drop-down list you should see a "New pull request" button. Push that thing.
In the next page, which should be titled "Comparing changes", make sure that "base fork" = "apache/flink", "base" = "master", "head fork" = "p4nna/flink" and "compare" = "ml-Imputer-edits".
The page should then look like this: https://github.com/apache/flink/compare/master...p4nna:ml-Imputer-edits?expand=1

From here on out you should know the way, please ping me if anything doesn't work as described.
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943118#comment-15943118 ]

Anna Beer commented on FLINK-5785:
----------------------------------

[~Zentol] https://github.com/p4nna/flink/pull/1
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943111#comment-15943111 ]

Chesnay Schepler commented on FLINK-5785:
-----------------------------------------

[~beera] Could you provide a link to the pull request?
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943106#comment-15943106 ]

Anna Beer commented on FLINK-5785:
----------------------------------

[~skonto] I made a pull request. The Imputer now works for a DataSet of vectors, but I'm not sure if I uploaded it correctly; I'm new to GitHub and all :S
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901830#comment-15901830 ]

Stavros Kontopoulos commented on FLINK-5785:
--------------------------------------------

[~beera] If you do that, please follow my approach here:
https://github.com/skonto/flink/blob/6736a66ae1bd2c0efbaa29cf170cabd18b281a8a/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/Normalizer.scala#L127
I will finish that PR ASAP.
[jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901539#comment-15901539 ]

Anna Beer commented on FLINK-5785:
----------------------------------

I have just started trying it.