[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 Merged to master. Thanks @hhbyyh and also everyone for reviews. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74651/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74651/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 Created [SPARK-19969](https://issues.apache.org/jira/browse/SPARK-19969) to track doc and examples to be done for 2.2 release. I can help with this if you're tied up.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74651/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-16 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 jenkins retest this please

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Thanks @MLnick for being the Shepherd and providing consistent help on discussion and review. The performance test matches what I got from my local environment.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74216/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74216/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74216/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 By the way out of curiosity, I tested things out on a cluster (4x workers, 192 cores & 480GB RAM total), with 100 columns of 100 million doubles each, 1% `NaN` occurrence. Reading from a Parquet

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-08 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 Made a few last comments. LGTM. cc @sethah @jkbradley I am going to merge this for 2.2. Let me know if you have any final comments.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74038/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74038/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #74038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74038/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73868/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #73868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73868/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #73868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73868/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-03 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Hi @MLnick I changed the surrogateDF format for better extensibility in the last update and added unit tests for multi-column support. Let me know if I missed anything. inputCol1|inputCol2
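As a rough illustration of what multi-column imputation with one surrogate per input column could look like, here is a minimal sketch in plain Python. It is not the Spark implementation and does not use any Spark API. The function names `fit_surrogates` and `transform`, the dict-of-lists data layout, and the choice to key surrogates by input column name are all assumptions made for illustration (the last one loosely suggested by the `inputCol1|inputCol2` fragment above).

```python
import math
from statistics import mean, median


def fit_surrogates(columns, strategy="mean"):
    """Compute one surrogate per column, ignoring NaN entries (hypothetical helper)."""
    pick = {"mean": mean, "median": median}[strategy]
    surrogates = {}
    for name, values in columns.items():
        present = [v for v in values if not math.isnan(v)]
        # Raises statistics.StatisticsError if a column has no non-NaN values.
        surrogates[name] = pick(present)
    return surrogates


def transform(columns, surrogates):
    """Replace each NaN with the surrogate fitted for its column (hypothetical helper)."""
    return {
        name: [surrogates[name] if math.isnan(v) else v for v in values]
        for name, values in columns.items()
    }


nan = float("nan")
data = {
    "inputCol1": [1.0, nan, 3.0],
    "inputCol2": [4.0, 6.0, nan],
}

surrogates = fit_surrogates(data, strategy="mean")
imputed = transform(data, surrogates)
print(surrogates)
print(imputed)
```

Keeping the surrogates in a name-keyed structure means adding another input column only adds another entry, which is one plausible reading of the "better extensibility" remark.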

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Thanks a lot for making a pass @MLnick. The last update mainly focuses on the interface and behavior change. I'll make a pass and also address your comments.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73753/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #73753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73753/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #73753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73753/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-03-02 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 jenkins retest this please

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Looks like CI was interrupted. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/console

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73268/ Test FAILed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test FAILed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Sent an update to add multi-column support. Let me know if this is not what you have in mind.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #73268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/testReport)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66516/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #66516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66516/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-07 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11601 Thanks for the comments @MLnick @jkbradley @sethah I have sent an update according to the comments and changed `ImputerModel.surrogate` and the persistence format to DataFrame. As for the

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #66516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66516/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66476/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #66476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #66476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-27 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 The reason we didn't support mode was partly due to time and mostly due to not being certain about the performance (e.g. if mode was called on a non-categorical double column it could become quite
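One way to picture the concern about computing a mode on a non-categorical double column is to look at how much state a simple frequency count has to hold. The sketch below is plain Python, not Spark code, and the helper name `mode_with_state_size` and the synthetic data are assumptions for illustration only. It counts values with `collections.Counter` and reports how many distinct keys the counter ends up holding.

```python
import random
from collections import Counter


def mode_with_state_size(values):
    """Return (mode, number of distinct values the counter had to track)."""
    counts = Counter(values)
    value, _ = counts.most_common(1)[0]
    return value, len(counts)


random.seed(0)
n = 10_000

# A double column that is really categorical: only a handful of distinct values.
categorical = [float(random.randint(0, 4)) for _ in range(n)]

# A continuous double column: nearly every value is distinct.
continuous = [random.random() for _ in range(n)]

_, categorical_state = mode_with_state_size(categorical)
_, continuous_state = mode_with_state_size(continuous)

print("distinct values tracked (categorical):", categorical_state)
print("distinct values tracked (continuous):", continuous_state)
```

For the categorical column the counter stays tiny, while for the continuous column it grows roughly with the number of rows. A running mean, by contrast, only needs a sum and a count per column.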

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/11601 I agree we should plan to support multiple columns and Vector columns in the future. The 2 places I noticed may cause problems in the future are: * ```ImputerModel.surrogate```: This is nice

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/11601 I'll make a review pass now

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/11601 So, I am trying to refresh my memory on this PR. I see we settled on not supporting vector type and not supporting mode. Did we ever settle on supporting multiple input columns? I am not sure I see

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 @hhbyyh it seems the behavior of approx quantiles may have changed somewhere. Can you look into it?

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65908/ Test FAILed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #65908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65908/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test FAILed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #65908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65908/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 jenkins retest this please

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 Sorry for the delay - been a bit tied up! Overall looks good. Will leave this open a day or two for @sethah or @jkbradley to make any final comments.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Merged build finished. Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11601 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65053/ Test PASSed.

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #65053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65053/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11601 **[Test build #65053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65053/consoleFull)** for PR 11601 at commit

[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-08-01 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11601 @hhbyyh could you update the since annotations to target `2.1.0`? @jkbradley if you have a chance to review, that would be great. Thanks!