[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-10-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15299 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-30 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15299 Ah sorry ok I was a bit confused - I thought the `isSorted` method had only been added after 2.0, but it has been around since the beginning. It is indeed public (well protected) so is a MiMa issue

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @MLnick This change will not break binary compatibility currently. It marks ```isSorted``` as deprecated and will break binary compatibility when we delete that method. This should be not a big

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-30 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15299 Hey folks - what exactly is the issue we're concerned about here for binary compat? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15299 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66107/ Test PASSed. ---

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15299 **[Test build #66107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66107/consoleFull)** for PR 15299 at commit

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15299 **[Test build #66107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66107/consoleFull)** for PR 15299 at commit

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15299 I don't think it can be transient, because if serialized, this data must be part of the model. It is just an array of ints. I don't think lazy makes sense because it is just holding on to an

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15299 hi @srowen , is @transient needed for val selectedFeatures or val filterIndices, one of them? is it good to define filterIndices lazy? --- If your project is set up for it, you can reply to

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @srowen It looks strange to left it protected, and deprecating it looks ok to me except someone tells me any reason. BTW, please update Python API docs to reflect that ```selectedFeatures``` is

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15299 Yeah I left it deprecated because I really can't figure why it was left protected. If you're ok with this then yes I'd like to restore these parts from the original change. I think it makes slightly

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 Oh, ```isSorted``` is left and it's not introduce binary incompatible right now. Thanks for your remind. I'm neutral for this change. Thanks! --- If your project is set up for it, you can reply

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66103/ Test PASSed. ---

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15299 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15299 **[Test build #66103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66103/consoleFull)** for PR 15299 at commit

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15299 What is binary incompatible here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 We need ```sort``` cost in any case, and put it in fit/training or model has no difference. So I think if we want to introduce this binary incompatible change, there should be strong

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15299 @srowen I think this PR may fail MiMa tests, since it makes binary incompatible change. The major disagreement between this and #15277 is whether to keep ```selectedFeatures``` sorted. I think

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15299 **[Test build #66103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66103/consoleFull)** for PR 15299 at commit