[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64322398 [Test build #23825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23825/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64331011 [Test build #23825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23825/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64331017 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20852499 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20885451 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3435 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/3435 [SPARK-4581][MLlib] Refactorize StandardScaler to improve the transformation performance The following optimizations are done to improve the StandardScaler model transformation performance.

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64288342 [Test build #23800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23800/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64288808 [Test build #23801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23801/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64292549 [Test build #23803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23803/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64294665 @dbtsai Did you measure the performance gain from the following change? ~~~ 3) Have a local reference to shift and factor array so JVM can locate the value

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64295699 [Test build #23805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23805/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64296434 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64296425 [Test build #23800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23800/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64296926 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64296918 [Test build #23801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23801/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-6421 [Test build #23803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23803/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-6426 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64302807 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64302802 [Test build #23805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23805/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64304769 @mengxr Without the local reference copy of `factor` and `shift` arrays, the runtime is almost three times slower. DenseVector withMean and withStd:

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64304881 PS, we may want to go through the MLlib codebase and find things like this. This issue impacts performance quite a lot.
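
The pattern under discussion, taking a local reference to a member array before a hot loop, can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class name `LocalRefScaler`, its constructor, and the `1 / sqrt(variance)` scaling rule are assumptions made for the example.

```scala
// Hypothetical stand-in for the model discussed in the thread; NOT the MLlib class.
class LocalRefScaler(mean: Array[Double], variance: Array[Double]) {
  // Plain `val` members: Scala generates an accessor method for each of them.
  val shift: Array[Double] = mean.clone()
  // Assumed scaling rule for this sketch: 1 / sqrt(variance), or 0 for zero variance.
  val factor: Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  def transform(values: Array[Double]): Array[Double] = {
    require(values.length == shift.length, "dimension mismatch")
    // Local references: each member is read once, outside the loop,
    // so the loop body only touches local variables.
    val localShift = shift
    val localFactor = factor
    val out = new Array[Double](values.length)
    var i = 0
    while (i < values.length) {
      out(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
    out
  }
}

object LocalRefScalerDemo {
  def main(args: Array[String]): Unit = {
    val model = new LocalRefScaler(Array(1.0, 2.0), Array(4.0, 0.0))
    println(model.transform(Array(3.0, 5.0)).mkString(", "))
  }
}
```

Whether this makes a measurable difference depends on the JIT and the workload; the figure quoted in the thread is dbtsai's own measurement, and the sketch above makes no claim to reproduce it.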

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64307020 @dbtsai What if we mark `factor` and `shift` as `private[this]`?

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64308394 Wow, with ```scala private[this] val factor: Array[Double] = { val f = Array.ofDim[Double](variance.size) var i = 0 while (i < f.size) {

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64309996 By default, Scala generates Java methods for members, no matter whether you use `val` or `def`. That's why you saw `invokespecial` for `shift` and `factor`. But if a
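
The difference between an ordinary member and an object-private one can be sketched as below. This is an illustration under assumptions, not the code merged in the pull request: the class name `PrivateThisScaler` and the scaling rule are hypothetical, and the bytecode remarks in the comments describe Scala 2.x behaviour, which is the compiler generation in use when the thread was written.

```scala
// Hypothetical sketch; NOT the MLlib StandardScalerModel.
class PrivateThisScaler(mean: Array[Double], variance: Array[Double]) {
  // Object-private members: in Scala 2.x the compiler emits plain fields
  // with no accessor methods, so reads inside this class are direct field reads.
  private[this] val shift: Array[Double] = mean.clone()
  // Assumed scaling rule for this sketch: 1 / sqrt(variance), or 0 for zero variance.
  private[this] val factor: Array[Double] = {
    val f = Array.ofDim[Double](variance.length)
    var i = 0
    while (i < f.length) {
      f(i) = if (variance(i) != 0.0) 1.0 / math.sqrt(variance(i)) else 0.0
      i += 1
    }
    f
  }

  def transform(values: Array[Double]): Array[Double] = {
    require(values.length == shift.length, "dimension mismatch")
    val out = new Array[Double](values.length)
    var i = 0
    while (i < values.length) {
      // No local copies here: `shift` and `factor` are read as fields directly.
      out(i) = (values(i) - shift(i)) * factor(i)
      i += 1
    }
    out
  }
}

object PrivateThisScalerDemo {
  def main(args: Array[String]): Unit = {
    val model = new PrivateThisScaler(Array(1.0, 2.0), Array(4.0, 0.0))
    println(model.transform(Array(3.0, 5.0)).mkString(", "))
  }
}
```

One way to check what the compiler actually generated is to disassemble the class file with `javap -c -p` and look at how `shift` and `factor` are accessed inside `transform`.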

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20843292 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -87,6 +85,8 @@ class StandardScalerModel private[mllib] ( f

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20843299 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20843294 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20843304 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64312182 [Test build #23817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23817/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64317319 [Test build #23817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23817/consoleFull) for PR 3435 at commit

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3435#issuecomment-64317323 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20847175 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (

[GitHub] spark pull request: [SPARK-4581][MLlib] Refactorize StandardScaler...

2014-11-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3435#discussion_r20847415 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (