Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64322398
[Test build #23825 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23825/consoleFull)
for PR 3435 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64331011
[Test build #23825 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23825/consoleFull)
for PR 3435 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64331017
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20852499
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20885451
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3435
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3435
[SPARK-4581][MLlib] Refactorize StandardScaler to improve the
transformation performance
The following optimizations are done to improve the StandardScaler model
transformation performance.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64288342
[Test build #23800 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23800/consoleFull)
for PR 3435 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64288808
[Test build #23801 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23801/consoleFull)
for PR 3435 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64292549
[Test build #23803 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23803/consoleFull)
for PR 3435 at commit
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64294665
@dbtsai Did you measure the performance gain from the following change?
~~~
3) Have a local reference to shift and factor array so JVM can locate the
value
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64295699
[Test build #23805 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23805/consoleFull)
for PR 3435 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64296434
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64296425
[Test build #23800 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23800/consoleFull)
for PR 3435 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64296926
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64296918
[Test build #23801 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23801/consoleFull)
for PR 3435 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-6421
[Test build #23803 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23803/consoleFull)
for PR 3435 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-6426
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64302807
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64302802
[Test build #23805 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23805/consoleFull)
for PR 3435 at commit
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64304769
@mengxr
Without local references to the `factor` and `shift` arrays, the
runtime is almost three times slower.
DenseVector withMean and withStd:
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64304881
PS, we may want to go through the mllib codebase and find things like this.
This issue impacts the performance quite a lot.
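For readers following the thread, here is a minimal sketch of the local-reference pattern being discussed. It is not the code from the pull request: the class name `ScalerSketch`, the method `transform`, and the way `factor` is derived from the variance are assumptions made only for illustration.

```scala
// Hypothetical stand-in for the model under discussion; not the actual MLlib class.
class ScalerSketch(mean: Array[Double], variance: Array[Double]) {
  // Member vals: reads from inside a method normally go through generated accessors.
  val shift: Array[Double] = mean.clone()
  // Assumed scaling rule for this sketch: 1 / sqrt(variance), or 0.0 when the variance is 0.
  val factor: Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  def transform(values: Array[Double]): Array[Double] = {
    // Copy the member references into locals once, outside the hot loop,
    // so the loop body works on local variables only.
    val localShift = shift
    val localFactor = factor
    val out = new Array[Double](values.length)
    var i = 0
    while (i < values.length) {
      out(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
    out
  }
}

val scaler = new ScalerSketch(Array(1.0, 2.0), Array(4.0, 0.0))
val scaled = scaler.transform(Array(3.0, 5.0))
```

Whether this helps in a given class depends on the JIT and on how the members are declared, so it is worth measuring before and after, as was done in this thread.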
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64307020
@dbtsai What if we mark `factor` and `shift` as `private[this]`?
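A small sketch of what the `private[this]` suggestion changes, assuming the usual Scala 2 compilation scheme. The class `AccessSketch` and its members are hypothetical and are not taken from the pull request. The sketch uses Java reflection to list the zero-argument methods the compiler generated, so the difference between a plain `val` and a `private[this] val` can be inspected directly.

```scala
// Hypothetical class for illustration; names are not from the pull request.
class AccessSketch(variance: Array[Double]) {
  // Plain val: the compiler emits a field plus a `factor()` accessor method.
  val factor: Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  // Object-private val: expected to be read as a plain field, with no accessor.
  private[this] val shift: Array[Double] = Array.fill(variance.length)(1.0)

  def scaleFirst(x: Double): Double = (x - shift(0)) * factor(0)
}

// Names of the zero-argument methods declared on the compiled class.
val accessorNames: Set[String] =
  classOf[AccessSketch].getDeclaredMethods
    .filter(_.getParameterCount == 0)
    .map(_.getName)
    .toSet
```

Under that assumption, `accessorNames` is expected to contain `factor` but not `shift`. Running `javap -c` on the class file is another way to check which reads compile to a method call and which to a direct field access.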
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64308394
Wow, with
```scala
private[this] val factor: Array[Double] = {
val f = Array.ofDim[Double](variance.size)
var i = 0
while (i < f.size) {
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64309996
By default, Scala generates Java methods for members, no matter whether you
use `val` or `def`. That's why you saw `invokespecial` for `shift` and
`factor`. But if a
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20843292
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -87,6 +85,8 @@ class StandardScalerModel private[mllib] (
f
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20843299
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20843294
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20843304
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,51 @@ class StandardScalerModel private[mllib] (
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64312182
[Test build #23817 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23817/consoleFull)
for PR 3435 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64317319
[Test build #23817 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23817/consoleFull)
for PR 3435 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64317323
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20847175
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20847415
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib] (