Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64304769
@mengxr
Without the local reference copies of the `factor` and `shift` arrays, the
runtime is almost three times slower.
DenseVector withMean and withStd: 18.15secs
DenseVector withMean and withoutStd: 18.05secs
DenseVector withoutMean and withStd: 18.54secs
SparseVector withoutMean and withStd: 2.01secs
The following code,
```scala
while (i < size) {
values(i) = (values(i) - shift(i)) * factor(i)
i += 1
}
```
will generate the bytecode
```
L13
LINENUMBER 106 L13
FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I] []
ILOAD 7
ILOAD 6
IF_ICMPGE L14
L15
LINENUMBER 107 L15
ALOAD 5
ILOAD 7
ALOAD 5
ILOAD 7
DALOAD
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.shift ()[D
ILOAD 7
DALOAD
DSUB
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
ILOAD 7
DALOAD
DMUL
DASTORE
L16
LINENUMBER 108 L16
ILOAD 7
ICONST_1
IADD
ISTORE 7
GOTO L13
```
whereas, with local references to the `shift` and `factor` arrays, the
bytecode will be
```
L14
LINENUMBER 107 L14
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
ASTORE 9
L15
LINENUMBER 108 L15
FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector [D org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I [D] []
ILOAD 8
ILOAD 7
IF_ICMPGE L16
L17
LINENUMBER 109 L17
ALOAD 6
ILOAD 8
ALOAD 6
ILOAD 8
DALOAD
ALOAD 2
ILOAD 8
DALOAD
DSUB
ALOAD 9
ILOAD 8
DALOAD
DMUL
DASTORE
L18
LINENUMBER 110 L18
ILOAD 8
ICONST_1
IADD
ISTORE 8
GOTO L15
```
You can see that with the local references, both arrays are held in local
variables of the stack frame, so the JVM can load them with `ALOAD` instead
of issuing an `INVOKESPECIAL` call to the accessor on every iteration.
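
For reference, here is a minimal, self-contained sketch of the two loop variants being compared. `ScalerSketch` and its method names are hypothetical stand-ins, not the actual `StandardScalerModel` code, and modeling `shift` and `factor` as `private lazy val`s is an assumption made only so that each read goes through an accessor.

```scala
// Hypothetical stand-in for the model class; not the real Spark code.
class ScalerSketch(mean: Array[Double], std: Array[Double]) {

  // Assumption: the arrays are lazy vals, so each read is an accessor call.
  private lazy val shift: Array[Double] = mean
  private lazy val factor: Array[Double] =
    std.map(s => if (s != 0.0) 1.0 / s else 0.0)

  // Variant 1: reads `shift` and `factor` through the accessors inside the loop.
  def transformViaAccessors(input: Array[Double]): Array[Double] = {
    val values = input.clone()
    val size = values.length
    var i = 0
    while (i < size) {
      values(i) = (values(i) - shift(i)) * factor(i)
      i += 1
    }
    values
  }

  // Variant 2: copies the array references into locals once, before the loop.
  def transformViaLocals(input: Array[Double]): Array[Double] = {
    val values = input.clone()
    val localShift = shift
    val localFactor = factor
    val size = values.length
    var i = 0
    while (i < size) {
      values(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
    values
  }
}
```

Both variants compute the same result; only the way the arrays are reached inside the loop differs. The actual speedup will depend on the JIT and the workload, so the timings above should be re-measured rather than assumed for this sketch.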