Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66336110
@avulanov
1. I did the same optimization for MLlib in [my recent
PRs](https://github.com/apache/spark/commits/master?author=dbtsai).
* Accessing the values of a dense/sparse vector directly is very slow unless
you first take a local reference to the underlying primitive array, because
of the extra dereference on each access. See #3577 and #3435; #3435 includes
a bytecode analysis of this issue.
* Breeze's foreachActive is very slow, so I implemented a 4x faster version
in #3288. In my experience, Breeze has to be used with caution in critical
code paths.
2. I haven't checked out your ANN implementation yet, but I will today.
I'll send you our optimized gradient computation code for MLOR. It will be
interesting to compare the new benchmark with the one you tested.
3. See page 27 of Prof. CJ Lin's slides:
http://www.csie.ntu.edu.tw/~cjlin/talks/SFmeetup.pdf It's just doing
feature expansion by mapping the data into a higher-dimensional space.
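
To illustrate the local-reference point in item 1, here is a minimal sketch in
plain Scala. It is not taken from #3577, #3435, or #3288; `DenseVec`,
`SparseVec`, and `LocalRefSketch` are hypothetical stand-ins for the MLlib
vector types, and the sketch only shows the shape of the pattern, not measured
speedups.

```scala
// Hypothetical stand-ins for dense/sparse vectors; NOT MLlib's actual classes.
final class DenseVec(val values: Array[Double]) {
  def size: Int = values.length
}

final class SparseVec(val size: Int,
                      val indices: Array[Int],
                      val values: Array[Double]) {
  // Visits only the stored (index, value) pairs. The arrays are copied into
  // local vals first so the loop body indexes primitive arrays directly.
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    val localIndices = indices
    val localValues = values
    val nnz = localValues.length
    var k = 0
    while (k < nnz) {
      f(localIndices(k), localValues(k))
      k += 1
    }
  }
}

object LocalRefSketch {
  // Goes through the accessor v.values on every iteration.
  def sumViaField(v: DenseVec): Double = {
    var sum = 0.0
    var i = 0
    while (i < v.size) {
      sum += v.values(i)
      i += 1
    }
    sum
  }

  // Takes a local reference to the primitive array once, then indexes it.
  def sumViaLocal(v: DenseVec): Double = {
    val values = v.values
    val n = values.length
    var sum = 0.0
    var i = 0
    while (i < n) {
      sum += values(i)
      i += 1
    }
    sum
  }
}
```

Both `sumViaField` and `sumViaLocal` return the same result; the second is the
form the comment recommends for hot loops.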