Github user sethah commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol Good points, and thanks for following up on this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user daniel-siegmann-aol commented on the issue:
https://github.com/apache/spark/pull/12761
Sorry for the delay, I had forgotten about this.
@dbtsai this patch will only help in the case where most of your feature
weights are zero _after_ aggregation. As I mentioned in
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol I would like to propose closing this if there
is no explicit objection to ^.
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/12761
I'm benchmarking LOR with 14M features on an internal company dataset
(unfortunately, it's not public).
Regarding using a sparse data structure for aggregation, I'm not so sure how
much this wil
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/12761
I believe the `VectorBuilder` class should be a part of the new ML linalg
library (which maybe didn't exist when this was created?) instead of MLlib.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol ok - I think between @dbtsai and me we can run some
performance tests. I've been doing some work on Criteo Display Ad Challenge
data which makes a nice sparse benchmark dataset.
Github user daniel-siegmann-aol commented on the issue:
https://github.com/apache/spark/pull/12761
I'll work on merging the changes. However, I no longer work at AOL so I
don't have the data to thoroughly test it.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol will you have time to work on this? I think it will be
important to have this in Spark 2.1. If not, I think between @dbtsai, @sethah
and myself we can help take it forward.
---
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/12761
@daniel-siegmann-aol I think it's a good time to update this PR.