Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-63401446
@avulanov I'll look around for papers which might allow for comparisons;
I'm not sure offhand.
For experiments, I agree with what @manishamde may be getting at, which is
dividing them into 2 types of tests:
(1) accuracy tests: Here, comparing with single-threaded implementations on
small datasets sounds fine.
(2) scaling tests: By self-speedup tests, I was referring to the scaling
tests which @manishamde mentions above. That means comparing this ANN
implementation with itself to see how it scales in various ways: increasing
# examples, # nodes in the model, # machines, etc. That might let us spot
bottlenecks or inefficiencies, even if there aren't good alternate
implementations available for comparison.
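To make the self-speedup idea concrete, here is a rough sketch of how the numbers could be tabulated. It is not code from this PR: the function name `self_speedup` and the input format (a dict from machine count to wall-clock seconds for the same job) are assumptions made for illustration only.

```python
def self_speedup(timings):
    """Tabulate speedup and parallel efficiency relative to the smallest run.

    `timings` is a hypothetical input format: a dict mapping the number of
    machines to the wall-clock seconds measured for the same training job.
    """
    if not timings:
        raise ValueError("need at least one timing measurement")

    # Use the run with the fewest machines as the baseline.
    base_machines = min(timings)
    base_time = timings[base_machines]

    rows = []
    for machines in sorted(timings):
        # Speedup: how much faster than the baseline run.
        speedup = base_time / timings[machines]
        # Efficiency: speedup divided by the factor of extra machines used.
        efficiency = speedup * base_machines / machines
        rows.append((machines, speedup, efficiency))
    return rows
```

Efficiency that drops well below 1.0 as machines are added would be one sign of the bottlenecks mentioned above. The same table could be built with # examples or # nodes in the model as the varied quantity, holding the others fixed.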
If I find papers on distributed implementations that reference available
code, I'll be sure to post here.