GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-63401446
  
    @avulanov I'll look around for papers which might allow for comparisons; I'm not sure offhand.
    
    For experiments, I agree with what @manishamde seems to be getting at, namely dividing them into 2 types of tests:
    (1) accuracy tests: Here, comparing with single-threaded implementations on small datasets sounds fine.
    (2) scaling tests: By self-speedup tests, I was referring to the scaling tests which @manishamde mentions above: comparing this ANN implementation with itself to see how it scales in various ways (increasing the number of examples, the number of nodes in the model, the number of machines, etc.). That might let us spot bottlenecks or inefficiencies, even if there aren't good alternative implementations available for comparison.
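A minimal sketch of how the self-speedup numbers could be computed for the machine-count dimension, written in Python purely for illustration. `train` is a hypothetical callback that runs one training job on `k` machines (it is not part of this PR or of MLlib). Speedup is taken as T(k_base) / T(k) relative to the smallest machine count, and efficiency as speedup * k_base / k; efficiency falling off as `k` grows is the kind of signal that would point to a bottleneck.

```python
import time
from typing import Callable, Dict, Sequence


def time_runs(train: Callable[[int], None], settings: Sequence[int],
              repeats: int = 3) -> Dict[int, float]:
    """Best wall-clock time (seconds) of train(k) for each setting k.

    `train` is a hypothetical callback (assumption); taking the minimum
    over several repeats reduces timing noise.
    """
    timings: Dict[int, float] = {}
    for k in settings:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            train(k)
            best = min(best, time.perf_counter() - start)
        timings[k] = best
    return timings


def self_speedup(timings: Dict[int, float]) -> Dict[int, float]:
    """Speedup of each setting relative to the smallest one: T(base) / T(k)."""
    base = min(timings)
    return {k: timings[base] / t for k, t in sorted(timings.items())}


def efficiency(timings: Dict[int, float]) -> Dict[int, float]:
    """Parallel efficiency when settings are machine counts: speedup * base / k."""
    base = min(timings)
    return {k: sp * base / k for k, sp in self_speedup(timings).items()}


if __name__ == "__main__":
    # Stand-in workload only; a real test would train the ANN on k machines.
    fake_train = lambda k: sum(range(2_000_000 // k))
    t = time_runs(fake_train, [1, 2, 4])
    print(self_speedup(t))
    print(efficiency(t))
```

The same timing harness could be reused for the other dimensions (number of examples, number of nodes in the model), though there one would compare how time grows with size rather than compute a speedup.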
    
    If I find papers on distributed implementations that reference available code, I'll be sure to post them here.

