GitHub user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-64283686
@manishamde Thanks for the useful references! It seems that model
parallelization for ANN is a challenging problem. I put this question to a few
presenters at the recent AMP Camp, and they confirmed this, given that the
present MLlib interfaces are not well suited to the task. Moreover, even for
big models that still fit into memory, the update step will incur a huge
communication overhead. I took a look at the algorithms other than
backpropagation listed in this paper:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=393138&tag=1. A genetic
algorithm needs to evaluate a number of models, which makes the task even
harder. Simulated annealing, which is a global optimization routine, seems
more promising. However, with the model distributed across several nodes, one
needs to copy the data points to every node that stores the model. I suggest
sticking with the current implementation until someone finds a clearly better
approach. Does that make sense?
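
To make the communication overhead concrete, here is a rough back-of-envelope
sketch. It is not taken from the PR. It assumes a synchronous, driver-centric
scheme in which every worker sends a full gradient to the driver and receives
a full copy of the updated weights back. The function name, the weight count,
and the worker count are all hypothetical.

```python
def update_traffic_bytes(num_weights, num_workers, bytes_per_weight=8):
    """Estimate the bytes moved in one synchronous update step.

    Assumption (not from the PR): each worker sends one full gradient to
    the driver and receives one full copy of the updated weights back.
    """
    # Gradient up plus weights down, for a single worker.
    per_worker = 2 * num_weights * bytes_per_weight
    return per_worker * num_workers


# Hypothetical network: 10 million weights stored as doubles, 16 workers.
traffic = update_traffic_bytes(num_weights=10_000_000, num_workers=16)
print(f"{traffic / 1e9:.2f} GB per update step")
```

Under these assumptions the traffic grows linearly with both the model size
and the number of workers, and it is paid on every iteration. Tree-style
aggregation would change where the traffic goes but not the per-worker volume.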