GitHub user staple opened a pull request:
https://github.com/apache/spark/pull/2491
[SPARK-1655][MLLIB] Add option for distributed naive bayes model.
Adds an option to store a naive Bayes model in distributed form. The default
behavior, in which the whole model is stored on the driver node, remains
unchanged. The new distMode parameter of NaiveBayes.train can be used to
request that a model be distributed.
When distributed, the model is stored as an RDD of model blocks. Each block
contains the labels and prior and conditional probabilities for a set of label
classes, allowing fast computation of the maximum a posteriori prediction for
each block and straightforward aggregation of these MAP predictions across
blocks.
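The block-wise prediction described above can be sketched roughly as follows.
This is a minimal illustration in plain Python, not the code from this pull
request: the ModelBlock class, its field names, and the block_map and predict
helpers are all hypothetical, a plain list stands in for the RDD of blocks,
and the multinomial scoring rule (log prior plus the dot product of log
conditional probabilities with the feature counts) is an assumption about
what each block computes.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ModelBlock:
    """Hypothetical model block: parameters for a subset of label classes."""
    labels: List[float]
    log_priors: List[float]      # log P(label), one entry per label
    log_cond: List[List[float]]  # log P(feature | label), one row per label


def block_map(block: ModelBlock, features: Sequence[float]) -> Tuple[float, float]:
    """Return (score, label) of the best-scoring label within a single block."""
    best = (-math.inf, float("nan"))
    for label, prior, theta in zip(block.labels, block.log_priors, block.log_cond):
        # Unnormalized log posterior under a multinomial naive Bayes model.
        score = prior + sum(t * x for t, x in zip(theta, features))
        if score > best[0]:
            best = (score, label)
    return best


def predict(blocks: Sequence[ModelBlock], features: Sequence[float]) -> float:
    """Aggregate per-block MAP candidates by keeping the highest score.

    With a real RDD this would presumably be a map over the blocks followed
    by a reduce; here a generator and max() play those roles.
    """
    return max((block_map(b, features) for b in blocks), key=lambda p: p[0])[1]


# Illustrative parameters only: three labels split across two blocks.
blocks = [
    ModelBlock(
        labels=[0.0, 1.0],
        log_priors=[math.log(0.5), math.log(0.25)],
        log_cond=[[math.log(0.9), math.log(0.1)],
                  [math.log(0.5), math.log(0.5)]],
    ),
    ModelBlock(
        labels=[2.0],
        log_priors=[math.log(0.25)],
        log_cond=[[math.log(0.1), math.log(0.9)]],
    ),
]
```

Because each block returns only its best (score, label) pair, the aggregation
step compares one small tuple per block regardless of how many labels a block
holds.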
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/staple/spark SPARK-1655
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2491.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2491
----
commit 4594761dd035d2d01b91fb36a9029bda9f34c4a1
Author: Aaron Staple <[email protected]>
Date: 2014-09-22T05:02:28Z
[SPARK-1655][MLLIB] Add option for distributed naive bayes model.
----