[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...

staple Mon, 22 Sep 2014 11:18:14 -0700

GitHub user staple opened a pull request:

    https://github.com/apache/spark/pull/2491


    [SPARK-1655][MLLIB] Add option for distributed naive bayes model.

    Adds an option to store a naive bayes model in distributed form. The default 
behavior, in which the whole model is stored on the driver node, remains 
unchanged. NaiveBayes.train's new distMode parameter can be used to request 
that a model be distributed.
    
    When distributed, the model is stored as an RDD of model blocks. Each block 
contains the labels and prior and conditional probabilities for a set of label 
classes, allowing fast computation of the maximum a posteriori prediction for 
each block and straightforward aggregation of these MAP predictions across 
blocks.
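
As a rough illustration of the per-block MAP computation and cross-block aggregation described above, here is a minimal plain-Python sketch. It is not the code from this pull request: the block layout, the names `block_map` and `predict`, and the multinomial scoring form (log prior plus feature-weighted log conditionals) are all assumptions made for the example, and a plain list stands in for the RDD of blocks.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical block layout (an assumption, not the PR's actual type):
# (labels, log priors, log conditional probabilities per label).
Block = Tuple[List[float], List[float], List[List[float]]]


def block_map(block: Block, features: Sequence[float]) -> Tuple[float, float]:
    """Return (best log-posterior score, best label) within a single block."""
    labels, log_priors, log_conditionals = block
    best_score, best_label = -math.inf, float("nan")
    for label, log_prior, log_theta in zip(labels, log_priors, log_conditionals):
        # Assumed multinomial form: log prior + sum_j x_j * log theta_j.
        score = log_prior + sum(t * x for t, x in zip(log_theta, features))
        if score > best_score:
            best_score, best_label = score, label
    return best_score, best_label


def predict(blocks: Sequence[Block], features: Sequence[float]) -> float:
    """Aggregate per-block MAP candidates by keeping the highest score."""
    candidates = (block_map(block, features) for block in blocks)
    return max(candidates, key=lambda pair: pair[0])[1]


# Two blocks covering labels {0.0, 1.0} and {2.0}; the values are made up.
blocks = [
    ([0.0, 1.0],
     [math.log(0.5), math.log(0.25)],
     [[math.log(0.7), math.log(0.3)], [math.log(0.2), math.log(0.8)]]),
    ([2.0],
     [math.log(0.25)],
     [[math.log(0.5), math.log(0.5)]]),
]
```

Because each block contributes only one (score, label) pair, the aggregation step is a simple maximum; on an actual RDD this would presumably be expressed as a map over blocks followed by a reduce, though that detail is not stated in the description above.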

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/staple/spark SPARK-1655

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2491
    
----
commit 4594761dd035d2d01b91fb36a9029bda9f34c4a1
Author: Aaron Staple <[email protected]>
Date:   2014-09-22T05:02:28Z

    [SPARK-1655][MLLIB] Add option for distributed naive bayes model.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
