Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4087#discussion_r26169280
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
    @@ -262,4 +303,58 @@ object NaiveBayes {
       def train(input: RDD[LabeledPoint], lambda: Double): NaiveBayesModel = {
         new NaiveBayes(lambda).run(input)
       }
    +
    +
    +  /**
    +   * Trains a Naive Bayes model given an RDD of `(label, features)` pairs.
    +   *
    +   * The model type can be set to either Multinomial NB ([[http://tinyurl.com/lsdw6p]])
    +   * or Bernoulli NB ([[http://tinyurl.com/p7c96j6]]). The Multinomial NB can handle
    +   * discrete count data and can be called by setting the model type to "multinomial".
    +   * For example, it can be used with word counts or TF_IDF vectors of documents.
    +   * The Bernoulli model fits presence or absence (0-1) counts. By making every vector a
    +   * 0-1 vector and setting the model type to "bernoulli", the  fits and predicts as
    +   * Bernoulli NB.
    +   *
    +   * @param input RDD of `(label, array of features)` pairs.  Every vector should be a frequency
    +   *              vector or a count vector.
    +   * @param lambda The smoothing parameter
    +   *
    +   * @param modelType The type of NB model to fit from the enumeration NaiveBayesModels, can be
    +   *              multinomial or bernoulli
    +   */
    +  def train(input: RDD[LabeledPoint], lambda: Double, modelType: String): NaiveBayesModel = {
    +    new NaiveBayes(lambda, MODELTYPE.fromString(modelType)).run(input)
    +  }
    +
    +
    +  /**
    +   * Model types supported in Naive Bayes:
    +   * multinomial and Bernoulli currently supported
    +   */
    +  sealed abstract class ModelType
    +
    +  object MODELTYPE {
    +    final val MULTINOMIAL_STRING = "multinomial"
    +    final val BERNOULLI_STRING = "bernoulli"
    +
    +    def fromString(modelType: String): ModelType = modelType match {
    +      case MULTINOMIAL_STRING => Multinomial
    +      case BERNOULLI_STRING => Bernoulli
    +      case _ =>
    +        throw new IllegalArgumentException(s"Cannot recognize NaiveBayes ModelType: $modelType")
    +    }
    +  }
    +
    +  final val ModelType = MODELTYPE
    --- End diff --
    
    Add doc, perhaps something like "Provides static methods for using ModelType"
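
A minimal sketch of how the suggested doc could sit on the `ModelType` value, pulled out of `NaiveBayes` so that it compiles on its own. The wrapper object `NaiveBayesSketch` and the `Multinomial` / `Bernoulli` case objects are assumptions for illustration (their definitions are not shown in the diff above), and the Scaladoc wording is only one possibility:

```scala
// Hypothetical standalone wrapper; in the PR these members live in object NaiveBayes.
object NaiveBayesSketch {

  /** Model types supported in Naive Bayes: multinomial and Bernoulli. */
  sealed abstract class ModelType

  // Assumed definitions; the diff references these names but does not show them.
  case object Multinomial extends ModelType
  case object Bernoulli extends ModelType

  object MODELTYPE {
    final val MULTINOMIAL_STRING = "multinomial"
    final val BERNOULLI_STRING = "bernoulli"

    /** Converts a model type name to a ModelType, rejecting unknown names. */
    def fromString(modelType: String): ModelType = modelType match {
      case MULTINOMIAL_STRING => Multinomial
      case BERNOULLI_STRING => Bernoulli
      case _ =>
        throw new IllegalArgumentException(s"Cannot recognize NaiveBayes ModelType: $modelType")
    }
  }

  /**
   * Provides static methods for using ModelType,
   * for example `ModelType.fromString("bernoulli")`.
   */
  final val ModelType = MODELTYPE
}
```

With the alias documented this way, a caller would write `NaiveBayesSketch.ModelType.fromString("multinomial")` and find the helper methods from the generated docs for `ModelType` rather than having to know about `MODELTYPE`.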

