Yes, the method at https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L158 is the one you are interested in. It does normalize the probabilities and converts them back out of log space. So you can call predictProbabilities to get the actual posterior class probabilities for a given input: https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L130
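
If you are stuck on 1.1.0 and cannot call predictProbabilities, the normalization can be done by hand on the per-class log scores (the values of brzPi + brzTheta * testData.toBreeze). The denominator you asked about is just the sum over classes of exp(score), so it never has to be computed from the features separately. Below is a minimal sketch of that idea, assuming the scores have already been copied into a plain Array[Double]; the object and function names are made up for illustration and this is not a copy of the Spark source.

```scala
object PosteriorSketch {

  /** Turns unnormalized per-class log scores into probabilities that sum to 1.
    * `normalizeLogScores` is a hypothetical helper name, not part of MLlib.
    */
  def normalizeLogScores(logScores: Array[Double]): Array[Double] = {
    require(logScores.nonEmpty, "need at least one class")
    // Shift by the maximum so the largest term becomes exp(0) = 1.
    // Without the shift, exp of a very negative score underflows to 0.
    val maxLog = logScores.max
    val shifted = logScores.map(s => math.exp(s - maxLog))
    // The sum plays the role of the denominator P(x), up to the common shift.
    val total = shifted.sum
    shifted.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative scores only; real ones come from brzPi + brzTheta * x.
    val logScores = Array(-1050.0, -1052.0, -1060.0)
    val probs = normalizeLogScores(logScores)
    println(probs.mkString(", "))
  }
}
```

The entries come back in the same order as the scores, so index i still has to be mapped to labels(i), as Yanbo pointed out further down the thread.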

On Thu, Sep 10, 2015 at 6:32 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote:
> Thanks Sean. As far as I can see probabilities are NOT normalized;
> denominator isn't implemented in either v1.1.0 or v1.5.0 (by denominator, I
> refer to the probability of feature X). So, for given lambda, how to compute
> the denominator? FYI:
> https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>
> // Adamantios
>
> On Thu, Sep 10, 2015 at 7:03 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> The log probabilities are unlikely to be very large, though the
>> probabilities may be very small. The direct answer is to exponentiate
>> brzPi + brzTheta * testData.toBreeze -- apply exp(x).
>>
>> I have forgotten whether the probabilities are normalized already
>> though. If not you'll have to normalize to get them to sum to 1 and be
>> real class probabilities. This is better done in log space though.
>>
>> On Thu, Sep 10, 2015 at 5:12 PM, Adamantios Corais
>> <adamantios.cor...@gmail.com> wrote:
>> > great. so, provided that model.theta represents the log-probabilities and
>> > (hence the result of brzPi + brzTheta * testData.toBreeze is a big number
>> > too), how can I get back the non-log-probabilities which - apparently -
>> > are bounded between 0.0 and 1.0?
>> >
>> > // Adamantios
>> >
>> > On Tue, Sep 1, 2015 at 12:57 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> (pedantic: it's the log-probabilities)
>> >>
>> >> On Tue, Sep 1, 2015 at 10:48 AM, Yanbo Liang <yblia...@gmail.com> wrote:
>> >> > Actually
>> >> > brzPi + brzTheta * testData.toBreeze
>> >> > is the probabilities of the input Vector on each class, however it's a
>> >> > Breeze Vector.
>> >> > Pay attention the index of this Vector need to map to the corresponding
>> >> > label index.
>> >> >
>> >> > 2015-08-28 20:38 GMT+08:00 Adamantios Corais
>> >> > <adamantios.cor...@gmail.com>:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I am trying to change the following code so as to get the probabilities
>> >> >> of the input Vector on each class (instead of the class itself with the
>> >> >> highest probability). I know that this is already available as part of
>> >> >> the most recent release of Spark but I have to use Spark 1.1.0.
>> >> >>
>> >> >> Any help is appreciated.
>> >> >>
>> >> >>> override def predict(testData: Vector): Double = {
>> >> >>>   labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
>> >> >>> }
>> >> >>
>> >> >>> https://github.com/apache/spark/blob/v1.1.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>> >> >>
>> >> >> // Adamantios