Yes, 
https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L158
is the method you are interested in. It normalizes the
probabilities and returns them to non-log space. So in 1.5.0 you can
call predictProbabilities to get the actual posterior class
probabilities for a given input:
https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L130
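
For anyone stuck on 1.1.0, here is a minimal sketch of that
normalization in plain Scala. It is not the Spark source: the object
name NaiveBayesPosterior and both helpers are made up for
illustration, and pi, theta and x stand in for the model's log-priors,
log-conditionals and the feature vector as plain arrays instead of
Breeze types. Subtracting the largest log score before exp() keeps the
result from underflowing to all zeros; dividing by the sum then plays
the role of the missing denominator.

```scala
// Hypothetical helper object, not part of Spark MLlib.
object NaiveBayesPosterior {

  /** Unnormalized log-posterior per class: pi(c) + sum_j theta(c)(j) * x(j).
    * This is the plain-array equivalent of brzPi + brzTheta * testData.toBreeze.
    */
  def logScores(pi: Array[Double],
                theta: Array[Array[Double]],
                x: Array[Double]): Array[Double] =
    pi.indices.map { c =>
      pi(c) + theta(c).zip(x).map { case (t, v) => t * v }.sum
    }.toArray

  /** Turn log scores into probabilities that sum to 1.
    * The maximum is subtracted first so that the largest term becomes
    * exp(0) = 1 and the others cannot all underflow to zero.
    */
  def posterior(logScores: Array[Double]): Array[Double] = {
    val maxLog = logScores.max
    val scaled = logScores.map(lp => math.exp(lp - maxLog))
    val total = scaled.sum
    scaled.map(_ / total)
  }
}
```

Entry i of the returned array belongs to label i of the model, as
Yanbo pointed out further down the thread.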

On Thu, Sep 10, 2015 at 6:32 PM, Adamantios Corais
<adamantios.cor...@gmail.com> wrote:
> Thanks Sean. As far as I can see, the probabilities are NOT normalized;
> the denominator isn't implemented in either v1.1.0 or v1.5.0 (by
> denominator, I mean the probability of feature X). So, for a given
> lambda, how do I compute the denominator? FYI:
> https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>
> // Adamantios
>
>
>
> On Thu, Sep 10, 2015 at 7:03 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> The log probabilities are unlikely to be very large, though the
>> probabilities may be very small. The direct answer is to exponentiate
>> brzPi + brzTheta * testData.toBreeze -- apply exp(x).
>>
>> I have forgotten whether the probabilities are normalized already
>> though. If not you'll have to normalize to get them to sum to 1 and be
>> real class probabilities. This is better done in log space though.
>>
>> On Thu, Sep 10, 2015 at 5:12 PM, Adamantios Corais
>> <adamantios.cor...@gmail.com> wrote:
>> > Great. So, given that model.theta holds the log-probabilities (and
>> > hence the result of brzPi + brzTheta * testData.toBreeze is a big
>> > number too), how can I get back the non-log probabilities, which,
>> > apparently, are bounded between 0.0 and 1.0?
>> >
>> >
>> > // Adamantios
>> >
>> >
>> >
>> > On Tue, Sep 1, 2015 at 12:57 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> (pedantic: it's the log-probabilities)
>> >>
>> >> On Tue, Sep 1, 2015 at 10:48 AM, Yanbo Liang <yblia...@gmail.com> wrote:
>> >> > Actually
>> >> > brzPi + brzTheta * testData.toBreeze
>> >> > gives the probabilities of the input Vector for each class; however,
>> >> > it's a Breeze Vector.
>> >> > Note that the indices of this Vector need to be mapped to the
>> >> > corresponding label indices.
>> >> >
>> >> > 2015-08-28 20:38 GMT+08:00 Adamantios Corais
>> >> > <adamantios.cor...@gmail.com>:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I am trying to change the following code so as to get the
>> >> >> probabilities of the input Vector on each class (instead of the
>> >> >> class itself with the highest probability). I know that this is
>> >> >> already available as part of the most recent release of Spark but
>> >> >> I have to use Spark 1.1.0.
>> >> >>
>> >> >> Any help is appreciated.
>> >> >>
>> >> >>> override def predict(testData: Vector): Double = {
>> >> >>>     labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
>> >> >>>   }
>> >> >>
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> https://github.com/apache/spark/blob/v1.1.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>> >> >>
>> >> >>
>> >> >> // Adamantios
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
