Re: Does mahout classification depends on amount of data in each category?

Sean Owen Tue, 03 Jul 2012 07:42:50 -0700

(Please don't "ping" your questions on the list -- bad form and makes
people less likely to answer.)


You do not need equal numbers of positive and negative examples. I
think you need to go back and read up on the basics of how Bayesian
classification works before you dig into Mahout. Class imbalance is
exactly why the frequency of each class/label (the prior) is part of
the calculation.
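
To make that concrete, here is a minimal sketch of a plain multinomial
naive Bayes classifier with add-one smoothing. It is not Mahout's
implementation; the function names and the toy 2%/98% training set are
made up for illustration. The score for each class is the log of its
frequency in training (the prior) plus the log-likelihood of each word
given that class, so a rare class can still win when the words point
to it.

```python
import math
from collections import Counter

# Hypothetical toy training set mirroring the 2% / 98% split in the question.
train = [("spam", "win cash now")] * 2 + [("nonspam", "meeting notes attached")] * 98


def fit(examples):
    """Count how often each class occurs and how often each word occurs per class."""
    class_counts = Counter(label for label, _ in examples)
    word_counts = {label: Counter() for label in class_counts}
    for label, text in examples:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def log_scores(text, class_counts, word_counts, vocab):
    """Return log P(class) + sum of log P(word | class) for every class."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class frequency enters here as the prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return scores


def classify(text, model):
    scores = log_scores(text, *model)
    return max(scores, key=scores.get)


model = fit(train)
print(classify("win cash now", model))
print(classify("meeting notes attached", model))
print(classify("hello", model))
```

On this toy data the first message is labelled spam even though spam is
only 2% of training, because the word likelihoods outweigh the prior.
The last message contains no known words, so only the prior is left and
the majority class wins.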

On Tue, Jul 3, 2012 at 4:54 PM, damodar shetyo <[email protected]> wrote:
> Can someone help me with this?
>
>
> Regards,
> Damodar
>
> On Tue, Jul 3, 2012 at 4:27 PM, damodar shetyo <[email protected]> wrote:
>
>> Hi,
>> I plan to use Mahout's classification feature. I have a lot of data on which
>> I am planning to train my model. I have a few queries, as follows:
>> 1) Suppose I have 2 types of data: spam and not spam (this is just an
>> example and not my real use case, but it is similar to it). The
>> amount of spam data is far less than that of non-spam data in the training
>> data. I have 2% spam (or maybe 1%) and 98% non-spam in training.
>> Now the question is: if I build my model on this training data so that it
>> outputs spam/non-spam, will I get non-spam all the time because non-spam
>> data dominates the training set?
>> Will my model correctly identify spam?
>>
>>
>> --
>> Regards,
>> Damodar Shetyo
>>
>>
>
>
> --
> Regards,
> Damodar Shetyo
