Can someone help me with this?
Regards,
Damodar

On Tue, Jul 3, 2012 at 4:27 PM, damodar shetyo <[email protected]> wrote:
> Hi,
>
> I plan to use Mahout's classification feature. I have a lot of data on which I am planning to train my model, and I have a few questions:
>
> 1) Suppose I have 2 types of data: spam and not spam (this is just an example, not my real use case, but it is similar to it). The training data contains far less spam than non-spam: about 2% spam (or maybe 1%) and 98% non-spam.
> If I build my model on this training data so that it outputs spam/non-spam, will it output non-spam all the time because there is so much more non-spam data in training?
> Will my model correctly identify spam?
>
> --
> Regards,
> Damodar Shetyo
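To make the concern in the question concrete, here is a minimal sketch. It is plain Python, not Mahout code, and the 1,000-label dataset is hypothetical, chosen only to match the 2% spam / 98% non-spam split described above. It shows that a degenerate model which always answers "non-spam" still reaches 98% accuracy while catching no spam at all.

```python
# Hypothetical illustration only: 1,000 training labels with 2% spam,
# matching the ratio described in the question. No Mahout API is used.
labels = ["spam"] * 20 + ["nonspam"] * 980

# A degenerate "classifier" that always predicts the majority class.
predictions = ["nonspam"] * len(labels)

# Overall accuracy: fraction of predictions that match the true label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Spam recall: fraction of true spam messages that were predicted as spam.
spam_total = labels.count("spam")
spam_caught = sum(p == y == "spam" for p, y in zip(predictions, labels))
spam_recall = spam_caught / spam_total

print(f"accuracy:    {accuracy:.2%}")
print(f"spam recall: {spam_recall:.2%}")
```

Overall accuracy alone therefore cannot tell whether the model identifies spam; per-class figures such as spam recall and precision on held-out data are what answer that.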
