On 20.07.2012 15:34, Lars Buitinck wrote:
> 2012/7/20 Philipp Singer <kill...@gmail.com>:
>> I just tried out your implementation of semi-supervised
>> MultinomialNB. The code works flawlessly, but unfortunately the
>> performance of the algorithm drops sharply when I try to incorporate
>> my additional data.
>>
>> I am starting to think that my additional data is useless :/
>>
>> Just for the record:
>>
>> Training on my 96,000 labeled documents with MultinomialNB gets me an
>> f1-score of 0.47. Adding around 2,000,000 unlabeled documents via your
>> semi-supervised code yields an f1-score of 0.39.
> Hmm, too bad. Is the extra data from a very different source?
>
Not very different, but the documents were produced by a different kind of user.

I really thought that this data could somehow improve the whole
classification process, because fitting a model on the extra data alone
leads to an f1-score of 0.27, which is pretty good for that data.
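
For anyone who wants to run the same kind of comparison, here is a rough
sketch of how it could be set up. It is not the semi-supervised code
discussed above and it will not reproduce my numbers: the data is synthetic,
the helper names (make_counts, held_out_f1) are made up for illustration,
and it assumes that the extra data comes with (possibly noisy) labels and
that every model is scored on the same held-out split of the main data.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)


def make_counts(n_docs, class_profiles, rng):
    """Draw toy bag-of-words counts, one multinomial word profile per class."""
    y = rng.randint(len(class_profiles), size=n_docs)
    X = np.vstack([rng.multinomial(50, class_profiles[c]) for c in y])
    return X, y


# Hypothetical word distributions: 3 classes over a 20-word vocabulary.
main_profiles = rng.dirichlet(np.ones(20), size=3)
# Assumption: the extra source uses the same classes with shifted word usage.
extra_profiles = 0.7 * main_profiles + 0.3 * rng.dirichlet(np.ones(20), size=3)

X_train, y_train = make_counts(300, main_profiles, rng)
X_test, y_test = make_counts(200, main_profiles, rng)
X_extra, y_extra = make_counts(300, extra_profiles, rng)


def held_out_f1(X_fit, y_fit):
    """Fit MultinomialNB on the given data and score it on the fixed test split."""
    clf = MultinomialNB().fit(X_fit, y_fit)
    return f1_score(y_test, clf.predict(X_test), average="macro")


scores = {
    "main only": held_out_f1(X_train, y_train),
    "extra only": held_out_f1(X_extra, y_extra),
    "main + extra": held_out_f1(
        np.vstack([X_train, X_extra]), np.concatenate([y_train, y_extra])
    ),
}

for name, score in scores.items():
    print(f"{name:>12}: macro f1 = {score:.3f}")
```

Scoring all three variants on one fixed test split is what makes the numbers
comparable; if the "extra only" model is evaluated on its own data instead,
its score says little about whether it helps on the main task.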

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general