On 20.07.2012 11:47, Lars Buitinck wrote:
> 2012/7/20 Philipp Singer <[email protected]>:
>> Everything works fine now. The sad thing, though, is that I still can't
>> really improve the classification results. The only thing I can achieve
>> is higher recall for the classes that work well in the background
>> model, but precision drops at the same time. Overall I stay at
>> about the same average score when incorporating the background model.
>>
>> If anyone has any further ideas, please let me know ;)
>
> Well, since Gael already mentioned semi-supervised training using
> label propagation: I have an old PR, still unmerged mostly for API
> reasons, that implements semi-supervised training of Naive Bayes
> using an EM algorithm:
>
>      https://github.com/scikit-learn/scikit-learn/pull/430
>
> I've seen improvements in F1 score when doing text classification with
> this algorithm. It may take some work to get this up to speed with the
> latest scikit-learn, though.

Hey Lars,

Thanks, this looks awesome. I will try it out. The reason I haven't 
used label propagation techniques yet is that I couldn't achieve a 
fast enough runtime, since I have huge amounts of unlabeled/background 
data available.
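For context, the EM approach Lars mentions can be sketched roughly as follows. This is not the code from PR 430, just a minimal hard-EM (self-training) variant built on scikit-learn's MultinomialNB: fit on the labeled data, pseudo-label the unlabeled data, refit on both, and repeat. The function name and iteration count are my own choices for illustration; a true soft EM would weight each unlabeled document by its class posterior instead of taking the argmax.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_lab, y_lab, X_unlab, n_iter=10):
    """Hard-EM semi-supervised Naive Bayes (illustrative sketch).

    X_lab, X_unlab: non-negative count features (dense arrays here;
    use scipy.sparse.vstack instead of np.vstack for sparse input).
    """
    # Initial M-step: fit on the labeled data only.
    clf = MultinomialNB()
    clf.fit(X_lab, y_lab)
    for _ in range(n_iter):
        # E-step: predict class posteriors for the unlabeled data,
        # then harden them into pseudo-labels (self-training variant).
        proba = clf.predict_proba(X_unlab)
        y_pseudo = clf.classes_[proba.argmax(axis=1)]
        # M-step: refit on labeled + pseudo-labeled data combined.
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_pseudo])
        clf = MultinomialNB().fit(X_all, y_all)
    return clf
```

With huge unlabeled sets this stays cheap, since each iteration is just one Naive Bayes fit plus one predict_proba pass, both linear in the number of documents.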
>
> (Just out of curiosity, which topic models did you try? I'm looking
> into these for my own projects.)

We have been using Mallet's LDA-based parallel topic model.

Philipp

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general