2012/7/20 Philipp Singer <[email protected]>:
> On 20.07.2012 11:47, Lars Buitinck wrote:
>> 2012/7/20 Philipp Singer <[email protected]>:
>>> Everything works fine now. The sad thing, though, is that I still can't
>>> really improve the classification results. The only thing I can achieve
>>> is a higher recall for the classes that work well in the background
>>> model, but the precision drops at the same time. Overall I stay at
>>> about the same average score when incorporating the background model.
>>>
>>> If anyone has any further ideas, please let me know ;)
>>
>> Well, since Gael already mentioned semi-supervised training using
>> label propagation: I have an old PR, still not merged (mostly for
>> API reasons), that implements semi-supervised training of Naive Bayes
>> using an EM algorithm:
>>
>> https://github.com/scikit-learn/scikit-learn/pull/430
>>
>> I've seen improvements in F1 score when doing text classification with
>> this algorithm. It may take some work to get it up to speed with the
>> latest scikit-learn, though.
>
> Hey Lars,
>
> Thanks, this looks awesome. I will try it out. The reason I haven't
> used label propagation techniques yet is that I could not get a fast
> enough runtime, because I have huge amounts of unlabeled/background
> data available.
>>
>> (Just out of curiosity, which topic models did you try? I'm looking
>> into these for my own projects.)
>
> We have been using Mallet's LDA-based Parallel Topic Model.
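For readers unfamiliar with the EM approach Lars mentions, here is a minimal sketch of the general idea: semi-supervised multinomial Naive Bayes trained with soft EM. It is NOT the code from PR #430. The helper name `semisupervised_nb_em` and the toy data are hypothetical, and dense NumPy arrays are assumed for brevity (for sparse input, swap `np.vstack` for `scipy.sparse.vstack`). It relies only on `MultinomialNB.predict_proba` and `MultinomialNB.fit(..., sample_weight=...)`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB


def semisupervised_nb_em(X_l, y_l, X_u, n_iter=10, alpha=1.0):
    """Hypothetical helper: soft-EM training of multinomial Naive Bayes."""
    # Initialise the model on the labeled documents only.
    clf = MultinomialNB(alpha=alpha).fit(X_l, y_l)
    classes = clf.classes_
    n_u = X_u.shape[0]
    for _ in range(n_iter):
        # E-step: class posteriors for every unlabeled document,
        # columns ordered like clf.classes_.
        P_u = clf.predict_proba(X_u)
        # M-step: refit on the labeled docs (weight 1) plus one copy of the
        # unlabeled docs per class, each copy weighted by that class posterior.
        X_all = np.vstack([X_l] + [X_u] * len(classes))
        y_all = np.concatenate([y_l] + [np.full(n_u, c) for c in classes])
        w_all = np.concatenate(
            [np.ones(len(y_l))] + [P_u[:, k] for k in range(len(classes))]
        )
        clf = MultinomialNB(alpha=alpha).fit(X_all, y_all, sample_weight=w_all)
    return clf


# Toy data (hypothetical): class 0 mostly uses words 0-2, class 1 words 3-5.
rng = np.random.RandomState(0)
rates = {0: [5, 5, 5, 0.2, 0.2, 0.2], 1: [0.2, 0.2, 0.2, 5, 5, 5]}


def make_docs(n, label):
    return rng.poisson(rates[label], size=(n, 6))


X_l = np.vstack([make_docs(2, 0), make_docs(2, 1)])  # only 4 labeled docs
y_l = np.array([0, 0, 1, 1])
X_u = np.vstack([make_docs(50, 0), make_docs(50, 1)])  # 100 unlabeled docs
y_u_true = np.array([0] * 50 + [1] * 50)  # used only for evaluation

clf = semisupervised_nb_em(X_l, y_l, X_u)
acc = (clf.predict(X_u) == y_u_true).mean()
print("accuracy on unlabeled docs:", acc)
```

Note that this sketch replicates the unlabeled set once per class, so memory grows with the number of classes; with very large background corpora one would rather accumulate the weighted feature counts directly.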
You could also try to extract the top 100 singular vectors using
sklearn.decomposition.RandomizedPCA or gensim.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
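As a rough illustration of Olivier's suggestion: `RandomizedPCA` has since been removed from scikit-learn, and `TruncatedSVD(algorithm="randomized")` is the usual replacement for sparse text matrices (it does not center the data, so the input stays sparse). The sketch below assumes the singular vectors are learned from the large unlabeled/background corpus and then used as extra features for the labeled documents. The random matrices are hypothetical stand-ins for real document-term matrices that share one vocabulary.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# Hypothetical stand-ins: a large unlabeled "background" corpus and a smaller
# labeled corpus, as sparse document-term matrices over the same 5000 terms.
rng = np.random.RandomState(0)
X_background = sp.random(2000, 5000, density=0.01, format="csr", random_state=rng)
X_labeled = sp.random(200, 5000, density=0.01, format="csr", random_state=rng)

# Learn the top 100 singular vectors from the background corpus only.
svd = TruncatedSVD(n_components=100, algorithm="randomized", random_state=0)
svd.fit(X_background)

# Project the labeled documents onto those 100 latent dimensions...
Z_labeled = svd.transform(X_labeled)

# ...and append them to the original sparse features for a downstream classifier.
X_augmented = sp.hstack([X_labeled, sp.csr_matrix(Z_labeled)]).tocsr()

print(svd.components_.shape)
print(X_augmented.shape)
```

Whether the appended latent features actually help is an empirical question; they play the same role here as the LDA topic proportions mentioned earlier in the thread.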
