On 14.09.2012 15:28, Philipp Singer wrote:
> Okay, so I did a fast chi2 check and it seems like some LDA features
> have high p-values, so they should be helpful at least.

Oh, sorry. We want the lowest p-values, right? The conclusion is the
same, though: many of the LDA features have low p-values.
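
For anyone who wants to repeat this kind of check, here is a minimal sketch of how the chi2 p-values of the LDA columns could be inspected. It is not the exact script I ran: the data is synthetic, and the names `X_tfidf`, `X_lda` and the sizes are assumptions made up for illustration. Note that the chi2 statistic grows with the scale of a feature, so p-values computed on topic proportions are probably more useful as a ranking than as a strict significance test.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_selection import chi2

rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 200, 300, 20

# Synthetic stand-ins (assumptions, not the real data): a sparse,
# non-negative "TFIDF" block and topic proportions whose rows sum to 1.
mask = rng.rand(n_samples, n_tfidf) < 0.05
X_tfidf = sparse.csr_matrix(rng.rand(n_samples, n_tfidf) * mask)
X_lda = rng.dirichlet(np.ones(n_topics), size=n_samples)
y = rng.randint(0, 3, size=n_samples)

# Stack both blocks column-wise; the LDA columns come last.
X = sparse.hstack([X_tfidf, sparse.csr_matrix(X_lda)], format="csr")

# chi2 returns one statistic and one p-value per feature.
scores, pvalues = chi2(X, y)
lda_scores = scores[n_tfidf:]
lda_pvalues = pvalues[n_tfidf:]

# Rank the LDA features, lowest p-value first.
order = np.argsort(lda_pvalues)
for idx in order[:5]:
    print("topic %d: chi2=%.4f, p=%.4f" % (idx, lda_scores[idx], lda_pvalues[idx]))
```

On the real data one would replace the synthetic matrices with the actual TFIDF matrix and the Mallet topic proportions.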
>
> On 14.09.2012 15:06, Andreas Müller wrote:
>> I'd be interested in the outcome.
>> Let us know when you get it to work :)
>>
>>
>> ----- Original Message -----
>> From: "Philipp Singer" <kill...@gmail.com>
>> To: scikit-learn-general@lists.sourceforge.net
>> Sent: Friday, 14 September 2012 14:00:48
>> Subject: Re: [Scikit-learn-general] Combining TFIDF and LDA features
>>
>> On 14.09.2012 14:53, Andreas Müller wrote:
>>> Hi Philipp.
>>
>> Hey Andreas!
>>> First, you should ensure that the features all have approximately the
>>> same scale. For example, they should all be between zero and one. If
>>> the LDA features are much smaller than the other ones, then they will
>>> probably not be weighted much.
>>
>> The LDA features sum to 1 for each sample, because they describe the
>> probability that the sample belongs to each of the topics (in this
>> case 500). So basically, they are between 0 and 1.
>>>
>>> Which LDA package did you use?
>>
>> We used Mallet's LDA implementation, because from experience they have
>> the most established smoothing processes. http://mallet.cs.umass.edu/
>>
>> By the way, if we train on the LDA features alone we get reasonable
>> results, a bit worse than pure TFIDF.
>>>
>>> I am not very experienced with this kind of model, but maybe it would
>>> be helpful to look at some univariate statistics, like
>>> ``feature_selection.chi2``, to see if the LDA features are actually
>>> helpful.
>>
>> Yeah, this would be something I could look into. I have already tried
>> to do feature selection with chi2, but have not actually looked at the
>> specific statistics.
>>>
>>> Cheers,
>>> Andy
>>
>> Regards,
>> Philipp
>>>
>>>
>>> ----- Original Message -----
>>> From: "Philipp Singer" <kill...@gmail.com>
>>> To: scikit-learn-general@lists.sourceforge.net
>>> Sent: Friday, 14 September 2012 13:47:30
>>> Subject: [Scikit-learn-general] Combining TFIDF and LDA features
>>>
>>> Hey there!
>>>
>>> In the past I have seen a few research papers that combined tfidf
>>> based features with LDA topic model features and were able to increase
>>> their accuracy by a useful margin.
>>>
>>> I now wanted to do the same. As a simple step I just appended the topic
>>> features to the existing tfidf features of each train and test sample
>>> and ran my standard LinearSVC on the result (oh, btw, thanks that the
>>> confusion with dense and sparse is now resolved in 0.12 ;) ).
>>>
>>> The problem now is that the overall results are practically the same.
>>> Some classes perform better and some worse.
>>>
>>> I am not sure whether this is a data problem or comes from my lack
>>> of understanding of such feature extension techniques.
>>>
>>> Is it possible that the huge number of tfidf features somehow overrules
>>> the rather small number of topic features? Do I maybe have to do some
>>> feature modification, because tfidf and LDA features are of a different
>>> nature?
>>>
>>> Maybe it is also due to the classifier and I need something else?
>>>
>>> Would be happy if someone could shed a little light on my problems ;)
>>>
>>> Regards,
>>> Philipp
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Got visibility?
>>> Most devs has no idea what their production app looks like.
>>> Find out how fast your code is with AppDynamics Lite.
>>> http://ad.doubleclick.net/clk;262219671;13503038;y?
>>> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>>
>
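
For reference, the setup discussed in the quoted thread (appending the topic features to the tfidf features and training LinearSVC) could be sketched roughly as below. This is only an illustration on synthetic data, not the code that was actually run. The helper `combine_features` and its `lda_weight` parameter are hypothetical: with 500 topics that sum to 1 per sample, the average topic value is only 1/500 = 0.002, so a block-wise weight is one simple way to experiment with the scale issue Andreas raised.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import LinearSVC


def combine_features(X_tfidf, X_lda, lda_weight=1.0):
    """Append dense topic proportions to a sparse TFIDF matrix.

    lda_weight is a hypothetical knob that rescales the whole LDA block.
    """
    lda_block = sparse.csr_matrix(X_lda) * lda_weight
    return sparse.hstack([X_tfidf, lda_block], format="csr")


rng = np.random.RandomState(0)
n_samples, n_tfidf, n_topics = 100, 200, 10

# Synthetic stand-ins (assumptions): sparse "TFIDF" values and
# topic proportions whose rows sum to 1.
mask = rng.rand(n_samples, n_tfidf) < 0.05
X_tfidf = sparse.csr_matrix(rng.rand(n_samples, n_tfidf) * mask)
X_lda = rng.dirichlet(np.ones(n_topics), size=n_samples)
y = rng.randint(0, 2, size=n_samples)

# Upweight the LDA block, then train on the combined sparse matrix.
X = combine_features(X_tfidf, X_lda, lda_weight=5.0)
clf = LinearSVC().fit(X, y)
pred = clf.predict(X)
```

Whether any particular weight helps would have to be checked by cross-validation on the real data; the value 5.0 here is arbitrary.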


