Re: [Scikit-learn-general] "reverse feature engineering" (or something vague like that)

Immanuel Tue, 02 Oct 2012 01:56:40 -0700

@Joseph Turian
> For error analysis, I usually look at the examples that the model
> breaks on, and try to figure out the pattern.
> This usually suggests new features to engineer.
How do you look at the examples? Do you use summary stats, visualization, ...?
I have found this difficult when working with n_features > 1000.
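With more than 1000 features, eyeballing raw rows does not scale. One option (an illustration, not something proposed in this thread) is to rank features by how differently they behave on misclassified versus correctly classified samples. The sketch below assumes binary classification and dense numeric features, uses out-of-fold predictions from `cross_val_predict`, and uses `make_classification` as a stand-in for real data. The helper `rank_error_features` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def rank_error_features(X, y_true, y_pred, top_k=10):
    """Hypothetical helper: rank features by the standardized difference of
    their means on misclassified vs. correctly classified samples."""
    wrong = y_true != y_pred
    if wrong.all() or not wrong.any():
        raise ValueError("need both correct and misclassified samples")
    mu_wrong = X[wrong].mean(axis=0)
    mu_right = X[~wrong].mean(axis=0)
    # Overall per-feature std as a common scale; epsilon guards constant columns.
    scale = X.std(axis=0) + 1e-12
    effect = (mu_wrong - mu_right) / scale
    order = np.argsort(-np.abs(effect))[:top_k]
    return order, effect[order]


# Synthetic stand-in for a wide dataset (n_features > 1000).
X, y = make_classification(n_samples=300, n_features=1500, n_informative=20,
                           random_state=0)
# Out-of-fold predictions: every sample is judged by a model that did not see it.
y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
idx, eff = rank_error_features(X, y, y_pred, top_k=5)
for j, e in zip(idx, eff):
    print(f"feature {j}: effect size {e:+.2f}")
```

The top-ranked features are only candidates for a closer look (histograms, scatter plots); with many features some large effect sizes will appear by chance.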


Are you taking any measures to avoid "over-fitting by hand"? Here I'm
primarily concerned with problems where n_samples is small (< 50).
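One common safeguard against over-fitting by hand (a general practice, not one described in this thread) is to check how much the cross-validated score moves across repeated random splits before trusting a hand-made change. The sketch below compares a baseline with a candidate feature change on identical repeated stratified splits, so the per-split differences are paired. The 40-sample synthetic dataset and the "drop the last 10 columns" change are placeholders, and comparing the gain to its spread is a rough heuristic, not a formal test.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for a small problem (n_samples < 50).
X, y = make_classification(n_samples=40, n_features=30, n_informative=5,
                           random_state=0)
# Fixed random_state -> the same splits are produced on every call.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
clf = LogisticRegression(max_iter=1000)

base = cross_val_score(clf, X, y, cv=cv)          # all features
cand = cross_val_score(clf, X[:, :20], y, cv=cv)  # placeholder "engineered" change

# Paired per-split difference between candidate and baseline.
diff = cand - base
print(f"baseline  {base.mean():.3f} +/- {base.std():.3f}")
print(f"candidate {cand.mean():.3f} +/- {cand.std():.3f}")
print(f"paired gain {diff.mean():+.3f} +/- {diff.std():.3f}")
```

If the paired gain is small relative to its spread, the change may be fitting noise in the few available samples. Keeping a held-out set that is never inspected during error analysis is a complementary precaution.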

Are you looking at the misclassified examples one by one?
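When going through misclassified examples one by one, it can help to start with the ones the model was most confident about. The sketch below assumes a binary problem, uses out-of-fold probabilities from `cross_val_predict(..., method="predict_proba")`, and uses the breast cancer dataset bundled with scikit-learn purely as example data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Out-of-fold class probabilities: each row is scored by a model trained without it.
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")
y_pred = proba.argmax(axis=1)  # labels are 0/1, so column index == label
# Probability assigned to the true class; low values mean confidently wrong.
p_true = proba[np.arange(len(y)), y]

wrong = np.flatnonzero(y_pred != y)
wrong = wrong[np.argsort(p_true[wrong])]  # most confidently wrong first
for i in wrong[:5]:
    print(f"sample {i}: true={data.target_names[y[i]]}, p(true)={p_true[i]:.2f}")
```

Confidently wrong samples are often either label errors or cases that hint at a missing feature, which is where the pattern-spotting described above tends to pay off.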

Thanks for sharing.
Immanuel

>
> On Mon, Oct 1, 2012 at 6:01 PM, Immanuel <[email protected]> wrote:
>> Hi Christian,
>>
>> that's a great question and I'm curious what others have to say.
>>
>> My impression is that the way to diagnose a trained model (classifier or
>> regression) differs a lot between models and also depends on the problem
>> at hand. This makes it hard to come up with a general framework.
>> Here are some resources:
>>
>> * ESL [0] contains lots of information on how to interpret linear models.
>> * "Advice for applying Machine Learning" [1] gives general recommendations
>>   on how to diagnose trained models.
>> * Some inspiration on how to gain insight through visualization [2].
>> * [3] and [4] deal with the Functional ANOVA decomposition (still on my
>>   reading list).
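A very basic form of the linear-model interpretation mentioned above: standardize the features so that coefficients share a common scale, fit a linear classifier, and rank coefficients by magnitude. This is a minimal sketch using logistic regression on scikit-learn's bundled breast cancer data; with correlated features, weight can be shared or swapped between columns, so the ranking should be read with care.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

# After standardization, each coefficient is the change in log-odds of class 1
# for a one-standard-deviation increase in that feature, others held fixed.
coef = model.named_steps["logisticregression"].coef_.ravel()
order = np.argsort(-np.abs(coef))
for j in order[:5]:
    print(f"{data.feature_names[j]:>25s} {coef[j]:+.2f}")
```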
>>
>> Best,
>> Immanuel
>>
>>
>> [0] Hastie, T., R. Tibshirani, J. Friedman, and J. Franklin. "The Elements
>>     of Statistical Learning: Data Mining, Inference and Prediction." The
>>     Mathematical Intelligencer 27, no. 2 (2005): 83–85.
>> [1] http://cs229.stanford.edu/materials/ML-advice.pdf
>> [2] http://had.co.nz/model-vis/
>> [3] Hooker, G. "Diagnostics and Extrapolation in Machine Learning".
>>     Stanford University, 2004.
>> [4] Roosen, C.B. "Visualization and Exploration of High-dimensional
>>     Functions Using the Functional ANOVA Decomposition". Citeseer, 1995.
>>
>>
>>
>> On 10/01/2012 10:49 PM, Christian Jauvin wrote:
>>
>> Hi everyone,
>>
>> I have this (rather vague) intuition that studying the "reasons" which
>> led a trained classifier to behave as it did on particular instances
>> of a problem might be a good way to better understand it. If you
>> have, for instance, a very imbalanced problem, it might be useful to
>> identify the few cases where a (trained) classifier answered correctly (in
>> terms of classification or probabilistic output) on the least likely
>> class, in order to determine which particular features have played a
>> positive role, and which haven't. The way I see it, this would be a
>> bit like "reverse engineering the features".
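For a linear classifier, one concrete version of this "reverse engineering" is to split each instance's decision function into per-feature terms `coef_j * x_j` (plus the intercept) and average those terms over the rare-class instances the model got right. The sketch below assumes an imbalanced synthetic problem (`make_classification` with `weights=[0.95]`), standardized features, and `class_weight="balanced"`; all of these are illustrative choices. It inspects the fit on the training data itself; out-of-fold predictions would give a more honest picture. The decomposition only holds exactly for linear models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic problem: roughly 5% of samples in the rare class 1.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.95], random_state=0)
X = StandardScaler().fit_transform(X)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# decision_function(x) = intercept + sum_j coef_j * x_j, so each term is
# that feature's additive contribution to the log-odds of class 1.
contrib = X * clf.coef_.ravel()
assert np.allclose(contrib.sum(axis=1) + clf.intercept_[0],
                   clf.decision_function(X))

# Rare-class instances the classifier got right (true positives).
hits = np.flatnonzero((y == 1) & (clf.predict(X) == 1))
mean_contrib = contrib[hits].mean(axis=0)
for j in np.argsort(-mean_contrib):
    print(f"feature {j}: mean contribution {mean_contrib[j]:+.2f}")
```

Positive mean contributions mark features that pushed those instances toward the rare class; negative ones worked against it.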
>>
>> So my question: is there a mechanism or maybe an already existing
>> framework or theory for doing this? And would something approaching it
>> currently be possible with Sklearn?
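Much later scikit-learn releases (0.22 and up, well after this 2012 thread) include a model-agnostic mechanism, `sklearn.inspection.permutation_importance`. With `scoring="recall"` on a binary problem whose rare class is labeled 1, each importance is the average drop in rare-class recall when one feature column is shuffled. The sketch below assumes synthetic imbalanced data, a random forest, and a stratified held-out split; these are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Imbalanced synthetic problem: roughly 10% of samples in the rare class 1.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# "recall" scores the positive (rare) class, so each importance is the mean
# drop in rare-class recall on held-out data when that column is shuffled.
result = permutation_importance(clf, X_te, y_te, scoring="recall",
                                n_repeats=10, random_state=0)
for j in np.argsort(-result.importances_mean):
    print(f"feature {j}: {result.importances_mean[j]:+.3f} "
          f"+/- {result.importances_std[j]:.3f}")
```

Unlike the coefficient-based view, this works for any fitted estimator, at the cost of being a global (not per-instance) summary.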
>>
>> Thanks,
>>
>> Christian
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>>
>
>


