hi olivier and others,
this list generates a lot of practical, useful information, such as your
response below, that gets "lost" (i.e. difficult to search if you don't have
the right terms) in the mailing list archives. could we think about how to
capture such information in the docs/wiki?
cheers,
satra
On Tue, Jan 24, 2012 at 7:57 AM, Olivier Grisel <[email protected]> wrote:
> Which classifier have you tried? Are you sure you selected the best
> hyper-parameters with GridSearchCV? Have you tried to normalize the
> dataset? For instance have a look at:
>
> http://scikit-learn.org/dev/modules/preprocessing.html
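>
> A minimal sketch of what that could look like (X and y are hypothetical
> placeholders for your data and labels; depending on the release,
> GridSearchCV lives in sklearn.grid_search or sklearn.model_selection):
>
>   from sklearn.svm import LinearSVC
>   from sklearn.preprocessing import StandardScaler
>   from sklearn.pipeline import Pipeline
>   from sklearn.grid_search import GridSearchCV
>
>   # scale each feature, then grid search the SVM regularization strength
>   pipeline = Pipeline([
>       ('scale', StandardScaler(with_mean=False)),  # with_mean=False keeps sparse input sparse
>       ('clf', LinearSVC()),
>   ])
>   grid = GridSearchCV(pipeline, {'clf__C': [0.01, 0.1, 1, 10, 100]}, cv=5)
>   grid.fit(X, y)
>   print(grid.best_params_, grid.best_score_)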
>
> For very sparse data with large variance in the features, you should
> try the sklearn.feature_extraction.text.TfidfTransformer:
>
>
> http://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer
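>
> A quick sketch (X_counts is a hypothetical sparse matrix of raw term or
> feature counts):
>
>   from sklearn.feature_extraction.text import TfidfTransformer
>
>   tfidf = TfidfTransformer()               # idf reweighting + l2 row normalization by default
>   X_tfidf = tfidf.fit_transform(X_counts)  # rows now have unit l2 norm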
>
> Also have you tried to fit ExtraTrees?
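> For instance (cross_val_score lives in sklearn.cross_validation here,
> sklearn.model_selection in later releases; older tree implementations need
> dense input, hence the hypothetical X_dense):
>
>   from sklearn.ensemble import ExtraTreesClassifier
>   from sklearn.cross_validation import cross_val_score
>
>   clf = ExtraTreesClassifier(n_estimators=100)
>   print(cross_val_score(clf, X_dense, y, cv=5).mean())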
>
> Is this a multiclass problem or a multilabel problem (i.e. is the
> target for each example unique and exclusive, or can a single sample be
> assigned several labels)?
>
> For the latter case, multilabel support was recently introduced and is
> documented here:
>
> http://scikit-learn.org/stable/modules/multiclass.html
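>
> For the multilabel case, a rough sketch (Y is a hypothetical target where
> each sample carries one or more labels; depending on the release this is a
> sequence of label tuples or a binary indicator matrix):
>
>   from sklearn.multiclass import OneVsRestClassifier
>   from sklearn.svm import LinearSVC
>
>   clf = OneVsRestClassifier(LinearSVC())  # one binary problem per label
>   clf.fit(X, Y)
>   predicted = clf.predict(X)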
>
> If you can cheaply collect unsupervised data that looks similar to
> your training set (albeit without the labels and in much larger
> amounts), it might be interesting to compute cluster centers using
> MiniBatchKMeans, then project your data onto those centers using a
> non-linear transform (e.g. an RBF kernel), add these additional features
> to the original features (horizontal concatenation of the 2 datasets),
> and then fit the classifier with the labels on this augmented data.
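>
> Something along these lines (X_unsup is a hypothetical pool of unlabelled
> data; the number of clusters and the RBF gamma are guesses to be tuned):
>
>   import numpy as np
>   from scipy.sparse import issparse
>   from sklearn.cluster import MiniBatchKMeans
>   from sklearn.metrics.pairwise import rbf_kernel
>
>   km = MiniBatchKMeans(n_clusters=100).fit(X_unsup)
>
>   # non-linear projection of the labelled data onto the cluster centers
>   X_rbf = rbf_kernel(X, km.cluster_centers_, gamma=0.1)
>
>   # horizontal concatenation of the original and extracted features
>   X_dense = X.toarray() if issparse(X) else X
>   X_augmented = np.hstack([X_dense, X_rbf])
>   # then fit the classifier on (X_augmented, y)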
>
> You can also do the same with a linear feature extraction method such as
> sklearn.decomposition.RandomizedPCA or a non-linear dictionary learner
> such as sklearn.decomposition.MiniBatchDictionaryLearning:
>
>
> http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.MiniBatchDictionaryLearning.html#sklearn.decomposition.MiniBatchDictionaryLearning
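>
> A rough sketch of the dictionary learning variant (n_components and alpha
> are guesses to be tuned; the estimator expects dense input, hence the
> hypothetical X_unsup_dense and X_dense):
>
>   from sklearn.decomposition import MiniBatchDictionaryLearning
>
>   dico = MiniBatchDictionaryLearning(n_components=100, alpha=1)
>   dico.fit(X_unsup_dense)          # learn the dictionary on the unlabelled data
>   codes = dico.transform(X_dense)  # sparse codes for the labelled data
>   # concatenate codes with the original features as above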
>
> Whatever feature extractor you use, the goal is to extract latent
> components, each of them having a wide feature coverage, so as to cast a
> large net that can activate those components even if the raw input has
> very few active features.
>
> There are also 2 incoming pull requests to add real support for
> semi-supervised learning algorithms in the scikit, but they are not yet
> merged into master. You could use those kinds of algorithms to implement
> some kind of active learning by enriching your training set, manually
> annotating the most promising examples from your unsupervised dataset.
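>
> In the meantime, a crude uncertainty-sampling sketch with the existing API
> (pick the unlabelled examples the current model is least confident about,
> annotate them by hand, then retrain):
>
>   import numpy as np
>   from sklearn.svm import LinearSVC
>
>   clf = LinearSVC().fit(X, y)
>   margins = np.abs(clf.decision_function(X_unsup))
>   if margins.ndim > 1:                  # multiclass: distance to the closest hyperplane
>       margins = margins.min(axis=1)
>   to_label = np.argsort(margins)[:50]   # the 50 most ambiguous samples
>   # annotate X_unsup[to_label] manually, append to (X, y) and refit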
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>