The right thing to do would probably be to write a scikit-learn-contrib package for them and see whether they gain traction. If they perform well on, e.g., Kaggle competitions, we know that we need them in :).
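For anyone tempted to try: a contrib package mainly needs estimators that follow the scikit-learn fit/predict API so they work in pipelines and with the common model-selection tools. A minimal, purely hypothetical skeleton (the LCSClassifier name and its internals are placeholders, not an actual LCS implementation) might look like:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class LCSClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical placeholder for a learning-classifier-system estimator."""

    def __init__(self, n_iterations=1000):
        self.n_iterations = n_iterations

    def fit(self, X, y):
        X, y = check_X_y(X, y, accept_sparse=True)
        self.classes_ = unique_labels(y)
        # A real LCS would evolve a population of if-then rules here.
        self.rules_ = []
        return self

    def predict(self, X):
        check_is_fitted(self, "rules_")
        X = check_array(X, accept_sparse=True)
        # Placeholder: a real LCS would match the rule set against each row.
        return np.full(X.shape[0], self.classes_[0])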
Cheers,
Gaël

On Fri, Jul 21, 2017 at 07:09:03PM -0400, Sebastian Raschka wrote:
> Maybe because they are genetic algorithms, which are -- for some reason --
> not very popular in the ML field in general :P. (People in bioinformatics
> seem to use them a lot, though.) Also, the name "Learning Classifier Systems"
> is a bit weird, I must say: I remember that when Ryan introduced me to
> those, I was like "ah yeah, sure, I know machine learning classifiers" ;)
>
> > On Jul 21, 2017, at 3:01 PM, Stuart Reynolds <stu...@stuartreynolds.net> wrote:
> >
> > +1
> > LCS and its many, many variants seem very practical and adaptable. I'm
> > not sure why they haven't gotten traction.
> > Overshadowed by GBM & random forests?
> >
> > On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> >> Just to throw some additional ideas in here. Based on a conversation with
> >> a colleague some time ago, I think learning classifier systems
> >> (https://en.wikipedia.org/wiki/Learning_classifier_system) are
> >> particularly useful when working with large, sparse binary vectors (like
> >> those from a one-hot encoding). I am really not into LCSs and only know the
> >> basics (I read through the first chapters of the Intro to Learning
> >> Classifier Systems draft; the print version will be out later this year).
> >> Also, I once saw an interesting poster on a Set Covering Machine algorithm,
> >> which they benchmarked against SVMs, random forests, and the like for
> >> categorical (genomics) data. Looked promising.
> >> Best,
> >> Sebastian
> >>
> >>> On Jul 21, 2017, at 2:37 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> >>>
> >>> Thank you, Jacob. Appreciate it.
> >>> Regarding "perform better", I was referring to better accuracy,
> >>> precision, recall, F1 score, etc.
> >>> Thanks,
> >>> Raga
> >>>
> >>> On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <jmschreibe...@gmail.com> wrote:
> >>> Traditionally, tree-based methods are very good when it comes to
> >>> categorical variables and can handle them appropriately. There is a
> >>> current WIP PR to add this support to sklearn. I'm not exactly sure what
> >>> you mean by "perform better", though. Estimators that ignore the
> >>> categorical aspect of these variables and treat them as discrete will
> >>> likely perform worse than those that treat them appropriately.
> >>>
> >>> On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.mark...@gmail.com> wrote:
> >>> Hello,
> >>> I am wondering if there are some classifiers that perform better for
> >>> datasets with categorical features (converted into a sparse input matrix
> >>> with pd.get_dummies())? The data for the categorical features are nominal
> >>> (order doesn't matter, e.g. country, occupation, etc.).
> >>> If you could provide me some references (papers, books, websites, etc.),
> >>> that would be great.
> >>> Thank you very much!
> >>> Raga

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
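For reference, the baseline workflow described in the original question -- nominal categorical features expanded with pd.get_dummies() and fed to a tree ensemble, as Jacob suggests -- might look roughly like the sketch below; the column names and toy data are made up:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data with nominal categorical features (values are made up)
df = pd.DataFrame({
    "country": ["US", "FR", "FR", "DE", "US", "DE"],
    "occupation": ["eng", "med", "eng", "law", "med", "eng"],
    "label": [1, 0, 1, 0, 0, 1],
})

# One-hot encode the nominal columns (order carries no meaning)
X = pd.get_dummies(df[["country", "occupation"]])
y = df["label"]

# A tree ensemble as a baseline on the one-hot encoded features
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=2))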