Maybe because they are genetic algorithms, which are -- for some reason -- not very popular in the ML field in general :P. (People in bioinformatics seem to use them a lot, though.) Also, the name "Learning Classifier Systems" is a bit weird, I must say: I remember that when Ryan introduced me to those, I was like "ah yeah, sure, I know machine learning classifiers" ;)
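For what it's worth, here is a minimal sketch of the one-hot encoding step Raga mentions below (the column names and values are made up for illustration; `sparse=True` is what gives you the sparse dummy columns):

```python
import pandas as pd

# Toy frame with nominal categorical features (order doesn't matter),
# e.g. country and occupation as in Raga's example.
df = pd.DataFrame({
    "country": ["US", "DE", "US", "JP"],
    "occupation": ["eng", "med", "law", "eng"],
})

# One-hot encode; sparse=True stores the dummy columns as sparse arrays,
# which keeps memory low for high-cardinality categoricals.
onehot = pd.get_dummies(df, sparse=True)

print(onehot.columns.tolist())
print(onehot.shape)  # 4 rows, one column per (feature, category) pair
```

The resulting dummy columns are exactly the kind of large, sparse binary vectors that the LCS discussion below is about.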
> On Jul 21, 2017, at 3:01 PM, Stuart Reynolds <[email protected]> wrote:
>
> +1
> LCS and its many, many variants seem very practical and adaptable. I'm
> not sure why they haven't gotten traction.
> Overshadowed by GBM & random forests?
>
> On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka
> <[email protected]> wrote:
>> Just to throw some additional ideas in here. Based on a conversation with a
>> colleague some time ago, I think learning classifier systems
>> (https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly
>> useful when working with large, sparse binary vectors (like from a one-hot
>> encoding). I am really not into LCSs and only know the basics (read
>> through the first chapters of the Intro to Learning Classifier Systems
>> draft; the print version will be out later this year).
>> Also, I saw an interesting poster on a Set Covering Machine algorithm once,
>> which they benchmarked against SVMs, random forests, and the like for
>> categorical (genomics) data. Looked promising.
>>
>> Best,
>> Sebastian
>>
>>> On Jul 21, 2017, at 2:37 PM, Raga Markely <[email protected]> wrote:
>>>
>>> Thank you, Jacob. Appreciate it.
>>>
>>> Regarding 'perform better', I was referring to better accuracy, precision,
>>> recall, F1 score, etc.
>>>
>>> Thanks,
>>> Raga
>>>
>>> On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <[email protected]>
>>> wrote:
>>> Traditionally, tree-based methods are very good when it comes to categorical
>>> variables and can handle them appropriately. There is a current WIP PR to
>>> add this support to sklearn. I'm not exactly sure what you mean by
>>> "perform better", though. Estimators that ignore the categorical aspect of
>>> these variables and treat them as discrete will likely perform worse than
>>> those that treat them appropriately.
>>>
>>> On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <[email protected]>
>>> wrote:
>>> Hello,
>>>
>>> I am wondering if there are some classifiers that perform better for
>>> datasets with categorical features (converted into a sparse input matrix with
>>> pd.get_dummies())? The data for the categorical features are nominal (order
>>> doesn't matter, e.g. country, occupation, etc.).
>>>
>>> If you could provide me with some references (papers, books, websites, etc.),
>>> that would be great.
>>>
>>> Thank you very much!
>>> Raga
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
