Re: [scikit-learn] GSoC proposal - linear model

2017-03-29 Thread Jacob Schreiber
Hi Konstantinos I likely won't be a mentor for the linear models project, but I looked over your proposal and have a few suggestions. In general it was a good write up! 1. You should include some equations in the write up, basically the softmax loss (which I think is a more common term than multi
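For reference, the softmax (multinomial logistic) loss mentioned in the review is commonly written as follows; this is a standard-form sketch, since the proposal itself is not shown in the thread:

```latex
% Softmax (multinomial logistic) loss over n samples and K classes,
% with weight vectors w_1, ..., w_K and labels y_i in {1, ..., K}.
L(W) = -\sum_{i=1}^{n} \log \frac{\exp(w_{y_i}^{\top} x_i)}{\sum_{k=1}^{K} \exp(w_k^{\top} x_i)}
```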

[scikit-learn] Announcement: scikit-image 0.13.0

2017-03-29 Thread Juan Nunez-Iglesias
We're happy to (finally) announce the release of scikit-image v0.13.0! Special thanks to our many contributors for making it possible! This release is the result of over a year of work, with over 200 pull requests by 82 contributors. Linux and macOS wheels are available now on PyPI

Re: [scikit-learn] decision trees

2017-03-29 Thread Julio Antonio Soto de Vicente
IMO CART can handle categorical features just as well as CITrees, as long as we slightly change sklearn's implementation... -- Julio > On 29 Mar 2017, at 15:30, Andreas Mueller wrote: > > I'd argue that's why we should implement conditional inference trees ;) > >> On 03/29/2017 05:56 AM

Re: [scikit-learn] decision trees

2017-03-29 Thread Andreas Mueller
I'd argue that's why we should implement conditional inference trees ;) On 03/29/2017 05:56 AM, Olivier Grisel wrote: Integer coding will indeed make the DT assume an arbitrary ordering while one-hot encoding does not force the tree model to make that assumption. However in practice when the de

Re: [scikit-learn] decision trees

2017-03-29 Thread Raphael C
There is https://github.com/scikit-learn/scikit-learn/pull/4899 . It looks like it is waiting for review? Raphael On 29 March 2017 at 11:50, federico vaggi wrote: > That's a really good point. Do you know of any systematic studies about the > two different encodings? > > Finally: wasn't there

Re: [scikit-learn] decision trees

2017-03-29 Thread federico vaggi
That's a really good point. Do you know of any systematic studies about the two different encodings? Finally: wasn't there a PR for RF to accept categorical variables as inputs? On Wed, 29 Mar 2017 at 11:57, Olivier Grisel wrote: > Integer coding will indeed make the DT assume an arbitrary ord

Re: [scikit-learn] decision trees

2017-03-29 Thread Andrew Howe
Thanks very much for the thorough answer. I hadn't thought about the inductive bias issue with my forests. I'll evaluate both coding schemes for my unordered categoricals. Andrew <~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.res

Re: [scikit-learn] decision trees

2017-03-29 Thread Olivier Grisel
Integer coding will indeed make the DT assume an arbitrary ordering while one-hot encoding does not force the tree model to make that assumption. However in practice when the depth of the trees is not too limited (or if you use a large enough ensemble of trees), the model will have enough flexibil

Re: [scikit-learn] decision trees

2017-03-29 Thread Brian Holt
From a theoretical point of view, yes you should one-hot-encode your categorical variables if you don't want any ordering to be implied. Brian On 29 Mar 2017 08:40, "Andrew Howe" wrote: > My question is more along the lines of will the DT classifier falsely > infer an ordering? > > <~~

Re: [scikit-learn] decision trees

2017-03-29 Thread Andrew Howe
My question is more along the lines of will the DT classifier falsely infer an ordering? <~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://www.researchgate.net/profile/John_Howe12/ I live to learn, so I can learn to live. - me <

Re: [scikit-learn] decision trees

2017-03-29 Thread Olivier Grisel
For large enough models (e.g. random forests or gradient boosted trees ensembles) I would definitely recommend arbitrary integer coding for the categorical variables. Try both, use cross-validation and see for yourself. -- Olivier
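The "try both, use cross-validation" advice can be sketched as below. The dataset and category names are made up, and the model choices (100-tree forest, 5-fold CV) are illustrative defaults, not from the thread.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.RandomState(42)
cats = np.array(["a", "b", "c", "d", "e"])
X_raw = cats[rng.randint(0, 5, size=(600, 1))]
y = np.isin(X_raw.ravel(), ["b", "d"]).astype(int)  # label depends only on the category

forest = RandomForestClassifier(n_estimators=100, random_state=42)
for name, enc in [("integer", OrdinalEncoder()), ("one-hot", OneHotEncoder())]:
    X = enc.fit_transform(X_raw)  # forests accept the sparse one-hot output directly
    scores = cross_val_score(forest, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```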

[scikit-learn] decision trees

2017-03-29 Thread Andrew Howe
Is one-hot encoding still the most accurate way to pass categorical variables to decision trees in scikit-learn (i.e. without causing spurious ordering/interpolation)? Thanks. Andrew <~~~> J. Andrew Howe, PhD www.andrewhowe.com http://www.linkedin.com/in/ahowe42 https://
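A minimal sketch of the one-hot route asked about above, using a pipeline so the encoding travels with the model; the column values and labels are invented for illustration:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

X = [["cat"], ["dog"], ["bird"], ["dog"], ["cat"], ["bird"]]
y = [0, 1, 0, 1, 0, 0]

model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # each category gets its own binary column
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)
print(model.predict([["dog"], ["bird"]]))  # splits on indicator columns, no implied order
```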