Re: [Scikit-learn-general] Clustering of Text Documents

2013-06-04 Thread Lars Buitinck
2013/6/4 Joel Nothman jnoth...@student.usyd.edu.au: NLP folks pass the blame to IR folks :P ... and IR folks always mean absolute frequency, unless stated otherwise. -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam

Re: [Scikit-learn-general] Random Forest with a mix of categorical and lexical features

2013-06-04 Thread Andreas Mueller
On 06/04/2013 05:55 AM, Christian Jauvin wrote: Many thanks to all for your help and detailed answers, I really appreciate it. So I wanted to test the discussion's takeaway, namely, what Peter suggested: one-hot encode the categorical features with small cardinality, and leave the others in
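A minimal sketch of the mixed encoding mentioned in the snippet above, which is cut off. It assumes the high-cardinality columns are left as integer (ordinal) codes. The function name `encode_mixed`, the parameter `max_onehot_cardinality`, and the toy data are hypothetical and do not come from the thread; the code uses only the standard library rather than any scikit-learn encoder.

```python
def encode_mixed(rows, max_onehot_cardinality=3):
    """Encode categorical columns: one-hot if few categories, integer codes otherwise.

    rows: list of equal-length tuples of hashable, sortable category values.
    max_onehot_cardinality: hypothetical threshold, not a value from the thread.
    """
    n_cols = len(rows[0])
    encoded = [[] for _ in rows]
    for j in range(n_cols):
        # Sort the distinct values so the integer codes are deterministic.
        categories = sorted({row[j] for row in rows})
        index = {cat: i for i, cat in enumerate(categories)}
        if len(categories) <= max_onehot_cardinality:
            # Small cardinality: one indicator column per category.
            for out, row in zip(encoded, rows):
                out.extend(1 if index[row[j]] == k else 0
                           for k in range(len(categories)))
        else:
            # Large cardinality: keep a single integer-coded column.
            for out, row in zip(encoded, rows):
                out.append(index[row[j]])
    return encoded


rows = [
    ("red", "amsterdam"),
    ("blue", "sydney"),
    ("red", "montreal"),
    ("blue", "paris"),
]
X = encode_mixed(rows, max_onehot_cardinality=2)
```

Here the two-valued first column becomes two indicator columns, while the four-valued second column stays a single integer column. The integer codes follow sorted order and carry no real ordering between categories.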

Re: [Scikit-learn-general] scikit contribution

2013-06-04 Thread Andreas Mueller
Hi Şükrü. First, please call me Andy like everyone else ;) Second, please ask questions like these on the mailing list. I don't always have time to reply to mails. If you want to start to contribute, please read the contributor guidelines:

Re: [Scikit-learn-general] Random Forest with a mix of categorical and lexical features

2013-06-04 Thread Peter Prettenhofer
Hi Christian, I believe more in my results than in my expertise - and so should you :-) I think you misunderstood me: I did not claim that one-hot encoded categorical features give better results than ordinal encoded ones - I just claimed that ordinal encoding works as well as one-hot encoded

Re: [Scikit-learn-general] Clustering of Text Documents

2013-06-04 Thread Tom Fawcett
On Jun 4, 2013, at 2:38 AM, Lars Buitinck l.j.buiti...@uva.nl wrote: 2013/6/4 Joel Nothman jnoth...@student.usyd.edu.au: NLP folks pass the blame to IR folks :P ... and IR folks always mean absolute frequency, unless stated otherwise. Coming from ML, I’ve seen it used as both absolute and
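To illustrate the two readings of "term frequency" discussed in this thread, here is a small sketch that computes both the absolute count and the relative (length-normalised) frequency for one document. The function name `term_frequencies` and the whitespace tokenisation are assumptions made for the example, not anything stated in the messages.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return (absolute, relative) term frequencies for one tokenised document."""
    # Absolute frequency: raw number of occurrences of each term.
    absolute = Counter(tokens)
    total = sum(absolute.values())
    # Relative frequency: occurrences divided by document length.
    relative = (
        {term: count / total for term, count in absolute.items()} if total else {}
    )
    return absolute, relative


doc = "the cat sat on the mat".split()
absolute, relative = term_frequencies(doc)
```

For this six-token document, "the" has an absolute frequency of 2 and a relative frequency of 2/6. The relative values sum to 1 for any non-empty document, which is one reason the two conventions give different weightings across documents of different lengths.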

[Scikit-learn-general] Developer list

2013-06-04 Thread Karol Pysniak
Hi All, I am new to scikit-learn, and I am very keen to start contributing to the project. However, I couldn't find a developer list where I could propose my ideas. Could you direct me to such a list, or do such discussions generally take place on the general mailing list? Many thanks, Karol

Re: [Scikit-learn-general] Developer list

2013-06-04 Thread Gael Varoquaux
Welcome, this is the right list. Gaël

Re: [Scikit-learn-general] Developer list

2013-06-04 Thread Karol Pysniak
Great, thanks. As I mentioned, I am very interested in making some contribution to scikit-learn library. In particular, I would like to extend the current implementation of handling small sample size problems. What I have in mind is adding the support for Maximum Entropy Covariance Estimate (The

Re: [Scikit-learn-general] Developer list

2013-06-04 Thread Ronnie Ghose
imho this may fit better in scikit-image http://scikit-image.org/. Thanks, Ronnie On Tue, Jun 4, 2013 at 4:33 PM, Karol Pysniak kpysn...@gmail.com wrote: Great, thanks. As I mentioned, I am very interested in making some contribution to scikit-learn library. In particular, I would like to