Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Maheshakya Wijewardena
Hi, As I understand it, using sparse data we can enhance the descriptiveness of the data while keeping it smaller than the dense data, without losing information. Isn't that what trees generally need for improved accuracy? I will try using sparse data on the 20newsgroups data and let you know the res
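To make the "smaller without losing information" point concrete, here is a minimal sketch of a sparse row stored as an index-to-value mapping. The helper names `to_sparse` and `to_dense` are hypothetical and only for illustration; scikit-learn itself works with scipy.sparse matrices, not plain dicts.

```python
def to_sparse(row):
    """Keep only the non-zero entries of a dense row as {column: value}."""
    return {i: v for i, v in enumerate(row) if v != 0}


def to_dense(sparse_row, n_features):
    """Rebuild the dense row; missing columns are implicit zeros."""
    return [sparse_row.get(i, 0) for i in range(n_features)]


# A toy bag-of-words row: mostly zeros, as is typical for text data.
row = [0, 0, 3, 0, 0, 0, 1, 0, 0, 0]
sparse_row = to_sparse(row)

# The sparse form stores 2 entries instead of 10, and the round trip is exact.
assert len(sparse_row) == 2
assert to_dense(sparse_row, len(row)) == row
```

The saving only matters when most entries are zero; for a dense row the mapping would be larger than the list.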

Re: [Scikit-learn-general] Strange Error Message

2014-01-22 Thread Lars Buitinck
2014/1/21 Lorenzo Isella : > I checked the dat files with R and there are only numeric values and no > missing values, nevertheless I get this error message when I try running > the Python script > > $ ./loan-minimal.py > Traceback (most recent call last): > File "./loan-minimal.py", line 13, in

Re: [Scikit-learn-general] Strange Error Message

2014-01-22 Thread Eustache DIEMERT
Hi, Maybe try with a subset of the data (or use bisection) to find the part of it that is triggering the error. And before that, update your sklearn version using e.g. pip: """ sudo pip install -U scikit-learn """ so we're sure the version is recent enough :) Eustache 2014/1/21 Lorenzo Isella
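The subset idea above can be sketched as a simple bisection over the rows. This is only an illustration: `find_failing_rows` and `has_bad_value` are hypothetical names, and the failure check here (a value that does not parse as a float) is an assumption standing in for whatever actually triggers the error in the script.

```python
def find_failing_rows(rows, fails):
    """Narrow `rows` down to a smaller subset for which fails(subset) is True.

    Returns [] if the full data set does not fail at all.
    """
    if not fails(rows):
        return []
    while len(rows) > 1:
        mid = len(rows) // 2
        first, second = rows[:mid], rows[mid:]
        if fails(first):
            rows = first
        elif fails(second):
            rows = second
        else:
            # The failure needs rows from both halves; stop narrowing here.
            break
    return rows


def has_bad_value(subset):
    """Assumed failure check: True if any value cannot be parsed as a float."""
    for value in subset:
        try:
            float(value)
        except ValueError:
            return True
    return False


culprit = find_failing_rows(["1.0", "2.5", "abc", "4.0"], has_bad_value)
```

In practice `fails` would wrap the real call that raises, catching the exception and returning True, so each halving step reruns the failing code on less data.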

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Caleb
Hi all, I am using random forests to do deep learning/feature learning using the RandomTreesEmbedding in scikit-learn. It would be cool to apply the random forest to the learned features and induce a higher-level representation. I have actually tried the naive approach of densifying the output

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Gilles Louppe
Mathieu, I have no experience with forests on sparse data, nor have I seen much work on the topic. I would be curious to investigate, however; there may be problems for which this is useful. I know that Arnaud tried forests on (densified) 20newsgroups and it seems to work well, actually. In partic

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Mathieu Blondel
Hi, Something I was wondering is whether sparse support in decision trees would actually be useful. Do decision trees (or ensembles of them like random forests) work better than linear models for high-dimensional data? It would be nice to take the News20 dataset, pre-select the top 10k features (

Re: [Scikit-learn-general] Theil-Sen estimator for a multiple linear regression problem

2014-01-22 Thread Mathieu Blondel
I just remembered that you can also try RANSAC, which was recently added to scikit-learn master: http://scikit-learn.org/dev/auto_examples/linear_model/plot_ransac.html Mathieu On Mon, Jan 13, 2014 at 6:45 PM, Mathieu Blondel wrote: > > On Mon, Jan 13, 2014 at 5:09 PM, [email protected]
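For readers unfamiliar with RANSAC, the idea can be sketched in a few lines for a single-feature line fit. This is a standard-library illustration only, not the scikit-learn implementation (there the estimator is `RANSACRegressor` in `sklearn.linear_model`); the function names, the residual threshold, and the toy data are all assumptions made for the example.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_trials=200, threshold=0.5, seed=0):
    """Fit a line through random pairs and keep the largest consensus set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a * x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no usable consensus set found")
    # Final model: least squares restricted to the consensus set.
    return fit_line(best_inliers), best_inliers


# Toy data: 20 points exactly on y = 2x + 1, plus 5 gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
data += [(x + 0.5, 100.0) for x in (2, 7, 11, 15, 18)]
(slope, intercept), inliers = ransac_line(data)
```

Unlike a plain least-squares fit on all 25 points, the final fit here uses only the points that agree with the best candidate line, so the outliers do not pull the slope.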

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-22 Thread Arnaud Joly
Hi Maheshakya, I could be one of the mentors for this GSOC. If you want to apply for a GSOC, I think that this message from Gael and Mathieu is worth reading http://sourceforge.net/mailarchive/message.php?msg_id=31864881 Best, Arnaud On 22 Jan 2014, at 06:13, Maheshakya Wijewardena wrote: