Re: [Scikit-learn-general] GridSearch comparing two preprocessors (or graph paths)

2014-08-07 Thread Joel Nothman
This is possible with https://github.com/scikit-learn/scikit-learn/pull/1769, which includes an example of something quite similar. Reviews would be greatly appreciated! On 8 August 2014 07:32, Ronnie Ghose wrote: > No afaik but it's easy enough to build in :) > On Aug 7, 2014 5:03 PM, "Satraji

Re: [Scikit-learn-general] GridSearch comparing two preprocessors (or graph paths)

2014-08-07 Thread Ronnie Ghose
No afaik but it's easy enough to build in :) On Aug 7, 2014 5:03 PM, "Satrajit Ghosh" wrote: > hi folks, > > is there a way for GridSearch in scikit learn to choose between two > preprocessors (e.g., PCA vs FeatureAgglomeration). more generally, whether > there is something to search through diff

[Scikit-learn-general] GridSearch comparing two preprocessors (or graph paths)

2014-08-07 Thread Satrajit Ghosh
Hi folks, is there a way for GridSearch in scikit-learn to choose between two preprocessors (e.g., PCA vs. FeatureAgglomeration)? More generally, is there something to search through different paths of a pipeline graph? I think I have seen this being discussed, but my keywords were not ret
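
For readers landing on this thread later: the following is a minimal sketch of how the question above (choosing between PCA and FeatureAgglomeration inside one search) can be expressed in later scikit-learn releases, where a pipeline step itself may be listed as a searchable parameter. It is not necessarily the approach taken in the pull request mentioned in the replies, and it does not apply to the 0.14/0.15 versions current at the time. The step names `reduce` and `clf`, the synthetic data, and the candidate values are all illustrative assumptions.

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, used only so the sketch runs on its own.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# "reduce" and "clf" are hypothetical step names chosen for this example.
pipe = Pipeline([
    ("reduce", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# One sub-grid per preprocessor, so each one only sees its own parameters.
param_grid = [
    {"reduce": [PCA()], "reduce__n_components": [2, 5]},
    {"reduce": [FeatureAgglomeration()], "reduce__n_clusters": [2, 5]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# The winning preprocessor and its setting are reported together.
print(search.best_params_)
```

Using a list of dictionaries keeps the two branches separate: `n_components` is never passed to FeatureAgglomeration and `n_clusters` is never passed to PCA.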

Re: [Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread ZORAIDA HIDALGO SANCHEZ
ok! From: Joel Nothman <[email protected]> Reply-To: "[email protected]" <[email protected]> Date: Thursday, 7 August 2014 16:52 To: scikit-learn-general mailto:sciki

Re: [Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread Joel Nothman
Try 0.15.1 On 8 August 2014 00:22, ZORAIDA HIDALGO SANCHEZ < [email protected]> wrote: > Andy, > > I am using version 0.14.1. My data are python list with strings :_| > > From: Andreas Mueller > Reply-To: "[email protected]" < > scikit-learn-gen

Re: [Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread ZORAIDA HIDALGO SANCHEZ
Andy, I am using version 0.14.1. My data is a Python list of strings :_| From: Andreas Mueller <[email protected]> Reply-To: "[email protected]" <[email protected]> Date: Thursday

Re: [Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread Andreas Mueller
Hi. Which version are you using, and what is the dtype and shape of your data? I recently fixed something when the input was a list of strings. Andy On Aug 7, 2014 10:45 AM, "ZORAIDA HIDALGO SANCHEZ" < [email protected]> wrote: > Hi all, > > I have a dataset of 600M that I need

Re: [Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread Joel Nothman
Are you sure it is train_test_split itself that is taking a long time? What are the dimensions of your data? Are they stored in memory as a numpy array when you call train_test_split? On my MacBook with 16GB RAM I have no problem train_test_splitting np.empty((100, 500),dtype=np.float64), whi

[Scikit-learn-general] train_test_split consumes too much memory

2014-08-07 Thread ZORAIDA HIDALGO SANCHEZ
Hi all, I have a dataset of 600M that I need to split into train and test. I am using cross_validation.train_test_split to do it, but it runs for about an hour and ends up consuming all the memory of the system (so I have to kill the process). My laptop has 8G of memory(
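
As a possible stopgap for the situation described in this thread (a plain Python list of strings on an older release), the split can be done on shuffled indices with the standard library only, so the list is never converted to an array. This is a sketch under that assumption, not the fix that went into scikit-learn; the helper name `split_list` and its parameters are hypothetical.

```python
import random


def split_list(items, test_size=0.25, seed=0):
    """Split a Python list into (train, test) lists using shuffled indices."""
    indices = list(range(len(items)))
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(items) * test_size))
    test = [items[i] for i in indices[:n_test]]
    train = [items[i] for i in indices[n_test:]]
    return train, test


# Tiny illustrative input; the real data would be the list of strings.
docs = ["doc %d" % i for i in range(10)]
train_docs, test_docs = split_list(docs, test_size=0.3, seed=42)
print(len(train_docs), len(test_docs))
```

Only references to the existing strings are copied into the two new lists, so the extra memory is proportional to the number of items rather than to their total size.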

Re: [Scikit-learn-general] Using LSH Forest approximate neighbor search in DBSCAN[GSoC]

2014-08-07 Thread Daniel Vainsencher
On 08/06/2014 07:25 PM, Maheshakya Wijewardena wrote: > Actually in our implementation of LSH Forest, we have an extra parameter > to control the candidate acquisition(to avoid having the candidates with > very small hash length matches - lower bound for max_depth) for > `kneighbors` queries. But t