Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Gael Varoquaux

> Take the example of 1-NN: it can very well happen that for samples
> close to the Voronoi boundary, the closest neighbor is on the other
> side of the boundary.
Indeed, I was bullshitting.
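The 1-NN argument quoted above can be checked on a tiny example. This is only a sketch: the two centroids are hand-picked stand-ins for a KMeans result, and the 1-D data points and the `nearest_cell` helper are hypothetical, chosen so that the query sits just inside one Voronoi cell while its true nearest neighbor sits just inside the other.

```python
import numpy as np

# Hand-picked 1-D centroids standing in for a KMeans result (assumption).
# The Voronoi boundary between them is at x = 5.
centroids = np.array([[0.0], [10.0]])
X_train = np.array([[1.0], [5.5]])
query = np.array([4.9])


def nearest_cell(points, centroids):
    """Return the index of the closest centroid for each point."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


train_cells = nearest_cell(X_train, centroids)
query_cell = nearest_cell(query[None, :], centroids)[0]

# Cell-restricted search: only look at training points in the query's cell.
candidates = np.flatnonzero(train_cells == query_cell)
dist_in_cell = np.linalg.norm(X_train[candidates] - query, axis=1)
approx_nn = candidates[dist_in_cell.argmin()]

# Exact search: look at every training point.
exact_nn = np.linalg.norm(X_train - query, axis=1).argmin()

print("cell-restricted neighbor:", approx_nn, "exact neighbor:", exact_nn)
```

Here the cell-restricted search returns the point at 1.0 (distance 3.9), while the exact nearest neighbor is the point at 5.5 (distance 0.6) in the neighboring cell, so pruning by cell gives an approximate answer only.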

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Mathieu Blondel

Solving this issue in a generic way would be nice:
https://github.com/scikit-learn/scikit-learn/issues/325
On Mon, Jul 30, 2012 at 6:43 PM, Olivier Grisel wrote:
> Actually I think the KNeighborsClassifier implementation in
> scikit-learn has a real memory occupation issue in "brute" mode

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Olivier Grisel

2012/7/30 Gael Varoquaux:
> On Mon, Jul 30, 2012 at 11:52:36AM +0200, Olivier Grisel wrote:
>> > In addition, a Voronoi tessellation computed with a KMeans during
>> > training could be used to avoid testing all the samples in the
>> > large n situation.
>
>> Hum, that won't work for exact k-NN.
>

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Andreas Müller

well intuition on that works in high dimensions, though.
----- Original Message -----
From: "Gael Varoquaux"
To: [email protected]
Sent: Monday, 30 July 2012 10:54:48
Subject: Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Alg

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Gael Varoquaux

On Mon, Jul 30, 2012 at 11:52:36AM +0200, Olivier Grisel wrote:
> > In addition, a Voronoi tessellation computed with a KMeans during
> > training could be used to avoid testing all the samples in the
> > large n situation.
> Hum, that won't work for exact k-NN.
I don't understand. Yes I do belie

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Olivier Grisel

2012/7/30 Gael Varoquaux:
> On Mon, Jul 30, 2012 at 11:43:01AM +0200, Olivier Grisel wrote:
>> This could be worked around by chunking the data argument of the
>> predict calls instead.
>
> Indeed.
>
> In addition, a Voronoi tessellation computed with a KMeans during
> training could be used to av

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Olivier Grisel

Ji, could you please create a new GitHub issue to track this bug?
https://github.com/scikit-learn/scikit-learn/issues
Please include the Python snippets of your notebook as a verbatim markdown block in the issue:
```python
from sklearn.neighbors import KNeighborsClassifier
...
```
-- O

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Gael Varoquaux

On Mon, Jul 30, 2012 at 11:43:01AM +0200, Olivier Grisel wrote:
> This could be worked around by chunking the data argument of the
> predict calls instead.
Indeed.
In addition, a Voronoi tessellation computed with a KMeans during training could be used to avoid testing all the samples in the larg

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Olivier Grisel

Actually I think the KNeighborsClassifier implementation in scikit-learn has a real memory usage issue in "brute" mode (which is selected). I suspect it is materializing the whole (n_samples_train, n_samples_predict) distances array in memory before computing the (n_samples_predict * k
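The chunking workaround discussed in this thread can be illustrated without relying on scikit-learn internals. The following is a minimal NumPy sketch of brute-force k-NN classification, not the library's implementation: the function name `knn_predict_chunked`, the `chunk_size` parameter, and the synthetic data are assumptions made for illustration. The point is that the temporary distance array has shape (chunk_size, n_samples_train) rather than (n_samples_predict, n_samples_train).

```python
import numpy as np


def knn_predict_chunked(X_train, y_train, X_test, k=5, chunk_size=1000):
    """Brute-force k-NN majority vote, one chunk of test rows at a time.

    Assumes y_train holds non-negative integer class labels and
    k <= len(X_train).
    """
    n_test = X_test.shape[0]
    y_pred = np.empty(n_test, dtype=y_train.dtype)
    train_sq = (X_train ** 2).sum(axis=1)
    for start in range(0, n_test, chunk_size):
        chunk = X_test[start:start + chunk_size]
        # Squared Euclidean distances, shape (len(chunk), n_train).
        d2 = ((chunk ** 2).sum(axis=1)[:, None]
              - 2.0 * chunk @ X_train.T
              + train_sq[None, :])
        # Indices of the k nearest training samples per row (unordered).
        nn_idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
        for i, labels in enumerate(y_train[nn_idx]):
            y_pred[start + i] = np.bincount(labels).argmax()
    return y_pred


# Synthetic data (hypothetical) to compare chunked and unchunked prediction.
rng = np.random.RandomState(0)
X_train = rng.randn(500, 4)
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.randn(230, 4)

pred_chunked = knn_predict_chunked(X_train, y_train, X_test, k=5, chunk_size=64)
pred_full = knn_predict_chunked(X_train, y_train, X_test, k=5,
                                chunk_size=len(X_test))
```

The same idea applies from the caller's side: slicing the array passed to an estimator's `predict` and concatenating the results bounds the peak memory of each call.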

Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

2012-07-30 Thread Gael Varoquaux

Hi Ji,
What you are trying to do is called 'online fitting'. Only a small number of models can do online fitting. This is implemented in scikit-learn with a 'partial_fit' method. As far as supervised learning goes, only SGD does online learning, I believe.
http://scikit-learn.org/stable/module
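As an illustration of the 'partial_fit' approach, here is a minimal sketch that streams a CSV in fixed-size chunks into `SGDClassifier`. It assumes the label is an integer in the last column and that the full set of classes is known in advance; the `iter_chunks` helper, the chunk size, and the in-memory synthetic CSV are hypothetical stand-ins for a real large file.

```python
import csv
import io
from itertools import islice

import numpy as np
from sklearn.linear_model import SGDClassifier


def iter_chunks(fileobj, chunk_size):
    """Yield (X, y) arrays of at most chunk_size rows; label is the last column."""
    reader = csv.reader(fileobj)
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            break
        data = np.asarray(rows, dtype=float)
        yield data[:, :-1], data[:, -1].astype(int)


# Synthetic stand-in for a large CSV file (hypothetical data).
rng = np.random.RandomState(0)
X_all = rng.randn(1000, 5)
y_all = (X_all[:, 0] + X_all[:, 1] > 0).astype(int)
buf = io.StringIO()
csv.writer(buf).writerows(np.column_stack([X_all, y_all]).tolist())
buf.seek(0)

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit needs all classes on the first call
n_seen = 0
for X_chunk, y_chunk in iter_chunks(buf, chunk_size=100):
    # Only one chunk of rows is held in memory at a time.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
    n_seen += X_chunk.shape[0]

print("rows seen:", n_seen)
```

With a real file, `buf` would be replaced by an open file handle; a single pass like this may need several epochs or feature scaling to reach good accuracy.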
