> Take the example of 1-NN: it can very well happen that, for samples
> close to the Voronoi boundary, the closest neighbor is on the other
> side of the boundary.
Indeed, I was bullshitting.
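To make the quoted objection concrete, here is a minimal sketch with made-up data (the two centroids, the two training samples and the query are hypothetical, chosen only for illustration). The query falls in one Voronoi cell while its exact nearest neighbor lies in the other, so a search restricted to the query's own cell returns a different sample:

```python
import numpy as np

# Hypothetical centroids defining two Voronoi cells (boundary at x = 5).
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
# Hypothetical training samples, one in each cell.
X_train = np.array([[1.0, 0.0], [5.5, 0.0]])
# Hypothetical query, close to the boundary but inside cell 0.
query = np.array([4.5, 0.0])


def cell_of(points):
    """Return the index of the closest centroid for each row of `points`."""
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


train_cells = cell_of(X_train)
query_cell = cell_of(query[None, :])[0]

# Exact 1-NN: search over all training samples.
exact_nn = np.linalg.norm(X_train - query, axis=1).argmin()

# 1-NN restricted to the training samples in the query's own cell.
in_cell = np.flatnonzero(train_cells == query_cell)
restricted_nn = in_cell[np.linalg.norm(X_train[in_cell] - query, axis=1).argmin()]

print("query cell:", query_cell)
print("exact nearest neighbor:", exact_nn, "in cell", train_cells[exact_nn])
print("cell-restricted nearest neighbor:", restricted_nn)
```

In this toy setup the two searches disagree, which is why the cell-restricted search is only approximate.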
Solving this issue in a generic way would be nice:
https://github.com/scikit-learn/scikit-learn/issues/325
On Mon, Jul 30, 2012 at 6:43 PM, Olivier Grisel wrote:
> Actually I think the KNeighborsClassifier implementation in
> scikit-learn has a real memory usage issue in "brute" mode
2012/7/30 Gael Varoquaux :
> On Mon, Jul 30, 2012 at 11:52:36AM +0200, Olivier Grisel wrote:
>> > In addition, a Voronoi tessellation computed with KMeans during
>> > training could be used to avoid testing all the samples in the large n
>> > situation.
>
>> Hum, that won't work for exact k-NN.
>
well intuition on that works in high dimensions, though.
- Original Message -
From: "Gael Varoquaux"
To: [email protected]
Sent: Monday, 30 July 2012 10:54:48
Subject: Re: [Scikit-learn-general] Problem in Reading Large CSV and Fitting to
ML Alg
On Mon, Jul 30, 2012 at 11:52:36AM +0200, Olivier Grisel wrote:
> > In addition, a Voronoi tessellation computed with KMeans during
> > training could be used to avoid testing all the samples in the large n
> > situation.
> Hum, that won't work for exact k-NN.
I don't understand. Yes I do belie
2012/7/30 Gael Varoquaux :
> On Mon, Jul 30, 2012 at 11:43:01AM +0200, Olivier Grisel wrote:
>> This could be worked around by chunking the data argument of the
>> predict calls instead.
>
> Indeed.
>
> In addition, a Voronoi tessellation computed with KMeans during
> training could be used to av
Ji,
could you please create a new github issue to track this bug?
https://github.com/scikit-learn/scikit-learn/issues
Please include the Python snippets from your notebook as a verbatim
Markdown block in the issue:
```python
from sklearn.neighbors import KNeighborsClassifier
...
```
On Mon, Jul 30, 2012 at 11:43:01AM +0200, Olivier Grisel wrote:
> This could be worked around by chunking the data argument of the
> predict calls instead.
Indeed.
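The chunking workaround quoted above can be sketched roughly as follows. This is only an illustration, not the scikit-learn implementation: `predict_in_chunks` is a hypothetical helper, and `brute_1nn_predict` is a toy NumPy stand-in for a fitted estimator's `predict` method, run on synthetic data. The point is that each call only builds a distance array for `chunk_size` query rows instead of for the whole test set.

```python
import numpy as np


def predict_in_chunks(predict, X, chunk_size=1000):
    """Apply `predict` to successive row slices of X and concatenate the results.

    `predict` is any callable mapping a 2D array to a 1D array of predictions
    (hypothetical helper, assumed here for illustration).
    """
    parts = [predict(X[start:start + chunk_size])
             for start in range(0, X.shape[0], chunk_size)]
    return np.concatenate(parts)


# Synthetic data standing in for the real train and test sets.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 3)
y_train = (X_train[:, 0] > 0.5).astype(int)
X_test = rng.rand(2500, 3)


def brute_1nn_predict(X_query):
    """Toy brute-force 1-NN: materializes a (len(X_query), n_train) distance array."""
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return y_train[dist.argmin(axis=1)]


# Peak memory now scales with chunk_size * n_train, not n_test * n_train.
y_chunked = predict_in_chunks(brute_1nn_predict, X_test, chunk_size=1000)
```

The same helper could be passed the bound `predict` method of a fitted classifier; the trade-off is a few more Python-level calls in exchange for a bounded intermediate array.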
In addition, a Voronoi tessellation computed with KMeans during
training could be used to avoid testing all the samples in the larg
Actually I think the KNeighborsClassifier implementation in
scikit-learn has a real memory usage issue in "brute" mode (which
is selected).
I suspect it is materializing the whole (n_samples_train,
n_samples_predict) distances array in memory before computing the
(n_samples_predict * k
Hi Ji,
What you are trying to do is called 'online fitting'. Only a small number
of models can do online fitting. This is implemented in
scikit-learn with a 'partial_fit' method. As far as supervised learning
goes, only SGD does online learning, I believe.
http://scikit-learn.org/stable/module
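As a rough sketch of the 'partial_fit' approach described above, assuming the data arrives in chunks (the synthetic array below stands in for chunks read from a large CSV file; the chunk size and the data are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for a dataset too large to load at once.
rng = np.random.RandomState(0)
X = rng.randn(3000, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)
# partial_fit needs the full set of class labels on its first call,
# because a single chunk may not contain every class.
classes = np.unique(y)

chunk_size = 500  # illustrative value
for start in range(0, X.shape[0], chunk_size):
    stop = start + chunk_size
    # Each call updates the model using only the current chunk.
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)

accuracy = clf.score(X, y)
print("training accuracy after one pass:", accuracy)
```

In a real setting each chunk would be read from disk inside the loop, so that only one chunk is held in memory at a time.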