Re: [Scikit-learn-general] Next best match

2013-05-08 Thread Lars Buitinck
2013/5/8 Ark <[email protected]>: > I am using SGDClassifier for document classification. [snip] > -is there a way to predict the next best match directly? The decision_function method returns what you want: scores for the individual classes, which can be combined with the labels using something like
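The reply above is cut off in the archive before its code, so what follows is only a rough sketch of the idea and not the poster's original snippet. It assumes a fitted classifier `clf` that exposes `decision_function` and `classes_` (as SGDClassifier does); the helper names `ranked_labels` and `next_best` are made up for illustration.

```python
import numpy as np


def ranked_labels(clf, X):
    """Return the class labels of each sample, ordered from best to worst score."""
    scores = np.asarray(clf.decision_function(X))
    if scores.ndim == 1:
        # Binary case: a single score per sample, where a positive value
        # favours classes_[1]. Expand it to one column per class.
        scores = np.column_stack([-scores, scores])
    order = np.argsort(-scores, axis=1)  # column indices, highest score first
    return np.asarray(clf.classes_)[order]


def next_best(clf, X, rejected):
    """For each sample, pick the top-ranked label that differs from the rejected one."""
    ranked = ranked_labels(clf, X)
    picks = []
    for row, bad in zip(ranked, rejected):
        # First label in ranking order that is not the already-tried category.
        picks.append(next(label for label in row if label != bad))
    return np.asarray(picks)
```

With a fitted model, `ranked_labels(clf, X)[:, 1]` would give the second-best category per document, and `next_best(clf, X, previous_predictions)` skips whatever category was tried before. Note that these scores are signed distances to the decision boundaries, not probabilities.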

Re: [Scikit-learn-general] Next best match

2013-05-08 Thread Ark
> -or is there a best way to switch to something like knn (which initially Correction: -or is the best way to switch to something like knn?

[Scikit-learn-general] Next best match

2013-05-08 Thread Ark
I am using SGDClassifier for document classification, where (n_samples, n_features) = (12000, 50). In my project, in some cases the chosen category leads to post-processing the document and predicting again, in which case it should not predict the same category, but return th

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-08 Thread Alexandre ABRAHAM
Bao, To compute the silhouette distance, scikit-learn precomputes the matrix of distances between the elements of X (samples). But it is possible to do without this matrix and compute the distance between two samples only when it's needed. This is the most naive implementation of the silhouette. Ther

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-08 Thread Bao Thien
Hi Alexandre, Thanks for your feedback. Could you please clarify what you mean by computing the distance between samples "on the fly"? In my case, running time is not a serious constraint. If you can make this clear to me, I think it would be a suitable solution for my case. Regards, T.Bao O

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-08 Thread Alexandre ABRAHAM
Hi Bao, If I am not mistaken, precomputing the pairwise distances is a way to speed up the silhouette computation and make the code simpler. It is possible to compute the silhouette by computing the distances between samples "on the fly". This will be very slow indeed, but no additional memory is required.
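To make the "on the fly" idea concrete, here is a minimal sketch that never builds the full n x n distance matrix: it computes one row of distances at a time and discards it. This is not scikit-learn's implementation; the function name `silhouette_low_memory` is hypothetical, Euclidean distance is assumed, and at least two clusters are assumed to be present.

```python
import numpy as np


def silhouette_low_memory(X, labels):
    """Mean silhouette coefficient using O(n) extra memory per sample."""
    X = np.asarray(X, dtype=float)
    clusters, inv = np.unique(np.asarray(labels), return_inverse=True)
    counts = np.bincount(inv)  # number of samples in each cluster
    n_samples = X.shape[0]
    total = 0.0
    for i in range(n_samples):
        # Distances from sample i to every sample: one row only.
        d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        # Sum of those distances, grouped by cluster.
        sums = np.bincount(inv, weights=d, minlength=len(clusters))
        own = inv[i]
        if counts[own] == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the sample's own cluster.
        a = sums[own] / (counts[own] - 1)
        # b: smallest mean distance to the members of any other cluster.
        other = np.arange(len(clusters)) != own
        b = (sums[other] / counts[other]).min()
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n_samples
```

The loop still performs on the order of n squared distance evaluations, so it is slow for large n, but the peak memory stays proportional to the size of X rather than to n squared.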