Re: [Scikit-learn-general] Circular dependencies and fit_pairwise

2012-08-26 Thread Mathieu Blondel
I'd rather avoid importing high-level modules like `neighbors` in low-level modules like `metrics`... Also, can't we just postpone the code factorization of affinity computations to later? After all, we are talking of only a few lines of code that concern only 2 modules so far. And I'm still -1

[Scikit-learn-general] Circular dependencies and fit_pairwise

2012-08-26 Thread Andreas Mueller
Hi everybody. I am working on the "pairwise" PR atm since I think it is high time that this gets merged. Without it, it's basically impossible to reasonably test any clustering algorithms and it fixes what I think is a big hole in the API. As Olivier suggested I tried to move all affinity computa

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread David Montgomery
I was going down the route of bootstrapping a lot... and distributing across multiple cores and machines is not an issue for deriving a mean and variance. What I am confused about is the effect of what is, from my understanding, the implicit 5-fold CV used to generate prob estimates since SVM inherentl

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread Mathieu Blondel
GaussianProcess.predict has an eval_MSE option to give you a 95% confidence interval, see: http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_regression.html I think it should be possible to do something similar for Ridge and LogisticRegression. Mathieu
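
For readers following along, here is a minimal sketch of how a 95% interval can be derived from a predicted value and its mean squared error. It assumes the predictive distribution is Gaussian, so the interval is the prediction plus or minus 1.96 standard deviations. The function name `confidence_interval_95` is hypothetical and is not part of scikit-learn.

```python
import math


def confidence_interval_95(y_pred, mse):
    """Return (lower, upper) bounds of a 95% interval around y_pred.

    Assumption: the prediction error is Gaussian with variance `mse`,
    so 95% of the mass lies within 1.96 standard deviations.
    """
    sigma = math.sqrt(mse)
    return y_pred - 1.96 * sigma, y_pred + 1.96 * sigma


# Hypothetical prediction and MSE values, for illustration only.
lower, upper = confidence_interval_95(2.0, 0.25)
```

The interval widens as the reported MSE grows, which is why it is usually larger far from the training points.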

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread Olivier Grisel
2012/8/26 Gael Varoquaux : > On Sun, Aug 26, 2012 at 12:08:52PM +0200, Olivier Grisel wrote: >> A sound, non parametric but computationally expensive way to get this >> kind of information (confidence intervals on the estimated parameters >> or predicted probability estimate) would be to bootstrap:

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread Gael Varoquaux
On Sun, Aug 26, 2012 at 12:08:52PM +0200, Olivier Grisel wrote: > A sound, non parametric but computationally expensive way to get this > kind of information (confidence intervals on the estimated parameters > or predicted probability estimate) would be to bootstrap: resample > n_samples out of n_s
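
The bootstrap idea quoted above can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: it uses the standard bootstrap (resampling with replacement, same size as the original sample) and a percentile interval. The helper `bootstrap_interval` and the sample values are hypothetical; in practice `statistic` would refit the model on each resample and return the parameter or predicted probability of interest.

```python
import random
import statistics


def bootstrap_interval(samples, statistic, n_rounds=1000, alpha=0.05, seed=0):
    """Estimate a statistic and a (1 - alpha) percentile interval for it."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_rounds):
        # Draw n items with replacement from the original sample.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    # Percentile bounds taken from the sorted bootstrap estimates.
    lower = estimates[int((alpha / 2) * n_rounds)]
    upper = estimates[min(n_rounds - 1, int((1 - alpha / 2) * n_rounds))]
    return statistics.mean(estimates), lower, upper


# Hypothetical values standing in for per-sample scores.
data = [0.62, 0.71, 0.55, 0.80, 0.67, 0.59, 0.74, 0.66, 0.70, 0.63]
mean_est, lower, upper = bootstrap_interval(data, statistics.mean)
```

Each round is independent of the others, so the loop is straightforward to distribute across cores or machines; the cost is one full refit per round.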

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread Olivier Grisel
2012/8/26 Andreas Mueller : > Hi David. > I don't think this is possible. > Getting the probability is already a bit of a hack. > The LibSVM uses cross-validation and Platt-scaling for that afaik. > > I am not so much into the statistics side but I don't know any classifier > that gives you confiden

Re: [Scikit-learn-general] Getting confidence of a svc predict_prob

2012-08-26 Thread Andreas Mueller
Hi David. I don't think this is possible. Getting the probability is already a bit of a hack. The LibSVM uses cross-validation and Platt-scaling for that afaik. I am not so much into the statistics side but I don't know any classifier that gives you confidences, except maybe ensembles. Cheers, Andy
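
To make the Platt-scaling step mentioned above concrete, here is a simplified sketch: it fits a sigmoid that maps raw decision scores to probabilities by plain gradient descent on the log loss. It is not LibSVM's implementation. It omits the target smoothing of Platt's original formulation and the internal cross-validation, and the function names, scores and labels are hypothetical.

```python
import math


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p(y=1 | s) = 1 / (1 + exp(a * s + b)) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # Gradient of the log loss with respect to a and b.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_prob(score, a, b):
    """Map a raw decision score to a probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical decision scores and binary labels (not linearly separable).
scores = [-2.0, -1.5, -0.5, -0.3, 0.2, 0.4, 1.0, 1.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
```

The output is a single calibrated probability per sample. It carries no confidence interval of its own, which is the gap the question in this thread is about.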

Re: [Scikit-learn-general] CV Kfold iterations look good, test sets....not so good.

2012-08-26 Thread Andreas Mueller
Hi David. I didn't look at your code in detail, but there are several tools in sklearn that could help you simplify your setup and maybe get rid of your problem. Is there any reason to use one-vs-rest instead of one-vs-one? The SVC has one-vs-one built in and you could just use that and not fid

Re: [Scikit-learn-general] Issue with StratifiedShuffleSplit

2012-08-26 Thread Olivier Grisel
Can you check whether this branch fixes the issue? https://github.com/scikit-learn/scikit-learn/pull/1060 It would be good to include your failing examples as non-regression tests for this PR. -- Olivier