Re: [scikit-learn] NearestNeighbors without replacement

2018-04-02 Thread Jacob Vanderplas
at enough controls are matched to many different cases so that each > case ends up being matched to 20 unique controls. Does this method make > sense?? > > Best, > > Randy > > On Sun, Apr 1, 2018 at 10:13 PM, Jacob Vanderplas < > jake...@cs.washington.edu> wrote: >

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-01 Thread Jacob Vanderplas
On Sun, Apr 1, 2018 at 6:36 PM, Randy Ellis wrote: > Hello to the Scikit-learn community! > > I am doing case-control matching for an electronic health records study. > My question is, is it possible to run Sklearn's NearestNeighbors function > without replacement? As
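For readers skimming this thread, here is a minimal sketch of what "nearest neighbors without replacement" could look like for case-control matching. It is not taken from the thread and does not use any scikit-learn option: the function name `match_without_replacement`, the toy coordinates, and the greedy strategy are all assumptions made for illustration. Each case takes its `k` closest controls that no earlier case has claimed.

```python
from math import dist


def match_without_replacement(cases, controls, k):
    """Greedily assign each case its k nearest controls that are still unused.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    used = set()      # indices of controls already matched to some case
    matches = {}      # case index -> list of matched control indices
    for i, case in enumerate(cases):
        # Rank the remaining controls by Euclidean distance to this case.
        ranked = sorted(
            (j for j in range(len(controls)) if j not in used),
            key=lambda j: dist(case, controls[j]),
        )
        chosen = ranked[:k]
        used.update(chosen)
        matches[i] = chosen
    return matches


# Toy data (assumed): two cases and five candidate controls on a line.
cases = [(0.0, 0.0), (1.0, 0.0)]
controls = [(0.1, 0.0), (0.9, 0.0), (0.4, 0.0), (2.0, 0.0), (-1.0, 0.0)]

matches = match_without_replacement(cases, controls, k=2)
print(matches)
```

Greedy matching depends on the order in which cases are processed and is not guaranteed to minimize the total distance, and a case may receive fewer than `k` controls once the pool runs out. For larger data, the per-case ranking could instead come from the sorted indices returned by `NearestNeighbors.kneighbors`, with the same "skip already used controls" filter applied afterwards.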

Re: [scikit-learn] CountVectorizer: Additional Feature Suggestion

2018-01-27 Thread Jacob Vanderplas
Hi Yacine, If I'm understanding you correctly, I think what you have in mind is already implemented in scikit-learn in the TF-IDF vectorizer. Best, Jake Jake VanderPlas Senior Data

Re: [scikit-learn] Support Vector Machines: Sensitive to Single Datapoints?

2017-12-19 Thread Jacob Vanderplas
Hi JohnMark, SVMs, by design, are quite sensitive to the addition of single data points – but only if those data points happen to lie near the margin. I wrote about some of those types of details here: https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html Hope
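The point about margins can be illustrated with a small sketch. The toy data set, the helper name `fit_coef`, and the choice of a linear kernel with a very large `C` (to approximate a hard margin) are assumptions made here, not details from the message. A point added far from the margin should leave the fitted weights essentially unchanged, while a point added inside the current margin should move them.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (assumed): class 0 on the left, class 1 on the right.
X = np.array([[0, 0], [0, 2], [-1, 1], [2, 0], [2, 2], [3, 1]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])


def fit_coef(X, y):
    """Fit a linear SVM with a large C and return (weights, intercept)."""
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    return clf.coef_.ravel(), clf.intercept_[0]


# Baseline fit on the original six points.
w_base, b_base = fit_coef(X, y)

# Add one class-1 point far from the margin.
w_far, b_far = fit_coef(np.vstack([X, [5.0, 1.0]]), np.append(y, 1))

# Add one class-1 point inside the baseline margin.
w_near, b_near = fit_coef(np.vstack([X, [1.2, 1.0]]), np.append(y, 1))

print("baseline weights:      ", w_base)
print("far point added:       ", w_far)
print("near-margin point added:", w_near)
```

In this setup only the near-margin point becomes a new support vector, which is why it alone changes the solution.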

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-08-01 Thread Jacob Vanderplas
or FORTRAN. The closest one was halotools which again works with > euclidean metric. For now, I will try to get my work done with 2 different > BallTrees iteratively in bins. If I find a better option will try to post > an update. > > Regards, > Rohin. > > > On T

Re: [scikit-learn] Nearest neighbor search with 2 distance measures

2017-08-01 Thread Jacob Vanderplas
>> huge computational cost. I hope I am able to frame my question properly. >> >> Thanks & Regards, >> Rohin. >> >> >> >> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >> jake...@cs.washington.edu> wrote: >> >>> O

Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

2017-02-03 Thread Jacob Vanderplas
Hi Afarin, The short answer is no, you can't really compute p-values and related statistics in Scikit-Learn. This stems from a fundamental divide in statistics/AI between machine learning on one hand, and statistical modeling on the other. A classic treatment of this divide is "Statistical