Re: [scikit-learn] NearestNeighbors without replacement

2018-04-03 Thread Randy Ellis
Hi Dr. Varoquaux, It seems like the SciPy function only assigns one row to one column. I need to assign 20 controls to each case. Does the linear_sum_assignment function, since it assigns unique pairs, depend on the order of the rows and columns? If so, perhaps I could shuffle and then combine the
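
One possible way to get several controls per case out of a one-to-one solver is to repeat each case row k times in the cost matrix, so that every copy is assigned its own distinct control. The sketch below illustrates that idea and is not something proposed in the thread; the helper name `match_k_controls`, the random cost matrix, and k=3 are made up for the example. It also checks the ordering question: the total cost of the optimal solution does not change when the columns are shuffled, although tied matches may be broken differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_k_controls(cost, k):
    """Assign k distinct controls to every case (hypothetical helper).

    cost[i, j] is the distance between case i and control j.
    Returns an (n_cases, k) array of control indices and the total cost.
    """
    cost = np.asarray(cost, dtype=float)
    n_cases, n_controls = cost.shape
    if n_cases * k > n_controls:
        raise ValueError("need at least n_cases * k controls")

    # Rows i*k .. i*k + k - 1 are identical copies of case i.
    tiled = np.repeat(cost, k, axis=0)
    rows, cols = linear_sum_assignment(tiled)

    # Every row is assigned and rows come back sorted, so cols can be
    # reshaped into one row of k controls per case.
    matches = cols.reshape(n_cases, k)
    total = tiled[rows, cols].sum()
    return matches, total


# Made-up data: 3 cases, 10 controls, 3 controls per case.
rng = np.random.default_rng(0)
cost = rng.random((3, 10))
matches, total = match_k_controls(cost, k=3)

# Shuffling the control order leaves the optimal total cost unchanged.
perm = rng.permutation(cost.shape[1])
_, total_shuffled = match_k_controls(cost[:, perm], k=3)
```

With 20 controls per case the tiled matrix has 20 times as many rows as there are cases, so memory and run time grow quickly for a large cohort.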

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-03 Thread Randy Ellis
Thanks Dr. Varoquaux, it’s awesome you’re on this list, I’m a fan of your work! Will look into this strategy. Best, Randy On Tue, Apr 3, 2018 at 8:57 AM Gael Varoquaux wrote: > Matching to minimize a cost is known as the linear assignment problem, > can be solved in n^3 cost, and is implemente

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-03 Thread Gael Varoquaux
Matching to minimize a cost is known as the linear assignment problem, can be solved in n^3 cost, and is implemented in scikit-learn in sklearn.utils.linear_assignment_.linear_assignment or in recent versions of scipy as scipy.optimize.linear_sum_assignment. Of course, this problem will require muc
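
A minimal sketch of the one-to-one matching described in this message, using scipy.optimize.linear_sum_assignment. The covariate arrays, their sizes, and the Euclidean distance are assumptions chosen for illustration; the thread does not specify a distance. As far as I know, the scikit-learn helper named above was deprecated in later releases in favor of the SciPy function, so only SciPy is used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up covariates: 5 cases and 12 candidate controls, 3 features each.
rng = np.random.default_rng(0)
cases = rng.normal(size=(5, 3))
controls = rng.normal(size=(12, 3))

# cost[i, j] is the Euclidean distance between case i and control j.
cost = np.linalg.norm(cases[:, None, :] - controls[None, :, :], axis=2)

# Each case gets exactly one control, no control is used twice,
# and the summed distance over all pairs is minimized.
case_idx, control_idx = linear_sum_assignment(cost)
total_distance = cost[case_idx, control_idx].sum()

for i, j in zip(case_idx, control_idx):
    print(f"case {i} -> control {j} (distance {cost[i, j]:.3f})")
```

The cost matrix has one entry per case-control pair, so it can become large for a big control pool.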

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-02 Thread Randy Ellis
Hi Jake, Thank you for the feedback. Yeah, working without replacement, certain cases are going to get more appropriate matches than others. I proposed the idea of using replacement and compensating for the re-use of controls with frequency weighting, but you gotta do what your PI tells you sometimes!

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-02 Thread Jacob Vanderplas
Hi Randy, I think that approach is probably a good heuristic, but it will not necessarily find the optimal result. That said, if you don't care about having guarantees that you're finding the optimal pairing, but only that you can find a reasonable set of pairs, it will probably work out fine. J

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-02 Thread Randy Ellis
Hi Jake, Thanks for the reply. Yes, trying this out resulted from looking for ways in python to implement propensity score matching. I found a package, pscore_match (http://www.kellieottoboni.com/pscore_match/), but the matching was really terrible. Specifically, I'm matching based on age, race, g

Re: [scikit-learn] NearestNeighbors without replacement

2018-04-01 Thread Jacob Vanderplas
On Sun, Apr 1, 2018 at 6:36 PM, Randy Ellis wrote: > Hello to the Scikit-learn community! > > I am doing case-control matching for an electronic health records study. > My question is, is it possible to run Sklearn's NearestNeighbors function > without replacement? As in, match the treated group
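
NearestNeighbors has no built-in "without replacement" option that I am aware of, but a greedy pass over its ranked neighbors is one way to approximate it. The sketch below is an assumption about what such a heuristic could look like, not the method discussed in the thread; the function name `greedy_match_without_replacement` and the random data are made up. Cases are processed in order and each takes its nearest control that has not been used yet, so the result depends on case order and is not guaranteed to minimize the total distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def greedy_match_without_replacement(cases, controls):
    """Greedy 1:1 matching (hypothetical helper, not optimal)."""
    # Rank every control for every case, nearest first.
    nn = NearestNeighbors(n_neighbors=len(controls)).fit(controls)
    _, ranked = nn.kneighbors(cases)

    used = set()
    matches = []
    for neighbors in ranked:
        for j in neighbors:
            j = int(j)
            if j not in used:
                used.add(j)
                matches.append(j)
                break
        else:
            matches.append(-1)  # no unused control left for this case
    return np.array(matches)


# Made-up covariates: 4 cases, 10 controls, 2 features each.
rng = np.random.default_rng(0)
cases = rng.normal(size=(4, 2))
controls = rng.normal(size=(10, 2))
matches = greedy_match_without_replacement(cases, controls)
```

Ranking all controls for every case is wasteful for large data; a smaller n_neighbors with a fallback would be a natural refinement.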