Hey guys,

I guess in the end this is a question about methodology, and I could write my own functions for sampling and evaluation, but I'm wondering if this problem has already been solved in scikit-learn. I have a dataset where I would like to group samples for cross-validation and evaluation, because each row represents a tuple from a group of samples and so shouldn't be considered in isolation.

Let me go over the set-up a little. I'm trying to use a binary classifier (maybe logistic regression) to match elements in set A with elements in set B. The cardinality of B is much larger than that of A. You can think of the elements in B as a bunch of imperfect copies of elements in A. The goal is to match each element in A with its closest imperfect copy in B.

After some preprocessing, each element a in A has a small set of candidates C_a (a subset of B), and the manually labeled data assigns a 1 to the best candidate from C_a and 0 to the rest. Note that the number of candidates varies depending on the given element of A. So each row in the data is a feature vector that comes from a tuple (a in A, c in C_a), and only one of the candidates c is labeled as the winner (1), e.g.

    (a0, c0_0) : 0
    (a0, c0_1) : 1
    (a0, c0_2) : 0
    (a1, c1_0) : 1
    (a1, c1_1) : 0
    ...
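To make the row layout and the kind of evaluation it implies concrete, here is a minimal pure-Python sketch. The scores are made up, feature vectors are left out, and `top1_accuracy` is a hypothetical helper name, not a scikit-learn function. It counts a group as correct when its highest-scoring candidate is the labeled winner.

```python
from collections import defaultdict

# Rows mirror the example above: (element of A, candidate, label).
rows = [
    ("a0", "c0_0", 0),
    ("a0", "c0_1", 1),
    ("a0", "c0_2", 0),
    ("a1", "c1_0", 1),
    ("a1", "c1_1", 0),
]


def top1_accuracy(rows, scores):
    """Fraction of groups whose highest-scoring candidate has label 1."""
    by_group = defaultdict(list)
    for (group, _candidate, label), score in zip(rows, scores):
        by_group[group].append((score, label))
    # Pick the best-scoring candidate per group and add up its label (0 or 1).
    hits = sum(max(members, key=lambda m: m[0])[1] for members in by_group.values())
    return hits / len(by_group)


# Made-up classifier scores, one per row, purely for illustration.
example_scores = [0.2, 0.7, 0.1, 0.4, 0.6]
example_accuracy = top1_accuracy(rows, example_scores)
```

With these scores the winner for a0 is ranked first and the winner for a1 is not, so the metric is computed per element of A rather than per row.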
Is there a way to use the scikit-learn functionality for cross-validation and evaluation given this set-up?

Thanks!

--
Hector

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
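One possible direction, offered as a sketch and not as a confirmed answer: recent scikit-learn releases ship `GroupKFold` in `sklearn.model_selection`, which keeps all rows sharing a group id on the same side of each split. The example below assumes such a release is installed, uses synthetic data in place of the real features, and defines its own hypothetical `top1_accuracy` helper for the per-group evaluation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data: 20 elements of A, each with 2 to 5 candidates,
# exactly one of which is labeled as the winner.
groups, y = [], []
for a in range(20):
    n_cand = rng.integers(2, 6)
    winner = rng.integers(n_cand)
    for c in range(n_cand):
        groups.append(a)
        y.append(int(c == winner))
groups = np.array(groups)
y = np.array(y)
# Made-up features: winners are shifted so there is some signal to learn.
X = rng.normal(size=(len(y), 3)) + y[:, None]


def top1_accuracy(scores, y_true, group_ids):
    """Fraction of groups whose highest-scoring row is the labeled winner."""
    unique = np.unique(group_ids)
    hits = 0
    for g in unique:
        mask = group_ids == g
        hits += y_true[mask][np.argmax(scores[mask])]
    return hits / len(unique)


fold_scores = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # All candidates of a given element of A stay on one side of the split.
    assert not set(groups[train_idx]) & set(groups[test_idx])
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]  # probability of label 1
    fold_scores.append(top1_accuracy(proba, y[test_idx], groups[test_idx]))
```

Scoring by the top-ranked candidate per group, instead of per-row accuracy, matches the goal of picking one winner for each element of A; per-row accuracy would be dominated by the many 0 labels.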
