Dear Hector,

     You probably want to introduce a loss function that quantifies the 
accuracy of the solution, and then try to minimize it.
     You will then have to work out how to generalize to new samples. Are 
you learning a rule that associates a with C_a? If not, I don't think 
that scikit-learn's evaluation mechanisms will be useful to you.
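     As a minimal sketch of one such loss, assuming the classifier 
produces one score per (a, c) row (for instance a predicted probability 
of the positive class), you could count how often the top-scoring 
candidate within each group is the labeled winner. The function name 
group_top1_accuracy and its arguments are hypothetical, not part of 
scikit-learn, and the scores below are made up for illustration.

```python
from collections import defaultdict


def group_top1_accuracy(groups, scores, labels):
    """Fraction of groups whose highest-scoring candidate is labeled 1.

    groups: group id per row (here, the element a of A)
    scores: classifier score per row, higher means more likely the winner
    labels: 0/1 label per row, exactly one 1 expected per group
    """
    by_group = defaultdict(list)
    for group, score, label in zip(groups, scores, labels):
        by_group[group].append((score, label))
    if not by_group:
        raise ValueError("no rows given")

    hits = 0
    for candidates in by_group.values():
        # max() keeps the first candidate in case of tied scores.
        _, best_label = max(candidates, key=lambda pair: pair[0])
        hits += int(best_label == 1)
    return hits / len(by_group)


# Toy rows shaped like the example in the quoted message.
groups = ["a0", "a0", "a0", "a1", "a1"]
labels = [0, 1, 0, 1, 0]
scores = [0.2, 0.9, 0.1, 0.3, 0.6]  # illustrative scores, not real output
accuracy = group_top1_accuracy(groups, scores, labels)
```

     Scoring per group rather than per row matches the fact that only 
one candidate per element of A can win, so the rows are not independent.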
     Best,

Bertrand



On 31/08/2013 21:46, Hector wrote:
> Hey guys,
> I guess in the end this is a question about methodology and I could
> write my own functions for sampling and evaluation, but I'm wondering
> if this problem has already been solved in scikit-learn.
> I have a dataset where I would like to group samples for
> cross-validation and evaluation because each row represents a tuple
> from a group of samples so it shouldn't be considered in isolation.
> Let me go a little over the set-up.
> I'm trying to use a binary classifier (maybe logistic regression) to
> match elements in set A with elements in set B. The cardinality of B
> is much larger than that of A. You can think of the elements in B as a
> bunch of imperfect copies of elements in A. The goal is to match each
> element in A with its closest imperfect copy in B.
> After some preprocessing, each element in A has a small set of
> candidates C (a subset of B) and the manually labeled data assigned a
> 1 to the best candidate from C and 0 to the rest. Note that the number
> of candidates varies depending on the given element of A.
> So each row in the data is a feature vector that comes from a tuple
> (a in A, c in C_a)
> and only one of the candidates c is labeled as the winner (1), e.g.
> (a0, c0_0) : 0
> (a0, c0_1) : 1
> (a0, c0_2) : 0
> (a1, c1_0) : 1
> (a1, c1_1) : 0
> ...
>
> Is there a way to use the scikit-learn functionality for
> cross-validation and evaluation given this set-up?
> Thanks!
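
For the cross-validation part of the question, here is a rough sketch 
of a splitter that keeps all rows of a given element a in the same fold, 
so that a candidate set is never divided between training and test data. 
The function group_k_fold is a hypothetical helper written in plain 
Python, not a scikit-learn function; more recent scikit-learn releases 
provide GroupKFold in sklearn.model_selection for the same purpose.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs with whole groups kept together.

    groups: group id per row (here, the element a of A)
    n_splits: number of folds, between 2 and the number of distinct groups
    """
    rows_by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        rows_by_group[group].append(idx)
    if n_splits < 2 or n_splits > len(rows_by_group):
        raise ValueError("n_splits must be in [2, number of groups]")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    by_size = sorted(rows_by_group, key=lambda g: -len(rows_by_group[g]))
    for group in by_size:
        min(folds, key=len).extend(rows_by_group[group])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx
```

Each test fold can then be scored per group (for example, whether the 
top-ranked candidate is the labeled winner) rather than per row.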


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
