If I understand you correctly, you might be looking for SequenceKFold in
Lars Buitinck's seqlearn repository:
https://github.com/larsmans/seqlearn/blob/master/seqlearn/evaluation.py
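
In case it is useful, here is a minimal sketch of the general idea in plain Python: keep all rows that belong to the same element of A in the same fold, and score per group rather than per row. This is not seqlearn's implementation. The helper names group_kfold and top1_accuracy are hypothetical, and the sketch assumes each row carries a group id (for instance "a0", "a1") alongside its label.

```python
import random


def group_kfold(groups, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs; rows sharing a group id never get split."""
    unique = sorted(set(groups))
    if n_folds > len(unique):
        raise ValueError("n_folds cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(unique)
    # Deal the shuffled groups round-robin into folds.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for k in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, test


def top1_accuracy(groups, y_true, scores):
    """Fraction of groups whose highest-scoring candidate is the labeled winner."""
    best = {}  # group id -> (score, label) of the best candidate seen so far
    for g, y, s in zip(groups, y_true, scores):
        if g not in best or s > best[g][0]:
            best[g] = (s, y)
    return sum(y for _, y in best.values()) / len(best)


# Rows laid out as in the example from the original message.
groups = ["a0", "a0", "a0", "a1", "a1"]
labels = [0, 1, 0, 1, 0]
for train_idx, test_idx in group_kfold(groups, n_folds=2):
    # Fit a classifier on the train_idx rows, score the test_idx rows,
    # then pass those scores to top1_accuracy for the held-out groups.
    pass
```

The scores would come from whatever classifier you fit (for example the positive-class probability of a logistic regression); the per-group argmax then picks one candidate for each element of A.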


On Sun, Sep 1, 2013 at 5:46 AM, Hector <[email protected]> wrote:

> Hey guys,
> I guess in the end this is a question about methodology and I could
> write my own functions for sampling and evaluation, but I'm wondering
> if this problem has already been solved in scikit-learn.
> I have a dataset where I would like to group samples for
> cross-validation and evaluation because each row represents a tuple
> from a group of samples so it shouldn't be considered in isolation.
> Let me go a little over the set-up.
> I'm trying to use a binary classifier (maybe logistic regression) to
> match elements in set A with elements in set B. The cardinality of B
> is much larger than that of A. You can think of the elements in B as a
> bunch of imperfect copies of elements in A. The goal is to match each
> element in A with its closest imperfect copy in B.
> After some preprocessing, each element a in A has a small set of
> candidates C_a (a subset of B), and the manually labeled data assigns
> a 1 to the best candidate in C_a and a 0 to the rest. Note that the
> number of candidates varies depending on the given element of A.
> So each row in the data is a feature vector that comes from a tuple
> (a in A, c in C_a)
> and only one of the candidates c is labeled as the winner (1), e.g.
> (a0, c0_0) : 0
> (a0, c0_1) : 1
> (a0, c0_2) : 0
> (a1, c1_0) : 1
> (a1, c1_1) : 0
> ...
>
> Is there a way to use the scikit-learn functionality for
> cross-validation and evaluation given this set-up?
> Thanks!
> --
>  Hector
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>