Re: [Scikit-learn-general] Active learning strategies with scikit-learn

Josh Wasserstein Fri, 04 Oct 2013 11:38:54 -0700

Thank you all for the pointers. This gives me a great place to start. I
will let you know if I find anything worth sharing with the list.


Josh


On Fri, Oct 4, 2013 at 9:42 AM, Emanuele Olivetti
<[email protected]> wrote:

>  Hi Josh,
>
> Some years ago I worked on a similar problem: deciding which attributes
> of which instances should be measured in order to reach a given goal (in
> our case, learning which features were important and which were not with
> respect to the class labels). Note that this formulation includes the
> possibility that you have already collected some attributes (or labels)
> for some of your instances; the proposed solution used this information
> to estimate the gain/benefit of each possible sampling action you could
> perform.
>
> Even though our application (feature relevance estimation) was different
> from yours, I suspect that the general approach, i.e. the maximum average
> change (MAC) sampling algorithm, could be applied in your case.
>
> Here are two references:
>
> - Active Learning of Feature Relevance
> Emanuele Olivetti, Sriharsha Veeramachaneni, Paolo Avesani
> In Computational Methods of Feature Selection (Huan Liu, Hiroshi Motoda,
> eds.), Chapman and Hall/CRC Press, 2007.
>
> http://books.google.it/books?id=N1ViHNWZeQ0C&lpg=PA91&ots=pH_7AzrbvM&dq=%22Active%20Learning%20of%20Feature%20Relevance%22&hl=it&pg=PA89#v=onepage&q=%22Active%20Learning%20of%20Feature%20Relevance%22&f=false
>
> - Active sampling for detecting irrelevant features.
> Sriharsha Veeramachaneni, Emanuele Olivetti, Paolo Avesani
> ICML 2006: 961-968
> http://dl.acm.org/citation.cfm?id=1143965
>
> As far as I know this is *not* a popular problem :). You should ask on the
> [active-learning-ml] mailing list for more help, as Byron suggested.
>
> Best,
>
> Emanuele
>
>
>
> On 10/03/2013 04:01 PM, Josh Wasserstein wrote:
>
>  Hello,
>
>  I work on a classification problem where each instance has several
> attributes (e.g. the age of an individual). However, collecting instances
> (either labeled or unlabeled) is very expensive, since it requires asking
> domain experts to spend a significant amount of time simply to collect the
> instance (labeling the instance once it has been collected is actually
> relatively fast).
>
>  Given this, I want to explore an active learning strategy where, rather
> than starting with a set of labeled and unlabeled instances, I only have
> labeled instances, *but* I can ask for additional labeled instances by
> specifying:
>
>    - Attributes, or statistics of the attributes, of the additional
>    instances (e.g. give me an instance with an age in the range [a,b])
>    - The desired label of the additional instances (e.g. give me a new
>    instance with label x), or alternatively the *label* sampling
>    distribution that the experts should use to get new instances.
>
>  With this, my questions are:
>
>    - Does this problem have a name? It looks like a specific case of
>    Active Learning, but I am not sure, since in Active Learning one starts
>    with a set of unlabeled instances, which is not my case.
>
>    - What types of approaches (from the most rudimentary to the most
>    sophisticated) can I employ to identify the most informative sampling
>    distribution over instance attributes or instance labels?
>
>    - Does *scikit-learn* provide any functionality geared towards the
>    specific challenges of this problem?
>
> Thanks a lot,
>
>  Josh
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
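
As a rough illustration of the "most rudimentary" end of the approaches asked
about in this thread, one could fit any probabilistic classifier on the
labeled instances and ask the experts for the attribute range where its
predictions are least certain. This is plain uncertainty sampling over
attribute ranges, not the MAC algorithm from the references above, and it is
not a built-in scikit-learn feature. The toy data, the choice of column 0 as
"age", the bin edges, and the helper names `mean_entropy` and
`rank_attribute_ranges` are all assumptions made up for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mean_entropy(proba):
    """Mean Shannon entropy (nats) over rows of predicted class probabilities."""
    p = np.clip(proba, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))


def rank_attribute_ranges(model, X_labeled, feature, edges, n_probe=200, seed=0):
    """Score each range [edges[i], edges[i+1]) of one attribute by uncertainty.

    Hypothetical helper: probe points are labeled rows resampled with
    replacement, with the chosen attribute overwritten by a uniform draw
    from the range. Higher score = model is less certain in that range.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        rows = rng.integers(0, len(X_labeled), size=n_probe)
        probes = X_labeled[rows].copy()
        probes[:, feature] = rng.uniform(lo, hi, size=n_probe)
        scores.append(mean_entropy(model.predict_proba(probes)))
    return np.array(scores)


# Toy data (assumption): column 0 stands in for "age", column 1 is unrelated.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=300)
other = rng.normal(0, 1, size=300)
X = np.column_stack([age, other])
# Noisy label whose decision boundary sits around age 50.
y = (age + rng.normal(0, 5, size=300) > 50).astype(int)

# Any estimator exposing predict_proba would do; this one is just convenient.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

edges = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
scores = rank_attribute_ranges(model, X, feature=0, edges=edges)
best = int(np.argmax(scores))
print(f"Ask the experts for ages in [{edges[best]}, {edges[best + 1]})")
```

In practice one would presumably loop: request a few instances from the
top-scoring range, refit, and rescore. Resampling the other attributes from
the labeled set is a simplification; it ignores how they depend on the
attribute being varied.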

