Hi Josh,

Some years ago I worked on a similar problem: deciding which attributes of
which instances should be measured in order to reach a given goal (in our
case, learning which features were important and which were not with respect
to the class labels). Note that this formulation includes the possibility that
you have already collected some attributes (or labels) for some of your
instances; the proposed solution used this information to estimate the
gain/benefit of each possible sampling action you could perform.

Even though our application (feature relevance estimation) was different from
yours, I suspect that the general approach, i.e. the maximum average change
(MAC) sampling algorithm, could be applied in your case.
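
To make the idea concrete for the "request an instance with a given label"
setting, here is a minimal sketch in Python. It is not the algorithm from the
papers below, only an illustration of the general principle: for each possible
sampling action, average the change it would cause in the current estimate
over the outcomes you expect, then pick the action with the largest average
change. Everything in it is an assumption made for illustration: a single
discrete attribute, smoothed class-conditional frequencies as the quantity
being estimated, the L1 norm as the measure of change, and the hypothetical
function names estimate, average_change and choose_label_to_request.

```python
import numpy as np


def estimate(counts, alpha=1.0):
    """Smoothed class-conditional distributions P(x | y), one row per class."""
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def average_change(counts, y, alpha=1.0):
    """Expected L1 change in the estimate if one more instance of class y is collected."""
    current = estimate(counts, alpha)
    expected = 0.0
    for x in range(counts.shape[1]):
        # Pretend the new instance of class y has attribute value x ...
        hypothetical = counts.copy()
        hypothetical[y, x] += 1
        change = np.abs(estimate(hypothetical, alpha) - current).sum()
        # ... and weight that change by how likely value x is under the current estimate.
        expected += current[y, x] * change
    return expected


def choose_label_to_request(counts, alpha=1.0):
    """Return the label whose next instance is expected to change the estimate most."""
    scores = [average_change(counts, y, alpha) for y in range(counts.shape[0])]
    return int(np.argmax(scores)), scores


# Toy data (made up): counts[y, x] = number of collected instances with label y
# and attribute value x. Class 0 is well covered, class 1 has only two instances.
counts = np.array([[50, 50], [1, 1]], dtype=float)
best_label, scores = choose_label_to_request(counts)
print("request an instance with label", best_label)
```

With these toy counts the sketch asks for another instance of the poorly
covered class. In a real application you would replace estimate() with
whatever quantity you actually care about (e.g. a classifier's parameters or
predictions).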

Here are two references:

- Active Learning of Feature Relevance
Emanuele Olivetti, Sriharsha Veeramachaneni, Paolo Avesani
In Computational Methods of Feature Selection (Huan Liu, Hiroshi Motoda, eds.),
Chapman and Hall/CRC Press, 2007.
http://books.google.it/books?id=N1ViHNWZeQ0C&lpg=PA91&ots=pH_7AzrbvM&dq=%22Active%20Learning%20of%20Feature%20Relevance%22&hl=it&pg=PA89#v=onepage&q=%22Active%20Learning%20of%20Feature%20Relevance%22&f=false

- Active sampling for detecting irrelevant features.
Sriharsha Veeramachaneni, Emanuele Olivetti, Paolo Avesani
ICML 2006: 961-968
http://dl.acm.org/citation.cfm?id=1143965

As far as I know this is *not* a popular problem :). You should ask the
[active-learning-ml] mailing list for more help, as Byron suggested.

Best,

Emanuele


On 10/03/2013 04:01 PM, Josh Wasserstein wrote:
Hello,

I work on a classification problem where each instance has several attributes
(e.g. the age of an individual). However, collecting instances (either labeled
or unlabeled) is very expensive, since it requires domain experts to spend a
significant amount of time simply collecting the instance (labeling the
instance once it has been collected is actually relatively fast).

Given this, I want to explore an active learning strategy where, rather than
starting with a set of labeled and unlabeled instances, I only have labeled
instances, *but* I can ask for additional labeled instances by specifying:

  * Attributes, or statistics of the attributes, of the additional instances
    (e.g. give me an instance with an age in the range [a,b])
  * The desired label of the additional instances (e.g. give me a new instance
    with label x), or alternatively the /label/ sampling distribution that
    the experts should use to get new instances.

With this, my questions are:

  * Does this problem have a name? It looks like a specific case of Active
    Learning, but I am not sure, since in Active Learning one starts with a
    set of unlabeled instances, which is not my case.

  * What types of approaches (from the most rudimentary to the more
    sophisticated) can I employ to identify the most informative sampling
    distribution from instance attributes or instance labels?

  * Does *scikit-learn* provide any functionality geared towards the specific
    challenges of this problem?

Thanks a lot,

Josh


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
