Re: [Scikit-learn-general] Tackling Dataset bias

Yogesh Karpate Thu, 15 Aug 2013 05:01:20 -0700

Thanks a lot, Olivier, for pointing me to Alex's blog.
My apologies for the confusion! Let me rephrase my problem.
I have two datasets of brain MR images; let's call them A and B. A was
acquired in one country and B in another. Dataset A contains both patients
with pathology and healthy volunteers, whereas dataset B contains only
patients with pathology.
Aim: To detect the patches with pathology in a given MR image. (Patches with
pathology are few compared with patches without pathology in the whole
image.) This resembles the object detection/localization techniques from
mainstream computer vision.
Description: The labels are MR image patch with pathology (+) and image
patch without pathology (-). There are 60 subjects in dataset A, of whom 30
have pathology and the remaining 30 are healthy. Dataset B contains 35
patients and no healthy subjects.
Methods: Dataset A is split into training and test sets. The image of each
subject is scanned at all possible positions and scales, bag-of-words style
features such as 3D SIFT and 3D HOG are computed, and a typical pipeline
consisting of feature preprocessing, classification, post-processing, and
cross-validation is built. Note that patches with pathology are taken from
the labeled patches of the patients in the training set, and patches without
pathology from healthy subjects. Finally, this framework is evaluated on the
test set. Ground truth is available for both the training and test sets.
==== This part is completed.
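
For concreteness, a minimal scikit-learn sketch of this kind of patch
pipeline follows. It is not the actual implementation: the random features
stand in for the 3D SIFT / 3D HOG bag-of-words histograms, the labels are
drawn per patch for simplicity, and the subject-wise GroupKFold split,
LogisticRegression, and class_weight="balanced" are illustrative
assumptions (the last one to cope with the small share of pathology
patches).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Hypothetical stand-in for patch descriptors: 12 subjects, 50 patches each.
n_subjects, patches_per_subject, n_features = 12, 50, 20
n_patches = n_subjects * patches_per_subject
subject_ids = np.repeat(np.arange(n_subjects), patches_per_subject)

# About 10% of patches are positive (pathology); positives are shifted so
# that the toy problem is learnable.
y = (rng.rand(n_patches) < 0.1).astype(int)
X = rng.randn(n_patches, n_features) + 1.5 * y[:, None]

# Preprocessing + classification in one estimator, so the scaler is fitted
# on the training folds only.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Keep all patches of one subject in the same fold to avoid leakage.
cv = GroupKFold(n_splits=4)
scores = cross_val_score(
    clf, X, y, groups=subject_ids, cv=cv, scoring="average_precision"
)
print("average precision per fold:", np.round(scores, 3))
```

Splitting by subject rather than by patch matters here because neighbouring
patches of one image are strongly correlated.
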
Dataset B contains only patients with the same pathology and no healthy
subjects. There are 20 subjects for training and 15 for testing. The idea is
to apply the same framework that was applied to dataset A.
Problem: Since there are no healthy subjects in dataset B, and it is very
difficult to harvest patches without pathology from the patients (not
advisable in our context), for training I need to use the patches without
pathology from the healthy subjects of dataset A. The patches with pathology
are labelled in the training data. Domain knowledge tells us that there is
a dataset bias, for a variety of reasons.
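
One quick way to gauge such a bias is to train a classifier to predict
which dataset a patch comes from: accuracy near 0.5 suggests little
detectable shift, while high accuracy means a pathology classifier could
simply learn the dataset of origin. The sketch below is only an
illustration on synthetic features; the shift applied to X_b is a made-up
stand-in for scanner or protocol differences, and in practice the two sets
should hold comparable kinds of patches.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Hypothetical patch descriptors from the two datasets.
X_a = rng.randn(300, 10)
X_b = rng.randn(300, 10) * 1.2 + 0.5  # assumed acquisition shift

X = np.vstack([X_a, X_b])
origin = np.concatenate([
    np.zeros(len(X_a), dtype=int),  # 0 = dataset A
    np.ones(len(X_b), dtype=int),   # 1 = dataset B
])

# Cross-validated accuracy of "name that dataset".
acc = cross_val_score(
    LogisticRegression(max_iter=1000), X, origin, cv=5, scoring="accuracy"
).mean()
print("dataset-of-origin accuracy: %.2f" % acc)
```
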
Possible ways: Covariate shift adaptation, as in the work of Arthur Gretton,
Masashi Sugiyama, and Yamada. For example: "No Bias Left Behind: Co-variate
Shift Adaptation", ECCV 2012, and many NIPS papers. But I am not sure about
their scalability.
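
A rough sketch of the simple, scalable variant of covariate shift
correction (in the spirit of the blog post quoted further down): fit a
logistic regression that separates source patches from target patches, and
use its odds as importance weights for the source samples. The function
name importance_weights, the synthetic data, and the choice of
LogisticRegression are assumptions for illustration, not a tested recipe
for MR patches.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def importance_weights(X_source, X_target, C=1.0):
    """Estimate p_target(x) / p_source(x) for every source sample."""
    X = np.vstack([X_source, X_target])
    origin = np.concatenate([
        np.zeros(len(X_source), dtype=int),  # 0 = source domain
        np.ones(len(X_target), dtype=int),   # 1 = target domain
    ])
    domain_clf = LogisticRegression(C=C, max_iter=1000).fit(X, origin)
    # Column 1 of predict_proba is P(target | x) since classes_ == [0, 1].
    p_target = np.clip(domain_clf.predict_proba(X_source)[:, 1],
                       1e-6, 1 - 1e-6)
    # Odds, corrected for unequal domain sizes, then normalised to mean 1.
    w = p_target / (1.0 - p_target) * (len(X_source) / len(X_target))
    return w / w.mean()


rng = np.random.RandomState(0)
X_A = rng.randn(500, 5)                      # labeled patches (source)
y_A = (X_A[:, 0] + 0.3 * rng.randn(500) > 0).astype(int)
X_B = rng.randn(500, 5) + 0.5                # unlabeled patches (target)

w = importance_weights(X_A, X_B)

# Source samples that look like target samples count more during training.
clf = LogisticRegression(max_iter=1000).fit(X_A, y_A, sample_weight=w)
```

The cost is one extra linear classifier fit over the pooled patches, so it
should scale about as well as the main classifier does.
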


So my question: Is there any other way to tackle this problem, such as
"transfer learning" or "zero-shot learning"? Does anyone have experience
with such a task?


On Thu, Aug 15, 2013 at 11:50 AM, Olivier Grisel
<olivier.gri...@ensta.org> wrote:

> I don't really understand what are the samples, the labels and the
> features in your case and how much unlabeled data do you have and what
> do you mean by "I have completed the classification task on 1st
> database.": if you have labeled datasets what does "completion of the
> classification task" mean?.
>
> As for scalable co-variate shift you might be interested in this blog post:
>
>
> http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction
>
> --
> Olivier
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
    Warm Regards
    Yogesh Karpate

