Re: [Scikit-learn-general] Comparing classifier confusion matrices

Emanuele Olivetti Tue, 17 Apr 2012 01:27:11 -0700

On 04/16/2012 08:35 PM, Michael Waskom wrote:
> Hi all,
>
> I asked a question on metaoptimize about quantitative comparisons between
> classifier confusion matrices. If anyone has a good idea and would like to
> chime in, it would be much appreciated.
>
> http://metaoptimize.com/qa/questions/9936/good-methods-to-compare-classifier-confusion-matrices
>
>


Hi Michael,

I am working on hypothesis testing applied to classifiers and your question
on metaoptimize seems to fall within this topic.

My impression is that your question is not fully specified, but it may be
that I did not understand it properly :-)

Just to reformulate:
- You have "two kinds" of datasets, e.g. pictures of real {apple,pear,banana}
and drawings of {apple,pear,banana}.
- You train a classifier on half of the first dataset (pictures) and compute
the confusion matrix over the second half.
- You do the same on the second dataset (drawings) and get the corresponding
confusion matrix.
- Your question is: "are the two classifiers predicting in the same way or are
they (significantly) different?"
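
As a rough illustration of the setup reformulated above: the confusion matrix
on the held-out half is just a count of (true, predicted) label pairs. The
sketch below uses made-up labels and a hypothetical helper,
confusion_matrix_counts; neither comes from the original question. In
practice sklearn.metrics.confusion_matrix does the same job.

```python
from collections import Counter

CLASSES = ["apple", "pear", "banana"]


def confusion_matrix_counts(y_true, y_pred, classes=CLASSES):
    """Hypothetical helper: rows are true classes, columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


# Made-up held-out labels and predictions for the "pictures" dataset.
y_true_pictures = ["apple", "apple", "pear", "pear", "banana", "banana"]
y_pred_pictures = ["apple", "pear", "pear", "pear", "banana", "apple"]

cm_pictures = confusion_matrix_counts(y_true_pictures, y_pred_pictures)
for cls, row in zip(CLASSES, cm_pictures):
    print(cls, row)
```

The same call on the held-out half of the drawings would give the second
matrix to compare against.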

If that is the correct interpretation, I think that some necessary details
are missing in order to state a proper problem.

The first one that comes to my mind is: is the set of instances (fruits)
exactly the same for the two datasets, so that only their representation
changes (pictures in the first case, drawings in the second), or are they
just two different datasets?

More precisely, I see three options:
a) Each set of instances is drawn from its own (different) distribution.
b) The two sets of instances are two distinct draws from the same
distribution. They are then represented in different feature spaces.
c) There is just one common set of instances. The two datasets are just two
representations of it in two different feature spaces (pictures and drawings,
respectively).

My opinion is that option 'a' leads to an ill-posed problem and option 'b'
leads to a difficult problem. Option 'c' is not that easy to address either,
but I have a ready solution for it :-).
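
To make the paired setting of option 'c' concrete, here is a minimal sketch
of one standard paired test, the exact McNemar test. It is NOT the solution
mentioned above, which this message does not describe; it is only an
illustration with made-up data, and the name mcnemar_exact is hypothetical.
It looks only at which instances each classifier gets right, so it is
coarser than comparing full confusion matrices.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correctness indicators.

    Returns (b, c, p_value), where b counts instances only classifier A
    gets right and c counts instances only classifier B gets right.
    """
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair falls on either side with probability 1/2.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up data: the same 10 fruits, classified from pictures and from drawings.
correct_pictures = [True, True, True, False, True, True, False, True, True, True]
correct_drawings = [True, False, True, False, False, True, True, False, True, False]

b, c, p_value = mcnemar_exact(correct_pictures, correct_drawings)
print(b, c, p_value)
```

This pairing is only meaningful when both classifiers are tested on the very
same instances, which is exactly what distinguishes option 'c' from 'a' and
'b'.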

Could you please add the missing details or tell me what I did not
get properly?

Best,

Emanuele


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general