So I came up with this small test script to show the difference between
Matlab's classify function and Matlab's ClassificationDiscriminant class.
It appears that for balanced classes they perform identically, but when you
unbalance them you get radically different answers. The discrepancy becomes
even worse when the two classes are drawn from the same distribution.
Please let me know if there is a better way to include this script than
just copying/pasting.

I also understand this may be becoming more of a Matlab topic, but I was
hoping someone might have an idea as to what is happening.

N1 = 3500;
N2 = 1500;

%% Easily Separable (Balanced Classes)
% Class 0 ~ N(0,1), class 1 ~ N(1,1); 2500 samples of each.
A = randn(10000, 1);
B = 1 + randn(10000, 1);

a = randsample(A, 2500);
b = randsample(B, 2500);

data = [a; b];
labels = [zeros(2500, 1); ones(2500, 1)];

% Train and test on the same data with both APIs.
y_pred = classify(data, data, labels);
TP1 = sum(y_pred == 1 & labels == 1);

temp = ClassificationDiscriminant.fit(data, labels);
y_pred2 = temp.predict(data);
TP2 = sum(y_pred2 == 1 & labels == 1);

fprintf('Data Easily Separable (Balanced Classes): classify = %i, ClassificationDiscriminant = %i\n', TP1, TP2)

%% Easily Separable (Unbalanced Classes)
% Same two distributions, but now 3500 class-0 samples vs. 1500 class-1 samples.
A = randn(10000, 1);
B = 1 + randn(10000, 1);

a = randsample(A, N1);
b = randsample(B, N2);

data = [a; b];
labels = [zeros(N1, 1); ones(N2, 1)];

y_pred = classify(data, data, labels);
TP1 = sum(y_pred == 1 & labels == 1);

temp = ClassificationDiscriminant.fit(data, labels);
y_pred2 = temp.predict(data);
TP2 = sum(y_pred2 == 1 & labels == 1);

fprintf('Data Easily Separable (Unbalanced Classes): classify = %i, ClassificationDiscriminant = %i\n', TP1, TP2)

%% Same Distribution (Unbalanced Classes)
% Both classes drawn from N(0,1), so there is no real signal separating them.
C = randn(10000, 1);
D = randn(10000, 1);

c = randsample(C, N1);
d = randsample(D, N2);

data = [c; d];
labels = [zeros(N1, 1); ones(N2, 1)];

y_pred = classify(data, data, labels);
TP1 = sum(y_pred == 1 & labels == 1);

temp = ClassificationDiscriminant.fit(data, labels);
y_pred2 = temp.predict(data);
TP2 = sum(y_pred2 == 1 & labels == 1);

fprintf('Data drawn from same distribution (Unbalanced Classes): classify = %i, ClassificationDiscriminant = %i\n', TP1, TP2)
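
For reference, since the question is ultimately how these compare to
scikit-learn, here is a minimal Python sketch (not part of the Matlab script
above) of the same unbalanced, easily separable check on the sklearn side. It
assumes a scikit-learn version where the estimator is
sklearn.discriminant_analysis.LinearDiscriminantAnalysis; older releases
exposed the same estimator as sklearn.lda.LDA.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

N1, N2 = 3500, 1500

# Class 0 ~ N(0, 1), class 1 ~ N(1, 1), unbalanced as in the Matlab script.
rng = np.random.RandomState(0)
data = np.concatenate([rng.randn(N1), 1 + rng.randn(N2)]).reshape(-1, 1)
labels = np.concatenate([np.zeros(N1), np.ones(N2)])

# LDA with class priors estimated from the training frequencies (the default).
lda = LinearDiscriminantAnalysis()
y_pred = lda.fit(data, labels).predict(data)  # train and test on the same data

# True positives for class 1, to match TP1/TP2 above.
tp = np.sum((y_pred == 1) & (labels == 1))
print('Data Easily Separable (Unbalanced Classes): sklearn LDA = %d' % tp)

In the earlier messages ClassificationDiscriminant was the one reported to
match Python and R, so the count printed here is the side of the comparison
that the TP2 values above should line up with.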


On Fri, Feb 15, 2013 at 6:44 AM, David Reed <[email protected]> wrote:

> Yes, that is the method I was using, and it's not giving the same results.
>  I'm going to keep working on getting some sim data.
>
>
> On Fri, Feb 15, 2013 at 5:01 AM, Andreas Mueller <[email protected]> wrote:
>
>>  On 02/15/2013 02:10 AM, David Reed wrote:
>>
>> Could you link that?
>>
>>
>> http://www.mathworks.de/products/statistics/examples.html?file=/products/demos/shipping/stats/classdemo.html#3
>>
>> "The classify function can perform classification using different types
>> of discriminant analysis. First classify the data using the default linear
>> discriminant analysis (LDA)."
>>
>> Not sure if that was the same method you are using.
>>
>>
>>  I found a function in Matlab, ClassificationDiscriminant, that performs
>> exactly the same as Python and R. So what is the real difference between
>> calling the classify algorithm in Matlab and this ClassificationDiscriminant
>> function?
>>
>>
>> On Thu, Feb 14, 2013 at 4:14 PM, <[email protected]> wrote:
>>
>>>  matlab doc online says linear classifier is lda by default.
>>>
>>>
>>>
>>>> Andrew Winterman <[email protected]> wrote:
>>>
>>>> Logistic regression can be used as a linear classifier. Maybe that's
>>>> matlab's linear classifier?
>>>>
>>>> On Thursday, February 14, 2013, David Reed wrote:
>>>>
>>>>> I was mistaken, R is providing the exact same results as Python.  Is
>>>>> there a difference between a linear classifier and LDA?  Matlab never uses
>>>>> the words Linear Discriminant Analysis, it just says linear classifier,
>>>>> but is giving different results than these other two software packages.
>>>>>
>>>>>
>>>>> On Thu, Feb 14, 2013 at 1:19 PM, David Reed <[email protected]> wrote:
>>>>>
>>>>>> I don't think I can provide the data, but I'm trying to create some
>>>>>> simulated data that produces a similar difference.
>>>>>>
>>>>>>  I was just messing with R, and for LDA got a different result from
>>>>>> the other two.  I wonder if it's just something I am doing wrong.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 14, 2013 at 10:07 AM, Andreas Mueller <[email protected]> wrote:
>>>>>>
>>>>>>> On 02/14/2013 04:04 PM, Andreas Mueller wrote:
>>>>>>> > On 02/14/2013 03:59 PM, David Reed wrote:
>>>>>>> >> I don't think this is the problem.  My data is definitely oversampled;
>>>>>>> >> I have 5000 samples for the one feature.
>>>>>>> >>
>>>>>>> >> I also should say that the problem that led me to LDA was seeing there
>>>>>>> >> was a large bias between SVM classification accuracy in sklearn and
>>>>>>> >> matlab.  I am using the same parameters on both, and again testing on
>>>>>>> >> my training data.  Using the same univariate data set, I see 0.63 from
>>>>>>> >> matlab and 0.58 from sklearn.
>>>>>>> >>
>>>>>>> > Which version of scikit-learn are you using? Are you using sparse
>>>>>>> > matrices? There was a weird bug in using sparse matrices and SVMs in
>>>>>>> > an earlier version of sklearn.
>>>>>>> > If there is still a discrepancy in either of the algorithms, we must
>>>>>>> > investigate!
>>>>>>> >
>>>>>>> Would it be possible to provide your data or even better, some small
>>>>>>> dataset to reproduce the discrepancy?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>> --
>>> This message was sent from my Android mobile phone with K-9 Mail.
>>>
>>>