Thanks a lot, Lance. Let me elaborate on the problem in case it was confusing.
Assume I am building a binary classifier using SGD, and I have 50 positive and 50 negative examples to train it. After training and testing the model, the confusion matrix tells me the number of correctly and incorrectly classified instances; say I get 85% correct and 15% incorrect. Now if I run my program again with the same 50 positive and 50 negative examples, then to my understanding the classifier should yield the same results as before (since not a single training or testing example changed), but this is not the case: I get different results on different runs. The confusion matrix figures change each time I generate a model, even though the data stays constant. What I currently do is generate a model several times while watching the accuracy, and once it is above 90% I stop running the code and keep that model as the accurate one. So what you are saying is to shuffle my data before I use it for training and testing? Thanks!

On Aug 31, 2012, at 10:33 AM, Lance Norskog wrote:

> Now I remember: SGD wants its data input in random order. You need to
> permute the order of your data.
>
> If that does not help, another trick: for each data point, randomly
> generate 5 or 10 or 20 points which are close. And again, randomly
> permute the entire input set.
>
> On Thu, Aug 30, 2012 at 5:23 PM, Lance Norskog <[email protected]> wrote:
>> The more data you have, the closer each run will be. How much data do you
>> have?
>>
>> On Thu, Aug 30, 2012 at 2:49 PM, Salman Mahmood <[email protected]> wrote:
>>> I have noticed that every time I train and test a model using the same data
>>> (in the SGD algo), I get a different confusion matrix. Meaning, if I generate a
>>> model and look at the confusion matrix, it might say 90% correctly
>>> classified instances, but if I generate the model again (with the SAME data
>>> for training and testing as before) and test it, the confusion matrix
>>> changes and it might say 75% correctly classified instances.
>>>
>>> Is this a desired behavior?
>
> --
> Lance Norskog
> [email protected]
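For illustration, the two tricks Lance describes (permuting the input order, and padding each point with a handful of nearby synthetic copies) might be sketched in Python roughly like this. The function name and parameters here are made up for the example, not part of any Mahout API, and a fixed seed only makes the data ordering reproducible; run-to-run variation can still come from the trainer's own random state:

```python
import random

def augment_and_shuffle(examples, n_jitter=5, sigma=0.01, seed=42):
    """Permute training examples, optionally adding jittered copies.

    `examples` is a list of (feature_vector, label) pairs; the names
    and defaults are illustrative, not from Mahout.
    """
    rng = random.Random(seed)  # fixed seed => same ordering every run
    out = []
    for features, label in examples:
        out.append((features, label))
        # Second trick: a few synthetic points close to each original.
        for _ in range(n_jitter):
            jittered = [x + rng.gauss(0.0, sigma) for x in features]
            out.append((jittered, label))
    rng.shuffle(out)  # first trick: SGD wants its input in random order
    return out
```

With 50 positives and 50 negatives and `n_jitter=5`, this would hand the trainer 600 examples in a random but repeatable order; whether the jitter magnitude `sigma` is sensible depends entirely on the feature scales.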
