Thanks a lot, Lance. Let me elaborate on the problem in case it was confusing.
Assume I am building a binary classifier using SGD, and I have 50 positive and 50 negative examples to train it. After training and testing the model, the confusion matrix tells me the number of correctly and incorrectly classified instances; say I get 85% correct and 15% incorrect. Now if I run my program again with the same 50 positive and 50 negative examples, then to my understanding the classifier should yield the same results as before (since not a single training or testing example changed), but this is not the case: I get different results on different runs. The confusion matrix figures change each time I generate a model, even though the data stays constant. What I currently do is generate a model several times while watching the accuracy, and once it is above 90% I stop running the code and keep that model as the accurate one. So what you are saying is to shuffle my data before I use it for training and testing? Thanks!

On Aug 31, 2012, at 10:33 AM, Lance Norskog wrote:

> Now I remember: SGD wants its data input in random order. You need to
> permute the order of your data.
>
> If that does not help, another trick: for each data point, randomly
> generate 5 or 10 or 20 points which are close. And again, randomly
> permute the entire input set.
>
> On Thu, Aug 30, 2012 at 5:23 PM, Lance Norskog <[email protected]> wrote:
>> The more data you have, the closer each run will be. How much data do you
>> have?
>>
>> On Thu, Aug 30, 2012 at 2:49 PM, Salman Mahmood <[email protected]> wrote:
>>> I have noticed that every time I train and test a model using the same data
>>> (in the SGD algo), I get a different confusion matrix. Meaning, if I generate a
>>> model and look at the confusion matrix, it might say 90% correctly
>>> classified instances, but if I generate the model again (with the SAME data
>>> for training and testing as before) and test it, the confusion matrix
>>> changes and it might say 75% correctly classified instances.
>>>
>>> Is this a desired behavior?
>
> --
> Lance Norskog
> [email protected]
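For illustration, the two tricks Lance describes (permuting the input order, and padding each point with a handful of nearby synthetic copies) might be sketched in Python roughly like this. The function name and parameters here are made up for the example, not part of any Mahout API, and a fixed seed only makes the data ordering reproducible; run-to-run variation can still come from the trainer's own random state:

```python
import random

def augment_and_shuffle(examples, n_jitter=5, sigma=0.01, seed=42):
    """Permute training examples, optionally adding jittered copies.

    `examples` is a list of (feature_vector, label) pairs; the names
    and defaults are illustrative, not from Mahout.
    """
    rng = random.Random(seed)  # fixed seed => same ordering every run
    out = []
    for features, label in examples:
        out.append((features, label))
        # Second trick: a few synthetic points close to each original.
        for _ in range(n_jitter):
            jittered = [x + rng.gauss(0.0, sigma) for x in features]
            out.append((jittered, label))
    rng.shuffle(out)  # first trick: SGD wants its input in random order
    return out
```

With 50 positives and 50 negatives and `n_jitter=5`, this would hand the trainer 600 examples in a random but repeatable order; whether the jitter magnitude `sigma` is sensible depends entirely on the feature scales.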
