Hello Alexandre,
I will first attempt an implementation as a patch to the relevant source code.
If I don't succeed, I will certainly open an issue with my proposed code.
Thanks for the speedy reply.


-----Original Message-----
From: Alexandre Gramfort [mailto:alexandre.gramf...@telecom-paristech.fr] 
Sent: 11 November 2013 19:37
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] code snippet for computing average 
confusion matrix in k-fold validation

hi Paolo,

I also think that a cross_val_confusion or cross_val_confusion_matrix function
in sklearn.cross_validation would be handy.

Maybe you can contribute it yourself, or at least open an issue?

Best,
Alex


On Mon, Nov 11, 2013 at 4:55 PM, Paolo Di Prodi <paolo.dipr...@contextis.co.uk> 
wrote:
> Hello there,
> Correct me if I am wrong, but I couldn't find a method for calculating the
> average confusion matrix from a k-fold validation routine.
> The average confusion matrix is quite handy and is used in many scientific
> papers, or at least in the ones I have read.
> So I wrote a couple of functions that might be useful for other users (I also
> saw some questions about this on Stack Overflow).
> There are things that I don't know how to implement correctly, such as
> determining how many classes there are in total, and handling the case where
> the matrix has a different dimension because some of the classes are not
> present in a given fold.
> Hopefully somebody can contribute!
>
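> Example:
> # Imports for the scikit-learn API of the time (sklearn.cross_validation)
> import numpy
> import pylab as pl
> from sklearn.cross_validation import KFold
> from sklearn.ensemble import RandomForestClassifier
> from sklearn.metrics import confusion_matrix
>
> clf = RandomForestClassifier(n_estimators=10)
> total_classes = list(set(Y))
> kfolder_confusion(clf, X, Y, total_classes=len(total_classes), n_folds=10)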
>
>
> def average_matrix(cm):
>     """ Given the summed confusion matrix, normalize each cell by
>     (row total + column total - the cell itself) """
>     result = numpy.zeros((cm.shape[0], cm.shape[1]))
>     for i in range(0, cm.shape[0]):
>         for j in range(0, cm.shape[1]):
>             denominator = cm[i, :].sum() + cm[:, j].sum() - cm[i][j]
>             result[i][j] = cm[i][j] / denominator
>     return result
>
> def kfolder_confusion(clf, corpus, label_features, total_classes, n_folds=10):
>     """ Do a K fold validation and compute the average confusion matrix """
>     # With indices=False, each fold yields boolean masks, not index arrays
>     kf = KFold(len(label_features), n_folds=n_folds, indices=False)
>     # Initialize an empty confusion matrix
>     partial_sum = numpy.zeros((total_classes, total_classes))
>
>     for train, test in kf:
>         # Select the rows of this fold with the boolean masks
>         train_data = corpus[train]
>         test_data = corpus[test]
>         train_label = label_features[train]
>         test_label = label_features[test]
>         clf.fit(train_data, train_label)
>         label_pred = clf.predict(test_data)
>         # Compute confusion matrix for each fold
>         cm = confusion_matrix(test_label, label_pred)
>         # Keep the running sum across folds
>         partial_sum = numpy.add(cm, partial_sum)
>
>     average_cm=average_matrix(partial_sum)
>
>     # Show the average confusion matrix
>     pl.matshow(average_cm)
>     pl.title('Average confusion matrix')
>     pl.colorbar()
>     pl.ylabel('True label')
>     pl.xlabel('Predicted label')
>     pl.show()
>
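On the open question above (how many classes there are, and what to do when a fold is missing a class), one common approach is to fix the label order once from the full Y and build every per-fold matrix on that same order. sklearn.metrics.confusion_matrix accepts a `labels` argument for this purpose. The sketch below shows the idea in plain Python so it runs without dependencies; `confusion_counts`, `add_matrices`, and the animal labels are made-up names for illustration, not part of scikit-learn.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs on a fixed label order.

    Rows are true labels and columns are predicted labels.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical full label set, computed once (e.g. sorted(set(Y)))
labels = ["bird", "cat", "dog"]

# Fold 1 contains no "bird" samples, fold 2 contains no "cat" samples
fold_1 = confusion_counts(["cat", "dog", "cat"], ["cat", "cat", "cat"], labels)
fold_2 = confusion_counts(["bird", "dog"], ["bird", "dog"], labels)

# Both matrices are 3x3, so they can be summed directly
total = add_matrices(fold_1, fold_2)
print(total)
```

Because every fold uses the same label list, a class that is absent from a fold simply contributes a row and column of zeros, and the per-fold matrices always have the same shape.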
> ------------------------------------------------------------------------------
> November Webinars for C, C++, Fortran Developers
> Accelerate application performance with scalable programming models. Explore
> techniques for threading, error checking, porting, and tuning. Get the most
> from the latest Intel processors and coprocessors. See abstracts and register
> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
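For reference, the cross_val_confusion_matrix idea suggested in this thread could look roughly like the sketch below. It is only an illustration under some assumptions: the function name and signature are hypothetical (no such helper is confirmed to exist in scikit-learn), `row_normalize` is a made-up companion that uses plain row normalization rather than the formula in the posted average_matrix, and the code imports KFold from sklearn.model_selection, where later scikit-learn releases moved it, instead of the sklearn.cross_validation module named above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


def cross_val_confusion_matrix(clf, X, y, n_splits=5, random_state=0):
    """Sum the per-fold confusion matrices of a k-fold cross-validation.

    Hypothetical helper for illustration, not a scikit-learn function.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    labels = np.unique(y)  # fixed label order shared by all folds
    total = np.zeros((len(labels), len(labels)), dtype=np.int64)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in folds.split(X):
        # Fit a fresh copy so the caller's estimator is left untouched
        model = clone(clf).fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        # labels= keeps the shape constant even if a fold lacks a class
        total += confusion_matrix(y[test_idx], y_pred, labels=labels)
    return labels, total


def row_normalize(cm):
    """Divide each row by its total, leaving all-zero rows at zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


# Usage on a small built-in dataset
X, y = load_iris(return_X_y=True)
labels, total = cross_val_confusion_matrix(
    DecisionTreeClassifier(random_state=0), X, y, n_splits=5
)
print(total)
print(row_normalize(total))
```

Every sample lands in exactly one test fold, so the summed matrix counts each sample once. Dividing it by the number of folds would give the mean per-fold matrix, while row normalization gives per-class rates.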
