Hi,

Like Mathieu Blondel said, the AUC (Area Under the ROC Curve) is the most
popular metric.

from sklearn import metrics

def auc_score(y_true, y_pred, pos_label=1):
    # y_pred should hold scores/probabilities for the positive class,
    # not hard 0/1 predictions.
    fp_rate, tp_rate, thresholds = metrics.roc_curve(
        y_true, y_pred, pos_label=pos_label)
    return metrics.auc(fp_rate, tp_rate)
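
For example, a minimal usage sketch (the labels and scores below are made-up
toy data, not from this thread):

import numpy as np

y_true = np.array([0, 0, 1, 1])            # toy ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # toy scores for the positive class
print(auc_score(y_true, y_score))          # -> 0.75 on this toy data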


Or you might use the trapezoid approximation from a single operating point:
auc = (1 + TP_rate - FP_rate) / 2
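
A minimal sketch of that single-point approximation (auc_from_labels is just
an illustrative name; it assumes binary 0/1 labels with 1 as the positive
class):

from sklearn.metrics import confusion_matrix

def auc_from_labels(y_true, y_pred_labels):
    # Single-point trapezoid approximation: the area under the two ROC
    # segments (0,0) -> (FP_rate, TP_rate) -> (1,1) equals
    # (1 + TP_rate - FP_rate) / 2.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_labels).ravel()
    tp_rate = tp / float(tp + fn)
    fp_rate = fp / float(fp + tn)
    return (1.0 + tp_rate - fp_rate) / 2.0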

But if you want to know others, check out these references; they present
some other metrics that you might find interesting:


   - [57] V. García, R.A. Mollineda, J.S. Sánchez, "Classifier performance
   assessment in two-class imbalanced problems", Internal Communication, 2012.

   - [105] T. Raeder, G. Forman, N.V. Chawla, "Learning from imbalanced
   data: evaluation matters", in: D.E. Holmes, L.C. Jain (Eds.), Data Mining:
   Foundations and Intelligent Paradigms, ISRL vol. 23, Springer-Verlag, 2012,
   pp. 315–331.


And if you're working with a specific imbalanced dataset in pattern
recognition, I'd recommend the following paper:

   - López, Victoria, et al. "An insight into classification with
   imbalanced data: Empirical results and current trends on using data
   intrinsic characteristics." *Information Sciences* 250 (2013): 113-141.


Good luck!




On Wed, Jul 23, 2014 at 12:53 PM, Emanuele Olivetti <emanu...@relativita.com> wrote:

>  Hi,
>
> In addition to what has already been suggested, especially Chi^2 and MCC,
> I would suggest this:
>   http://dx.doi.org/10.1109/PRNI.2012.14 (full disclosure: it is one of
> my papers)
> which is, in short, a Bayesian equivalent of Chi^2 / MCC. It works for
> binary and multi-class problems and does not suffer from most (if not all)
> of the problems of Chi^2 and MCC. Notice that, in the multi-class case,
> the proposed method is also extended to detect the case where only some
> of the classes, but not all, are not discriminated.
>
> A Python implementation of the proposed algorithm (a bit updated since
> that paper) is here:
>   https://github.com/emanuele/inference_with_classifiers
> Feel free to ask if you need help.
>
> An extended version of the paper is in preparation.
>
> Best,
>
> Emanuele
>
>
> On 07/22/2014 05:26 PM, Hamed Zamani wrote:
>
>  Hi,
>
>  I am working on a binary classification problem in which both the
> training and test data are highly imbalanced. In other words, the number
> of instances available in one class is far larger than in the other.
>
>  Would you please let me know which evaluation measure is the best one to
> compare different methods in imbalanced situations? Please note that
> predicting the label of instances of the class with fewer instances is
> much harder than predicting the labels of the other instances, and I am
> looking for an evaluation measure which considers this issue.
>
>  I am wondering if you could also provide me with a reference for your
> opinions.
>
>  Thanks a lot,
> Best regards,
>  Hamed
>


-- 
*Dayvid Victor R. de Oliveira*
PhD Candidate in Computer Science at Federal University of Pernambuco (UFPE)
MSc in Computer Science at Federal University of Pernambuco (UFPE)
BSc in Computer Engineering - Federal University of Pernambuco (UFPE)