Re: [Scikit-learn-general] Evaluation measure for imbalanced data

Dayvid Victor Wed, 23 Jul 2014 10:07:56 -0700

Hamed, I am sorry, the correct trapezoidal approximation is:

   - AUC = (1 + TP_rate - FP_rate) / 2
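
For what it's worth, here is a sketch of where that expression comes from. It assumes the classifier outputs hard labels, so the ROC "curve" has a single operating point (FP_rate, TP_rate) joined by straight lines to (0, 0) and (1, 1); the area is then the sum of two trapezoids:

```latex
\begin{align*}
\mathrm{AUC}
  &= \underbrace{\tfrac{1}{2}\,\mathrm{FPR}\cdot\mathrm{TPR}}_{\text{from } x=0 \text{ to } x=\mathrm{FPR}}
   + \underbrace{\tfrac{1}{2}\,(1-\mathrm{FPR})(\mathrm{TPR}+1)}_{\text{from } x=\mathrm{FPR} \text{ to } x=1} \\
  &= \tfrac{1}{2}\bigl(\mathrm{FPR}\cdot\mathrm{TPR} + \mathrm{TPR} + 1
     - \mathrm{FPR}\cdot\mathrm{TPR} - \mathrm{FPR}\bigr) \\
  &= \frac{1 + \mathrm{TPR} - \mathrm{FPR}}{2}
\end{align*}
```

With continuous scores the ROC curve has many points and this single-point value is only an approximation of the full AUC.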


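Also, keep in mind that, when dealing with binary imbalanced datasets, you
can calculate it as:

auc = (1.0 + t_mn - (1.0 - t_mj)) / 2, which simplifies to (t_mn + t_mj) / 2,
where t_mn is the minority class accuracy (the TP rate, taking the minority
class as positive) and t_mj is the majority class accuracy (so 1.0 - t_mj is
the FP rate). []'s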
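
A minimal sketch of the calculation above, assuming hard 0/1 predictions and the minority class as the positive label. The helper name `single_point_auc` and the toy arrays are made up for illustration; only `roc_auc_score` is scikit-learn's own function:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def single_point_auc(y_true, y_label, pos_label=1):
    """(1 + TP_rate - FP_rate) / 2 for hard label predictions.

    Hypothetical helper for illustration; assumes both classes
    are present in y_true.
    """
    y_true = np.asarray(y_true)
    y_label = np.asarray(y_label)
    pos = y_true == pos_label
    # TP rate: accuracy on the positive (here: minority) class.
    tp_rate = np.mean(y_label[pos] == pos_label)
    # FP rate: 1 - accuracy on the negative (here: majority) class.
    fp_rate = np.mean(y_label[~pos] == pos_label)
    return (1.0 + tp_rate - fp_rate) / 2.0


# Toy data: 8 majority (0) and 2 minority (1) instances.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_label = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

t_mn = np.mean(y_label[y_true == 1] == 1)  # minority class accuracy
t_mj = np.mean(y_label[y_true == 0] == 0)  # majority class accuracy

print(single_point_auc(y_true, y_label))   # trapezoidal formula
print((t_mn + t_mj) / 2.0)                 # mean of per-class accuracies
print(roc_auc_score(y_true, y_label))      # ROC AUC on the hard labels
```

All three values agree on this toy example, since with hard labels the ROC curve has only the one operating point.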


On Wed, Jul 23, 2014 at 1:57 PM, Dayvid Victor <victor.d...@gmail.com>
wrote:

> Hi,
>
> Like Mathieu Blondel said, the AUC (Area Under the ROC Curve) is the most
> popular metric.
>
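> from sklearn import metrics
>
> def auc_score(y_true, y_pred, pos_label=1):
>     # y_pred should hold continuous scores (probabilities or decision
>     # values) for the positive class, not hard labels.
>     fp_rate, tp_rate, thresholds = metrics.roc_curve(
>         y_true, y_pred, pos_label=pos_label)
>     # Integrate the ROC curve with the trapezoidal rule.
>     return metrics.auc(fp_rate, tp_rate)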
>
>
> Or you might use the trapezoid approximation: auc = (1 + TP_rate -
> FN_rate) / 2
>
> But in case you want to know about others, check out these references;
> they present some other metrics that you might find interesting:
>
>
>    - [57] V. García, R.A. Mollineda, J.S. Sánchez, Classifier performance
>    assessment in two-class imbalanced problems, Internal Communication. (2012)
>
>    - [105] T. Raeder, G. Forman, N.V. Chawla, Learning from imbalanced
>    data: evaluation matters, in: D.E. Holmes, L.C. Jain (Eds.), Data Mining:
>    Found. and Intell. Paradigms, vol. ISRL 23, Springer-Verlag, 2012, pp.
>    315–331.
>
>
> And if you're working with a specific imbalanced dataset in pattern
> recognition, I'd recommend the following paper:
>
>    - López, Victoria, et al. "An insight into classification with
>    imbalanced data: Empirical results and current trends on using data
>    intrinsic characteristics." *Information Sciences* 250 (2013): 113-141.
>
>
> Good luck!
>
>
>
>
> On Wed, Jul 23, 2014 at 12:53 PM, Emanuele Olivetti <
> emanu...@relativita.com> wrote:
>
>>  Hi,
>>
>> In addition to what has already been suggested, especially Chi^2 and MCC,
>> I would suggest this:
>>   http://dx.doi.org/10.1109/PRNI.2012.14         (full disclosure: it is
>> one of my papers)
>> which is, in short, a Bayesian equivalent of Chi^2 / MCC, which works for
>> binary and multi-class problems and does not suffer from most (if not all)
>> of the problems of Chi^2 and MCC. Notice that, in the multi-class case, the
>> proposed method is also extended to detect the case where only some of the
>> classes, but not all, are not discriminated.
>>
>> A Python implementation of the proposed algorithm (a bit updated since
>> that paper) is here:
>>   https://github.com/emanuele/inference_with_classifiers
>> Feel free to ask if you need help.
>>
>> An extended version of the paper is in preparation.
>>
>> Best,
>>
>> Emanuele
>>
>>
>> On 07/22/2014 05:26 PM, Hamed Zamani wrote:
>>
>>  Hi,
>>
>>  I am working on a binary classification problem in which both training
>> and test data are highly imbalanced. In other words, the number of
>> instances available in one class is far more than the other one.
>>
>>  Would you please let me know which evaluation measure is the best one
>> to compare different methods in imbalanced situations? Please note that
>> predicting the labels of instances of the class which contains fewer
>> instances is much harder than predicting the labels of the other
>> instances, and I am looking for an evaluation measure which considers this
>> issue.
>>
>>  I am wondering if you could also provide me with a reference for your
>> opinions.
>>
>>  Thanks a lot,
>> Best regards,
>>  Hamed
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Want fast and easy access to all the code in your enterprise? Index and
>> search up to 200,000 lines of code with a free copy of Black Duck
>> Code Sight - the same software that powers the world's largest code
>> search on Ohloh, the Black Duck Open Hub! Try it now.
>> http://p.sf.net/sfu/bds
>>
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
> --
> *Dayvid Victor R. de Oliveira*
> PhD Candidate in Computer Science at Federal University of Pernambuco
> (UFPE)
> MSc in Computer Science at Federal University of Pernambuco (UFPE)
> BSc in Computer Engineering - Federal University of Pernambuco (UFPE)
>




