Hi

I would appreciate it if you could let me know the best way to categorize
the approaches that have been developed to deal with the class imbalance
problem.

*This article
<https://www.sciencedirect.com/science/article/pii/S0020025513005124>
categorizes them into:*

   1. Preprocessing: includes oversampling, undersampling and hybrid
   methods,
   2. Cost-sensitive learning: includes direct methods and meta-learning,
   the latter of which is further divided into thresholding and sampling,
   3. Ensemble techniques: includes cost-sensitive ensembles and data
   preprocessing in conjunction with ensemble learning.
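For concreteness, the preprocessing category (item 1) can be illustrated with random oversampling: the training data itself is changed before any model sees it. The sketch below is minimal and assumes a synthetic dataset from make_classification with an arbitrary 90/10 class split; it is not taken from the cited article. (The imbalanced-learn package also offers ready-made samplers such as RandomOverSampler.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

# Illustrative imbalanced toy data: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Split the samples by class label
X_maj, X_min = X[y == 0], X[y == 1]

# Random oversampling: draw minority samples with replacement until the
# minority class is as large as the majority class
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

# Rebuild a balanced training set; the classifier itself is unchanged
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([
    np.zeros(len(X_maj), dtype=int),
    np.ones(len(X_min_up), dtype=int),
])

print("original counts:", np.bincount(y))
print("balanced counts:", np.bincount(y_bal))
```

Any classifier can then be fit on X_bal and y_bal as usual, which is why this family is usually described as model-agnostic.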

*The second article <https://dl.acm.org/citation.cfm?id=2907070> uses this classification:*

   1. Data Pre-processing: includes distribution change and weighting of the
   data space. One-class learning is considered a form of distribution change.
   2. Special-purpose Learning Methods
   3. Prediction Post-processing: includes threshold methods and
   cost-sensitive post-processing
   4. Hybrid Methods
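The prediction post-processing category (item 3) leaves both the data and the training procedure alone and only changes how the model's outputs are turned into labels. A minimal sketch of threshold moving, assuming a synthetic dataset and an arbitrary threshold of 0.2 chosen purely for illustration (in practice it would be tuned on validation data; recent scikit-learn releases also include TunedThresholdClassifierCV for this purpose):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative imbalanced toy data
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# The model is trained without any class weighting or resampling
clf = LogisticRegression().fit(X_tr, y_tr)

# Estimated probability of the positive (minority) class
proba = clf.predict_proba(X_te)[:, 1]

# Default decision rule versus a lowered threshold; only this
# post-processing step differs, the fitted model is the same
threshold = 0.2  # arbitrary illustrative value, not a recommendation
y_pred_default = (proba >= 0.5).astype(int)
y_pred_moved = (proba >= threshold).astype(int)

print("positives predicted at 0.5:", y_pred_default.sum())
print("positives predicted at", threshold, ":", y_pred_moved.sum())
```

Lowering the threshold can only increase the number of predicted positives, trading precision for recall on the minority class.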

*The third article
<https://link.springer.com/article/10.1007/s13748-016-0094-0>:*

   1. Data-level methods
   2. Algorithm-level methods
   3. Hybrid methods

The last classification also considers output adjustment as an independent
approach.

Could you please let me know which category the class_weight parameter of
scikit-learn's classifiers (e.g., LogisticRegression) falls into? Is it
correct to say:

In the first categorization, it falls into cost-sensitive learning.

In the second taxonomy, it would be classified into the third category,
i.e., cost-sensitive post-processing.

In the third classification, it should fall into the algorithm-level category.
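When mapping class_weight onto these taxonomies, it may help to see where it acts. In LogisticRegression, class weights are applied during fitting as per-sample weights on the training loss, not to the predictions afterwards. A minimal sketch, assuming a synthetic binary dataset: it compares class_weight="balanced" against passing the same weights through sample_weight (the "balanced" rule is n_samples / (n_classes * np.bincount(y)), computed here with compute_class_weight).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Illustrative imbalanced toy data
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# "balanced" weights: n_samples / (n_classes * np.bincount(y))
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes, weights)))

# Fit 1: class_weight rescales each sample's term in the training loss
clf_cw = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Fit 2: the same per-sample weights passed explicitly via sample_weight
sample_weight = weights[np.searchsorted(classes, y)]
clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sample_weight)

# If both fits reach the same coefficients, the weighting happened
# entirely inside training rather than as a post-processing step
print(np.allclose(clf_cw.coef_, clf_sw.coef_))
```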

Best regards,
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
