Hi, I would appreciate it if you could let me know the best way to categorize the approaches that have been developed to deal with the class imbalance problem.
*This article <https://www.sciencedirect.com/science/article/pii/S0020025513005124> categorizes them into:*

1. Preprocessing: oversampling, undersampling and hybrid methods.
2. Cost-sensitive learning: direct methods and meta-learning; the latter is further divided into thresholding and sampling.
3. Ensemble techniques: cost-sensitive ensembles, and data preprocessing in conjunction with ensemble learning.

*The second <https://dl.acm.org/citation.cfm?id=2907070> classification:*

1. Data pre-processing: distribution change and weighting the data space. One-class learning is considered a form of distribution change.
2. Special-purpose learning methods.
3. Prediction post-processing: the threshold method and cost-sensitive post-processing.
4. Hybrid methods.

*The third article <https://link.springer.com/article/10.1007/s13748-016-0094-0>:*

1. Data-level methods.
2. Algorithm-level methods.
3. Hybrid methods.

The last classification also considers output adjustment as an independent approach.

Could you please let me know which category the class_weight option of sklearn's classifiers (e.g., logistic regression) falls into? Is it correct to say that:

- in the first categorization, it falls under cost-sensitive learning;
- in the second taxonomy, it belongs to the third category, i.e., cost-sensitive post-processing;
- in the third classification, it is an algorithm-level method?

Best regards,
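P.S. To make the question concrete, here is a minimal sketch of what I understand class_weight to do. It assumes a synthetic two-class dataset with roughly a 90/10 split (the data, the variable names and the tolerance are my own choices for illustration only). It computes the "balanced" weights as n_samples / (n_classes * np.bincount(y)) and checks that passing class_weight gives essentially the same fit as passing the equivalent per-sample weights, i.e. the option appears to act on the training loss rather than on the data or the predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Synthetic imbalanced data (illustrative assumption, about 90% / 10%).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

# "balanced" weights as documented: n_samples / (n_classes * bincount(y)).
classes = np.unique(y)
class_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
expected = len(y) / (len(classes) * np.bincount(y))

# Fit once with class_weight, once with the equivalent per-sample weights.
clf_cw = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
sample_weights = compute_sample_weight("balanced", y)
clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sample_weights)

print("class weights:", dict(zip(classes, class_weights)))
print("max coefficient difference:", np.abs(clf_cw.coef_ - clf_sw.coef_).max())
```

If the two fits agree, the weighting is a reweighted loss inside the learner, which is why I am unsure whether "post-processing" is the right label in the second taxonomy.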