Hi,
I am looking for a classification model in Python that can handle
missing values without imputation and without "learning from
missingness", i.e. without using the fact that a value is missing as
information at inference time.
To explain with the help of decision trees:
* The algorithm should NOT learn whether missing values should go to the
left or the right child (as HistGradientBoostingClassifier does).
* Instead it could build the prediction for each child node and
aggregate these (as some Random Forest implementations do; see the
sketch below).
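For concreteness, here is a minimal sketch of the aggregation I mean
(all names are my own; it assumes a plain sklearn.tree.DecisionTreeClassifier
fitted on complete data, and descends one sample through the fitted tree,
averaging both children, weighted by their training sample counts,
whenever the split feature is NaN):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def predict_proba_with_missing(clf: DecisionTreeClassifier, x):
    """Class probabilities for one sample x (1-D array, may contain NaN).

    Descends the fitted tree; at a split on a missing feature, recurses
    into BOTH children and averages their predictions, weighted by the
    number of training samples that reached each child.
    """
    tree = clf.tree_

    def recurse(node):
        if tree.children_left[node] == -1:  # leaf: normalized class counts
            counts = tree.value[node][0]
            return counts / counts.sum()
        f = tree.feature[node]
        if np.isnan(x[f]):  # missing: aggregate both children
            left = tree.children_left[node]
            right = tree.children_right[node]
            n_l, n_r = tree.n_node_samples[left], tree.n_node_samples[right]
            return (n_l * recurse(left) + n_r * recurse(right)) / (n_l + n_r)
        if x[f] <= tree.threshold[node]:  # observed: follow the split
            return recurse(tree.children_left[node])
        return recurse(tree.children_right[node])

    return recurse(0)

This is of course only a per-tree hack; what I am hoping for is a
library that does this (or something equivalent) natively.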
If that is not possible in scikit-learn, have you perhaps already
discussed this? Or do you know of a fork of scikit-learn that can do
this, or some other Python library?
Any help would be really appreciated. Kind regards,
Martin
P.S. Here is my use case, in case you are interested: I have a binary
classification problem with a positive and a negative class, and two
types of features, A and B. In my training data, B is missing in most
rows (90%). In my test data, B is always present, which is good because
the B features are more informative than the A features. In the cases
where B is present in the training data, the ratio of positive examples
is much higher than when it is missing. So HistGradientBoostingClassifier
exploits the fact that B is never missing in the test data and predicts
far too many positives. (Additionally, some feature values of type A are
also often missing.)
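To make the failure mode concrete, here is a toy reproduction with
invented numbers (the rates and signal strengths are made up; the only
point is that B's missingness is correlated with the label in training,
while B is fully observed at test time):

import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
n = 10_000

# Labels and two features: A is weakly informative, B strongly informative.
y = rng.integers(0, 2, n)
A = y + rng.normal(0, 2.0, n)   # noisy signal
B = y + rng.normal(0, 0.5, n)   # strong signal

# In training, B is observed in only ~10% of rows, mostly positives,
# so "B present" is itself predictive of the positive class.
observed = rng.random(n) < np.where(y == 1, 0.18, 0.02)
X_train = np.column_stack([A, np.where(observed, B, np.nan)])

# At test time B is always present.
y_test = rng.integers(0, 2, n)
X_test = np.column_stack([y_test + rng.normal(0, 2.0, n),
                          y_test + rng.normal(0, 0.5, n)])

clf = HistGradientBoostingClassifier().fit(X_train, y)
print("true positive rate in test labels:", y_test.mean())
print("predicted positive rate:", clf.predict(X_test).mean())
# Because "B present" implied "likely positive" in training, the model
# tends to over-predict the positive class on the fully observed test set.

This mirrors what I see on my real data.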