It would already help us if someone could confirm that this is not
possible in scikit-learn, because we are still not entirely sure that
we have not missed something.
Regards,
Martin
On 21.02.2023 at 15:48, Martin Gütlein wrote:
Hi,
I am looking for a classification model in Python that can handle
missing values without imputation and without "learning from missing
values", i.e. without using the fact that a value is missing as
information for the inference.
Explained with the help of decision trees:
* The algorithm should NOT learn whether missing values should go to
the left or the right child (as HistGradientBoostingClassifier does).
* Instead, it could build the prediction for each child node and
aggregate these (as some Random Forest implementations do); see the
sketch below.
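To make this concrete, here is a minimal sketch of the aggregation
behaviour I mean. The Node class and all its fields are hypothetical,
for illustration only; this is not scikit-learn's tree API:

import numpy as np

class Node:
    """Hypothetical tree node, for illustration only."""
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, class_probs=None):
        self.feature = feature          # split feature index (None at a leaf)
        self.threshold = threshold      # split threshold
        self.left = left                # child for x[feature] <= threshold
        self.right = right              # child for x[feature] >  threshold
        self.class_probs = class_probs  # class distribution at a leaf

def predict_proba(node, x):
    # At a leaf, return its class distribution.
    if node.feature is None:
        return node.class_probs
    v = x[node.feature]
    # Missing value: descend into BOTH children and average their
    # predictions, instead of learning a preferred direction.
    if np.isnan(v):
        return 0.5 * (predict_proba(node.left, x) +
                      predict_proba(node.right, x))
    child = node.left if v <= node.threshold else node.right
    return predict_proba(child, x)

# Tiny hand-built tree splitting on feature 0 at threshold 0.5:
tree = Node(feature=0, threshold=0.5,
            left=Node(class_probs=np.array([0.9, 0.1])),
            right=Node(class_probs=np.array([0.2, 0.8])))
print(predict_proba(tree, np.array([np.nan])))   # -> [0.55 0.45]

(A real implementation would probably weight the two children by their
training sample counts rather than 0.5/0.5, but the idea is the same.)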
If that is not possible in scikit-learn, maybe you have already
discussed this? Or do you know of a fork of scikit-learn that can do
this, or some other Python library?
Any help would be really appreciated, kind regards,
Martin
P.S. Here is my use case, in case you are interested: I have a binary
classification problem with a positive and a negative class, and two
types of features, A and B. In my training data, B is missing in most
rows (90%). In my test data, B is always present, which is good,
because the B features are better than the A features. In the training
cases where B is present, the ratio of positive examples is much
higher than where it is missing. So HistGradientBoostingClassifier
exploits the fact that B is never missing in the test data and
predicts far too many positives. (Additionally, some feature values of
type A are also often missing.)
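Here is a small synthetic sketch that reproduces the effect (all
numbers are made up; only the structure matches my data):

import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
n = 10_000
A = rng.normal(size=(n, 1))              # feature A, always present
B = rng.normal(size=(n, 1))              # feature B, values uninformative here
miss = rng.random(n) < 0.9               # B missing in 90% of training rows
# positives are much more frequent where B is observed
y = np.where(miss, rng.random(n) < 0.1,  # ~10% positives when B is missing
                   rng.random(n) < 0.6)  # ~60% positives when B is present
y = y.astype(int)
B[miss] = np.nan
X_train = np.hstack([A, B])

clf = HistGradientBoostingClassifier(random_state=0).fit(X_train, y)

# In the test data, B is always present:
X_test = np.hstack([rng.normal(size=(1000, 1)), rng.normal(size=(1000, 1))])
print("overall positive rate in training:", y.mean())
print("predicted positive rate on test:  ", clf.predict(X_test).mean())

The model learns "B observed => likely positive", so the predicted
positive rate on the test data ends up far above the overall training
base rate, even though the B values themselves carry no signal here.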