Dear Martin,

From what I understand, you want a classifier that:

1. Is not based on imputation
2. Ignores whether a value is missing or not for the inference
It seems to me that those two requirements are in contradiction, and it is not clear to me how such a classifier would be theoretically grounded.

Best,

Gaël

On Thu, Mar 02, 2023 at 09:01:45AM +0000, Martin Gütlein wrote:
> It would already help us if someone could confirm that this is not
> possible in scikit-learn, because we are still not entirely sure that
> we have not missed something.
>
> Regards,
> Martin
>
> On 21.02.2023 15:48, Martin Gütlein wrote:
> > Hi,
> >
> > I am looking for a classification model in Python that can handle
> > missing values without imputation and "without learning from missing
> > values", i.e. without using the fact that the information is missing
> > for the inference.
> >
> > Explained with the help of decision trees:
> > * The algorithm should NOT learn whether missing values should go to
> >   the left or the right child (like the HistGradientBoostingClassifier).
> > * Instead, it could build the prediction for each child node and
> >   aggregate these (like some Random Forest implementations do).
> >
> > If that is not possible in scikit-learn, maybe you have already
> > discussed this? Or do you know of a fork of scikit-learn that can do
> > this, or of some other Python library?
> >
> > Any help would be really appreciated, kind regards,
> > Martin
> >
> > P.S. Here is my use case, in case you are interested: I have a binary
> > classification problem with a positive and a negative class, and two
> > types of features, A and B. In my training data, B is missing in most
> > rows (90%). In my test data, I always have B, which is good because
> > the B features are better than the A features. In the cases where B
> > is present in the training data, the ratio of positive examples is
> > much higher than when it is missing. So HistGradientBoostingClassifier
> > uses the fact that B is not missing in the test data and predicts way
> > too many positives. (Additionally, some feature values of type A are
> > also often missing.)

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux
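For illustration, here is a minimal sketch of the aggregation idea from Martin's message: when the split feature is missing at prediction time, descend into both children and average their class distributions, weighted by the number of training samples each child saw. This is not a scikit-learn API; predict_proba_missing_aware is a hypothetical helper that walks the tree_ structure of a DecisionTreeClassifier fitted on complete rows.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def predict_proba_missing_aware(clf, X):
        # Hypothetical helper, not part of scikit-learn.
        tree = clf.tree_

        def node_proba(node, x):
            # Leaf: return the class distribution stored at the node.
            if tree.children_left[node] == -1:
                counts = tree.value[node][0]
                return counts / counts.sum()
            feat = tree.feature[node]
            left = tree.children_left[node]
            right = tree.children_right[node]
            if np.isnan(x[feat]):
                # Missing value: aggregate over *both* children, weighted
                # by how many training samples reached each of them.
                w_l = tree.weighted_n_node_samples[left]
                w_r = tree.weighted_n_node_samples[right]
                return (w_l * node_proba(left, x)
                        + w_r * node_proba(right, x)) / (w_l + w_r)
            # Observed value: follow the split as usual.
            child = left if x[feat] <= tree.threshold[node] else right
            return node_proba(child, x)

        X = np.asarray(X, dtype=float)
        return np.vstack([node_proba(0, x) for x in X])

A Random-Forest-style variant would simply average this quantity over all trees in the ensemble.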
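And a hedged sketch of the failure mode described in the P.S., on synthetic data where the missingness of B is itself correlated with the positive class during training, while B is always observed at test time (all names and numbers here are illustrative assumptions, not Martin's actual data):

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier

    rng = np.random.default_rng(0)
    n = 20_000
    y = rng.binomial(1, 0.2, n)        # ~20% positives overall
    A = y + rng.normal(0, 2.0, n)      # weak feature, always present
    B = y + rng.normal(0, 0.5, n)      # strong feature
    # B is observed far more often for positives, as in Martin's data.
    observed = rng.random(n) < np.where(y == 1, 0.4, 0.05)
    X_train = np.c_[A, np.where(observed, B, np.nan)]

    clf = HistGradientBoostingClassifier().fit(X_train, y)

    # Test data: same label distribution, but B is always observed.
    y_test = rng.binomial(1, 0.2, n)
    X_test = np.c_[y_test + rng.normal(0, 2.0, n),
                   y_test + rng.normal(0, 0.5, n)]
    print("true positive rate:     ", y_test.mean())
    print("predicted positive rate:", clf.predict(X_test).mean())

Because the model has learned that "B observed" implies "more likely positive", one would expect the predicted positive rate to come out well above the true one.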