Hello,

I would like to add a custom loss function for gradient boosting regression. It is similar to least squares, except that for each example it is acceptable to miss the target on one side, and the loss is zero in that case. An additional binary indicator called "under" says, per example, whether undershooting (under = 1) or overshooting (under = 0) is allowed. For example:

y    under    p    loss
5    1        4    0
5    0        4    1
5    1        6    1
5    0        6    0
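To make the rule concrete, here is a standalone numpy check of the table above (nothing scikit-learn specific, the array names are just illustrative):

    import numpy as np

    y = np.array([5.0, 5.0, 5.0, 5.0])
    under = np.array([1, 0, 1, 0])      # 1: undershooting allowed, 0: overshooting allowed
    p = np.array([4.0, 4.0, 6.0, 6.0])

    loss = (y - p) ** 2.0
    loss[(p > y) & (under == 0)] = 0.0  # overshoot allowed -> no penalty
    loss[(p < y) & (under == 1)] = 0.0  # undershoot allowed -> no penalty

    print(loss)                         # [0. 1. 1. 0.], matching the table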

Below is my attempt at an implementation. I have three questions:

1. Is it correct?
2. How would you pass "under" to the loss function?
3. Loss functions other than LeastSquaresError() seem to do something in update_terminal_regions(). Is this necessary in this case, and if so, how would I do it?


    def __call__(self, y, pred, sample_weight=None):
        if sample_weight is None:
            pred = pred.ravel()  # pred arrives as a column vector
            squares = (y - pred) ** 2.0

            # the custom part: no penalty when the miss is on the allowed side
            # ("under" is the indicator described above; how to get it into
            # this method is question 2)
            overshoot_ok = (pred > y) & (under == 0)
            undershoot_ok = (pred < y) & (under == 1)
            squares[overshoot_ok] = 0.0
            squares[undershoot_ok] = 0.0

            return np.mean(squares)
        else:
            (...)  # weighted case left out here

    def negative_gradient(self, y, pred, **kargs):
        pred = pred.ravel()
        diffs = y - pred

        # likewise, zero the gradient where the miss is on the allowed side
        overshoot_ok = (pred > y) & (under == 0)
        undershoot_ok = (pred < y) & (under == 1)
        diffs[overshoot_ok] = 0.0
        diffs[undershoot_ok] = 0.0

        return diffs
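
Regarding question 2, the only thing I can think of is to attach "under" to the loss object when it is constructed, so both methods can read it from self. A self-contained sketch of the idea (the class name is made up, and it is not wired into scikit-learn's internal loss machinery):

    import numpy as np

    class OneSidedSquaredError(object):
        """Squared error that ignores misses on the allowed side (sketch only)."""

        def __init__(self, under):
            # per-sample indicator: 1 = undershooting allowed, 0 = overshooting allowed
            self.under = np.asarray(under)

        def __call__(self, y, pred, sample_weight=None):
            pred = pred.ravel()
            squares = (y - pred) ** 2.0
            squares[(pred > y) & (self.under == 0)] = 0.0
            squares[(pred < y) & (self.under == 1)] = 0.0
            if sample_weight is None:
                return np.mean(squares)
            return np.average(squares, weights=sample_weight)

        def negative_gradient(self, y, pred, **kargs):
            pred = pred.ravel()
            diffs = y - pred
            diffs[(pred > y) & (self.under == 0)] = 0.0
            diffs[(pred < y) & (self.under == 1)] = 0.0
            return diffs

Whether GradientBoostingRegressor will accept such an object as its loss, and whether update_terminal_regions() also has to be provided, is exactly what I am not sure about.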
