zhengruifeng commented on issue #25926: [SPARK-9612][ML] Add instance weight support for GBTs
URL: https://github.com/apache/spark/pull/25926#issuecomment-539339480

@imatiach-msft Thanks for reviewing!

As to the points on *weighted* prediction error: following the previous discussions, we should sample the data without weights and pass the weights into the base model (decision tree). So the input passed to a decision tree should contain the label (the unweighted prediction error) and the instance weights (which will also be used in `minWeightFractionPerNode`). In this way, I guess we do not need to cache the weighted error. Moreover, `predError.values.mean()` computed on a weighted `predError` is not equal to the average weighted error in this PR.

PS: If I recall correctly, XGBoost passes weighted gradients and Hessians into the base learner. It uses a minimum Hessian (`min_child_weight`) to limit tree growth, which is quite different from MLlib.
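To illustrate the point about `predError.values.mean()`, here is a minimal sketch in plain Python (not the Spark code from this PR). It assumes "average weighted error" means `sum(w_i * e_i) / sum(w_i)`; the function names and the sample numbers are hypothetical. Taking a plain mean of already-weighted errors divides by the instance count rather than by the total weight, so the two values agree only when the weights happen to sum to the number of instances.

```python
def mean_of_weighted_errors(errors, weights):
    """Plain mean of the per-instance products w_i * e_i (divides by the count)."""
    weighted = [w * e for e, w in zip(errors, weights)]
    return sum(weighted) / len(weighted)


def weighted_average_error(errors, weights):
    """Weighted average: sum(w_i * e_i) / sum(w_i) (divides by the total weight)."""
    total_weight = sum(weights)
    return sum(w * e for e, w in zip(errors, weights)) / total_weight


errors = [1.0, 2.0, 4.0]   # hypothetical per-instance prediction errors
weights = [1.0, 1.0, 2.0]  # hypothetical instance weights

# Weighted sum is 11.0: the first divides it by 3 instances, the second by total weight 4.0.
print(mean_of_weighted_errors(errors, weights))
print(weighted_average_error(errors, weights))  # 2.75
```

With these numbers the plain mean of the weighted errors is 11/3, while the weighted average is 2.75, so caching weighted errors and calling a plain mean on them would not reproduce the weighted average.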
