imatiach-msft commented on a change in pull request #21632:
[SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250461968
##########
File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala
##########
@@ -74,6 +74,21 @@ private[ml] trait DecisionTreeParams extends PredictorParams
" child to have fewer than minInstancesPerNode, the split will be discarded as invalid." +
" Should be >= 1.", ParamValidators.gtEq(1))
+ /**
+ * Minimum fraction of the weighted sample count that each child must have after split.
+ * If a split causes the fraction of the total weight in the left or right child to be less than
+ * minWeightFractionPerNode, the split will be discarded as invalid.
+ * Should be in the interval [0.0, 0.5).
+ * (default = 0.0)
+ * @group param
+ */
+ final val minWeightFractionPerNode: DoubleParam = new DoubleParam(this,
+ "minWeightFractionPerNode", "Minimum fraction of the weighted sample count that each child " +
+ "must have after split. If a split causes the fraction of the total weight in the left or " +
+ "right child to be less than minWeightFractionPerNode, the split will be discarded as " +
+ "invalid. Should be in interval [0.0, 0.5)",
+ ParamValidators.inRange(0.0, 0.5, lowerInclusive = true, upperInclusive = false))
Review comment:
"the max is 0.5 because at least one of the two children will have <= 50% of the samples?"
Exactly. The weight fractions of the two children must add up to 1, and this parameter discards a split when one of them is too low, so only the smaller child's fraction matters, and that fraction can never exceed 0.5.
"= 0.5 isn't really meaningful because that will always be true"
Sorry, I'm not sure what "always true" refers to. If you mean that we would always discard the split, then yes: 0.5 doesn't make sense because we would discard essentially every split (all but a perfectly even one, given the "less than" check in the doc comment).
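To make the reasoning above concrete, here is a minimal sketch of the check as the doc comment describes it. The names `MinWeightFractionSketch` and `splitSurvives` are hypothetical and only for illustration; this is not the code in the PR, and it assumes the comparison is a strict "less than", as the parameter description states.

```scala
// Hypothetical illustration only; not Spark's actual implementation.
object MinWeightFractionSketch {

  /**
   * Returns true if a candidate split would be kept.
   *
   * Assumption (taken from the doc comment): a split is discarded when the
   * fraction of total weight in either child is strictly less than the threshold.
   * Since the two fractions sum to 1, checking the smaller child is sufficient.
   */
  def splitSurvives(
      leftWeight: Double,
      rightWeight: Double,
      minWeightFractionPerNode: Double): Boolean = {
    val totalWeight = leftWeight + rightWeight
    require(totalWeight > 0.0, "total weight must be positive")
    val smallerFraction = math.min(leftWeight, rightWeight) / totalWeight
    smallerFraction >= minWeightFractionPerNode
  }

  def main(args: Array[String]): Unit = {
    // Smaller child holds 30% of the weight, threshold is 25%: kept.
    println(splitSurvives(30.0, 70.0, 0.25))
    // Smaller child holds 10% of the weight, threshold is 25%: discarded.
    println(splitSurvives(10.0, 90.0, 0.25))
    // With a threshold of 0.5, even a 49/51 split is discarded.
    println(splitSurvives(49.0, 51.0, 0.5))
  }
}
```

Under these assumptions, a threshold of 0.5 would reject every split whose children are not exactly equal in weight, which is why the validator excludes the upper bound.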