imatiach-msft commented on a change in pull request #21632: 
[SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250461968
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala
 ##########
 @@ -74,6 +74,21 @@ private[ml] trait DecisionTreeParams extends PredictorParams
     " child to have fewer than minInstancesPerNode, the split will be discarded as invalid." +
     " Should be >= 1.", ParamValidators.gtEq(1))
 
+  /**
+   * Minimum fraction of the weighted sample count that each child must have after split.
+   * If a split causes the fraction of the total weight in the left or right child to be less than
+   * minWeightFractionPerNode, the split will be discarded as invalid.
+   * Should be in the interval [0.0, 0.5).
+   * (default = 0.0)
+   * @group param
+   */
+  final val minWeightFractionPerNode: DoubleParam = new DoubleParam(this,
+    "minWeightFractionPerNode", "Minimum fraction of the weighted sample count that each child " +
+    "must have after split. If a split causes the fraction of the total weight in the left or " +
+    "right child to be less than minWeightFractionPerNode, the split will be discarded as " +
+    "invalid. Should be in interval [0.0, 0.5)",
+    ParamValidators.inRange(0.0, 0.5, lowerInclusive = true, upperInclusive = false))
 
 Review comment:
   "the max is 0.5 because at least one of the two children will have <= 50% of
   the samples?"
   Exactly. The weight fractions of the two children must add up to 1, and this
   param discards a split when one child's fraction is too low, so only the
   smaller child's fraction needs to be checked, and that fraction can never
   exceed 0.5.
   "= 0.5 isn't really meaningful because that will always be true"
   Sorry, I'm not sure what "always true" refers to. If you mean that we would
   always discard the split, then yes: 0.5 doesn't make sense because we would
   discard essentially every split (only a split with exactly equal weight on
   both sides could pass).
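
   To make the bound concrete, here is a minimal sketch of the rule as the doc
   comment states it. It is not the code in this PR: the object
   `MinWeightFractionSketch` and the helper `isValidSplit` are hypothetical
   names used only for illustration, and the check assumes "less than" is
   strict, as in the param description.

```scala
// Hypothetical illustration only; not the implementation in this PR.
object MinWeightFractionSketch {

  /**
   * Returns true if neither child holds less than `minWeightFraction`
   * of the total weight, i.e. the split would be kept.
   */
  def isValidSplit(
      leftWeight: Double,
      rightWeight: Double,
      minWeightFraction: Double): Boolean = {
    val total = leftWeight + rightWeight
    require(total > 0.0, "total weight must be positive")
    // The two fractions sum to 1, so the smaller one is always <= 0.5.
    // It is the only one that can fall below the threshold.
    val smallerFraction = math.min(leftWeight, rightWeight) / total
    smallerFraction >= minWeightFraction
  }
}
```

   Under this sketch, a 30/70 split is kept with a threshold of 0.25 and a
   10/90 split is discarded. With a threshold of 0.5, a 49/51 split is already
   discarded, which is why the upper bound of the param range is exclusive.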

