huaxingao commented on a change in pull request #27097:
[SPARK-9478][ML][PYSPARK] Add sample weights to Random Forest
URL: https://github.com/apache/spark/pull/27097#discussion_r365497142
##########
File path:
mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
##########
@@ -259,6 +263,37 @@ class RandomForestClassifierSuite extends MLTest with DefaultReadWriteTest {
})
}
+ test("training with sample weights") {
+ val df = binaryDataset
+ val numClasses = 2
+ // (numTrees, maxDepth, subsamplingRate, fractionInTol)
+ val testParams = Seq(
+ (20, 5, 1.0, 0.96),
+ (20, 10, 1.0, 0.96),
+ (20, 10, 0.95, 0.96)
+ )
Review comment:
I suggested testing different impurities because the impurity calculation
(both `entropy` and `gini`) is affected by sample weights when computing the
best split. However, after looking at the DecisionTree tests, I saw that both
`entropy` and `gini` are already tested with sample weights there, so there is
no need to test them again here.
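
To make the dependence on sample weights concrete, here is a minimal standalone sketch of weighted `gini` and `entropy`. It is illustrative only and is not Spark's internal impurity code; the object name `WeightedImpurity` and its helper methods are hypothetical, and the sketch assumes the usual definitions in which class fractions are computed from summed sample weights rather than raw counts.

```scala
// Illustrative sketch only: plain Scala, not Spark's impurity implementation.
// `WeightedImpurity` and its methods are hypothetical names.
object WeightedImpurity {

  // Sum the sample weights per class label (labels are 0 until numClasses).
  def classWeights(labels: Seq[Int], weights: Seq[Double], numClasses: Int): Array[Double] = {
    require(labels.length == weights.length, "labels and weights must have the same length")
    val totals = Array.fill(numClasses)(0.0)
    labels.zip(weights).foreach { case (label, w) => totals(label) += w }
    totals
  }

  // Gini impurity: 1 - sum_k p_k^2, where p_k is the weighted class fraction.
  def gini(classTotals: Array[Double]): Double = {
    val total = classTotals.sum
    if (total == 0.0) 0.0
    else 1.0 - classTotals.map(c => (c / total) * (c / total)).sum
  }

  // Entropy: -sum_k p_k * log2(p_k), skipping classes with zero weight.
  def entropy(classTotals: Array[Double]): Double = {
    val total = classTotals.sum
    if (total == 0.0) 0.0
    else classTotals.filter(_ > 0.0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2.0)
    }.sum
  }
}
```

With labels `Seq(0, 0, 0, 1)` and unit weights, the class fractions are 0.75 and 0.25, so `gini` is 0.375. Giving the single class-1 sample a weight of 3.0 balances the two classes, which moves `gini` to 0.5 and `entropy` to 1.0. The same rows therefore yield different split scores once weights are applied.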