imatiach-msft commented on a change in pull request #21632:
[SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250245965
##########
File path: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala
##########
@@ -268,4 +269,20 @@ object MLTestingUtils extends SparkFunSuite {
assert(newDatasetF.schema(featuresColName).dataType.equals(new
ArrayType(FloatType, false)))
(newDataset, newDatasetD, newDatasetF)
}
+
+ def modelPredictionEquals[M <: PredictionModel[_, M]](
Review comment:
Due to error propagation during model training, some predictions may be
significantly out of tolerance (in this test, one regressor prediction varied
between 0.4 and 4), which is why in the latest update I check that 99 percent
of values are within tolerance (I could actually tighten this bound to 99.9
percent). If we made the tolerance large enough for the 0.4 vs 4 prediction to
pass, the test would always pass even if a bug were introduced and most
prediction differences between the models suddenly ranged from 0.4 to 4, which
would not be correct; so making the tolerance very large doesn't make sense.
We want to make sure the models are roughly equivalent, but they can't be
exactly equal. At the very least, we want to ensure they don't diverge further
with future changes (or if they do, it should be for a good reason).
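
The fractional-tolerance idea above can be sketched as plain Scala (this is a
hypothetical standalone helper, not the actual `modelPredictionEquals` from
the PR, and the name `mostlyWithinTolerance` and the default 99 percent
fraction are illustrative assumptions):

```scala
object PredictionCheck {
  // Returns true if at least `fraction` of paired predictions agree within a
  // relative `tolerance`, so a few error-propagation outliers (e.g. 0.4 vs 4)
  // can pass without inflating the tolerance for every prediction, which
  // would mask real regressions.
  def mostlyWithinTolerance(
      preds1: Seq[Double],
      preds2: Seq[Double],
      tolerance: Double,
      fraction: Double = 0.99): Boolean = {
    require(preds1.length == preds2.length, "prediction vectors must align")
    require(preds1.nonEmpty, "need at least one prediction")
    val withinCount = preds1.zip(preds2).count { case (a, b) =>
      // Relative tolerance, with a small floor to handle near-zero values.
      val scale = math.max(math.max(math.abs(a), math.abs(b)), 1e-12)
      math.abs(a - b) <= tolerance * scale
    }
    withinCount.toDouble / preds1.length >= fraction
  }
}
```

With 99 identical pairs and one 0.4-vs-4 outlier, the check passes at the 99
percent fraction; if most pairs diverged like the outlier, it would fail.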
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]