imatiach-msft commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250245965
##########
File path: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala
##########

@@ -268,4 +269,20 @@ object MLTestingUtils extends SparkFunSuite {
     assert(newDatasetF.schema(featuresColName).dataType.equals(new ArrayType(FloatType, false)))
     (newDataset, newDatasetD, newDatasetF)
   }
+
+  def modelPredictionEquals[M <: PredictionModel[_, M]](

Review comment:
   Because errors propagate during model training, some predictions may end up significantly outside the tolerance. In this test, one regressor prediction varied between 0.4 and 4, which is why the latest update checks that 99 percent of values are within tolerance (I can actually tighten this bound to 99.9 percent). If we instead made the tolerance large enough for the 0.4-vs-4 prediction to pass, the test would always pass even after a bug was introduced: most prediction differences between the models could suddenly start ranging from 0.4 to 4 and the test would not catch it. So a very large tolerance does not make sense. We want to make sure the models are roughly equivalent, but they cannot be exactly equal. At the very least, we want to ensure they do not diverge further with future changes (or if they do, it should be for a good reason).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
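The check described above (pass if at least 99 percent of paired predictions agree within tolerance, so a single outlier like 0.4 vs 4 does not force a huge global tolerance) can be sketched as follows. This is an illustrative standalone helper, not the actual `modelPredictionEquals` from `MLTestingUtils`; the object name `PredictionToleranceCheck`, the method `fractionWithinTol`, and the relative-difference metric are assumptions for the sketch.

```scala
object PredictionToleranceCheck {
  /** Fraction of paired predictions whose relative difference is within `tol`.
    * A pair (x, y) counts as "within tolerance" when |x - y| / max(|x|, |y|) <= tol,
    * treating two exact zeros as trivially within tolerance. */
  def fractionWithinTol(a: Seq[Double], b: Seq[Double], tol: Double): Double = {
    require(a.length == b.length && a.nonEmpty, "prediction sequences must be non-empty and aligned")
    val within = a.zip(b).count { case (x, y) =>
      val denom = math.max(math.abs(x), math.abs(y))
      denom == 0.0 || math.abs(x - y) / denom <= tol
    }
    within.toDouble / a.length
  }

  def main(args: Array[String]): Unit = {
    // 99 closely agreeing pairs plus one large outlier (0.4 vs 4.0),
    // mirroring the single out-of-tolerance prediction discussed above.
    val modelA = Seq.fill(99)(1.0) :+ 0.4
    val modelB = Seq.fill(99)(1.0001) :+ 4.0
    val frac = fractionWithinTol(modelA, modelB, tol = 0.01)
    println(frac) // 0.99: the outlier fails, the other 99 pairs pass
    // A 99-percent bound passes here; requiring every pair to pass would not.
    assert(frac >= 0.99)
  }
}
```

With this shape, tightening the bound (e.g. to 0.999) or the per-pair tolerance keeps the test sensitive to broad divergence between the two models while tolerating a rare training-induced outlier.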