imatiach-msft commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250051805

##########
File path: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala
##########
@@ -268,4 +269,20 @@ object MLTestingUtils extends SparkFunSuite {
     assert(newDatasetF.schema(featuresColName).dataType.equals(new ArrayType(FloatType, false)))
     (newDataset, newDatasetD, newDatasetF)
   }
+
+  def modelPredictionEquals[M <: PredictionModel[_, M]](

Review comment:
For the regression case, it seems I can slightly increase the tolerance and get .99 of the cases within tolerance, but at least one prediction still differs. The difference arises because the models themselves are slightly different due to the propagation of error (e.g. the splits in the trees are slightly different, and over the course of training the trees diverge).

For the classification case, the predictions differ more. There we compare the 0/1 labels, so the tolerance isn't used; again, the difference in the models seems to be due to propagation of error.

I've updated the regressor tests; unfortunately, for classification I don't think I can do much else.
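
To illustrate the kind of check discussed above, here is a minimal sketch in plain Scala with no Spark dependency. It is not the `modelPredictionEquals` helper from the PR; the object name `PredictionComparison`, its methods, and the relative-tolerance rule are all assumptions made for illustration. It computes the fraction of paired predictions that agree within a tolerance, which a test could then compare against a threshold such as 0.99.

```scala
// Hypothetical helper, not part of Spark or of this PR.
object PredictionComparison {

  /** True when a and b agree within relTol, measured relative to the
    * larger magnitude, with a floor of 1.0 so values near zero are
    * compared absolutely. (Assumed rule, chosen for illustration.) */
  def withinTolerance(a: Double, b: Double, relTol: Double): Boolean = {
    val scale = math.max(math.max(math.abs(a), math.abs(b)), 1.0)
    math.abs(a - b) <= relTol * scale
  }

  /** Fraction of paired predictions that agree within relTol. */
  def fractionWithinTolerance(
      xs: Seq[Double],
      ys: Seq[Double],
      relTol: Double): Double = {
    require(xs.length == ys.length, "prediction sequences must have the same length")
    require(xs.nonEmpty, "prediction sequences must not be empty")
    val matches = xs.zip(ys).count { case (x, y) => withinTolerance(x, y, relTol) }
    matches.toDouble / xs.length
  }

  /** For 0/1 class labels the tolerance plays no role: compare exactly. */
  def fractionEqualLabels(xs: Seq[Double], ys: Seq[Double]): Double =
    fractionWithinTolerance(xs, ys, relTol = 0.0)
}
```

A regression test could then assert something like `PredictionComparison.fractionWithinTolerance(predsA, predsB, 0.05) >= 0.99`, where `predsA`, `predsB`, and both thresholds are placeholders. For classification, `fractionEqualLabels` shows why the tolerance offers no lever: a label either matches or it does not.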