imatiach-msft commented on a change in pull request #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision trees
URL: https://github.com/apache/spark/pull/21632#discussion_r250245965
##########
File path: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala
##########

@@ -268,4 +269,20 @@ object MLTestingUtils extends SparkFunSuite {
     assert(newDatasetF.schema(featuresColName).dataType.equals(new ArrayType(FloatType, false)))
     (newDataset, newDatasetD, newDatasetF)
   }
+
+  def modelPredictionEquals[M <: PredictionModel[_, M]](

Review comment:
   Because errors propagate during model training, some predictions may end up significantly outside the tolerance. In this test, one regressor prediction varied between 0.4 and 4, which is why the latest update checks that 99 percent of values are within tolerance (I can actually tighten this bound to 99.9 percent). If we instead made the tolerance large enough for the 0.4-vs-4 prediction to pass, the test would always pass even after a bug was introduced: most prediction differences between the models could suddenly start ranging from 0.4 to 4 and the test would not catch it. So a very large tolerance does not make sense. We want to make sure the models are roughly equivalent, but they cannot be exactly equal. At the very least, we want to ensure they do not diverge further with future changes (or if they do, it should be for a good reason).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
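The check described above (pass if at least 99 percent of paired predictions agree within tolerance, so a single outlier like 0.4 vs 4 does not force a huge global tolerance) can be sketched as follows. This is an illustrative standalone helper, not the actual `modelPredictionEquals` from `MLTestingUtils`; the object name `PredictionToleranceCheck`, the method `fractionWithinTol`, and the relative-difference metric are assumptions for the sketch.

```scala
object PredictionToleranceCheck {
  /** Fraction of paired predictions whose relative difference is within `tol`.
    * A pair (x, y) counts as "within tolerance" when |x - y| / max(|x|, |y|) <= tol,
    * treating two exact zeros as trivially within tolerance. */
  def fractionWithinTol(a: Seq[Double], b: Seq[Double], tol: Double): Double = {
    require(a.length == b.length && a.nonEmpty, "prediction sequences must be non-empty and aligned")
    val within = a.zip(b).count { case (x, y) =>
      val denom = math.max(math.abs(x), math.abs(y))
      denom == 0.0 || math.abs(x - y) / denom <= tol
    }
    within.toDouble / a.length
  }

  def main(args: Array[String]): Unit = {
    // 99 closely agreeing pairs plus one large outlier (0.4 vs 4.0),
    // mirroring the single out-of-tolerance prediction discussed above.
    val modelA = Seq.fill(99)(1.0) :+ 0.4
    val modelB = Seq.fill(99)(1.0001) :+ 4.0
    val frac = fractionWithinTol(modelA, modelB, tol = 0.01)
    println(frac) // 0.99: the outlier fails, the other 99 pairs pass
    // A 99-percent bound passes here; requiring every pair to pass would not.
    assert(frac >= 0.99)
  }
}
```

With this shape, tightening the bound (e.g. to 0.999) or the per-pair tolerance keeps the test sensitive to broad divergence between the two models while tolerating a rare training-induced outlier.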