GitHub user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20472#discussion_r165343004

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -917,11 +916,15 @@ private[spark] object RandomForest extends Logging {
           // being spun up that will definitely do no work.
           val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
    +      val numInput = input.count()
    --- End diff --

    We can get this from the `metadata.numExamples * fraction` computation in the calling method, which avoids running another job just to perform the count.
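To illustrate the suggestion, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the caller already knows the total number of examples and the sampling fraction, as the comment implies. The object and method names (`EstimateSampleSize`, `expectedSampleSize`) are hypothetical and are not part of `RandomForest.scala`. Note that the product is the expected size of the sample, so it may differ slightly from what an actual `count()` on a randomly sampled input would return.

```scala
object EstimateSampleSize {
  // Hypothetical helper: derive the sample size from values the caller
  // already holds, instead of scanning the sampled data to count it.
  // This is the expected size, not necessarily the exact sampled count.
  def expectedSampleSize(numExamples: Long, fraction: Double): Long =
    math.ceil(numExamples * fraction).toLong

  def main(args: Array[String]): Unit = {
    val numExamples = 40000L // assumed to come from metadata.numExamples
    val fraction = 0.25      // assumed to be computed in the calling method
    val numInput = expectedSampleSize(numExamples, fraction)
    println(s"estimated numInput = $numInput")
  }
}
```

The estimate could then be passed into the method that currently calls `input.count()`, trading an exact value for one fewer pass over the data.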