GitHub user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20472#discussion_r165343004
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -917,11 +916,15 @@ private[spark] object RandomForest extends Logging {
// being spun up that will definitely do no work.
val numPartitions = math.min(continuousFeatures.length,
input.partitions.length)
+ val numInput = input.count()
--- End diff --
We can get this value from the `metadata.numExamples * fraction` computation in
the calling method, which avoids launching another job just to perform the count.
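
A minimal sketch of the suggestion, not the actual patch: the `Metadata` case
class, the `SampleSizeSketch` object, and `estimatedNumInput` are hypothetical
stand-ins, and only `numExamples` and `fraction` come from the comment above.
It assumes `fraction` is the sampling fraction used to build `input`, so the
product is the expected number of sampled rows rather than an exact count.

```scala
// Hypothetical stand-in for the metadata available in the calling method.
// Only the numExamples field is taken from the review comment.
final case class Metadata(numExamples: Long)

object SampleSizeSketch {
  // Estimate the size of the sampled input from values already known on the
  // driver, instead of calling input.count(), which triggers an extra job.
  def estimatedNumInput(metadata: Metadata, fraction: Double): Long = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    // Round up so a small non-zero fraction never yields an estimate of 0.
    math.ceil(metadata.numExamples * fraction).toLong
  }
}
```

The caller would compute this once and pass it down as an argument. If the
sample is drawn randomly, the real row count can differ from the estimate, so
this only fits where an approximate size is acceptable.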