GitHub user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20472#discussion_r165341639
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -917,11 +916,15 @@ private[spark] object RandomForest extends Logging {
           // being spun up that will definitely do no work.
           val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
     
    +      val numInput = input.count()
    +      val bcNumInput = input.sparkContext.broadcast(numInput)
    +
           input
             .flatMap(point => continuousFeatures.map(idx => (idx, point.features(idx))))
    --- End diff --
    
    Instead of adding the filter method there, you can avoid generating the record in the first place here, by skipping it when `point.features(idx)` is 0.0.
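A minimal sketch of this suggestion in plain Scala. It assumes a hypothetical `Point` case class holding an `Array[Double]` in place of Spark's training point and `Vector`, and a local `Seq` in place of the RDD, so it runs without Spark. The inner `flatMap` emits a `(featureIndex, value)` pair only for non-zero values, so zero records are never materialized and no later filter is needed. The hypothetical `zeroCounts` helper shows one possible use of a total count such as `numInput`: the skipped zeros per feature can be recovered as the total number of points minus the non-zero values emitted for that feature. Whether the PR uses the count this way is an assumption.

```scala
// Hypothetical stand-in for a training point; in Spark ML the features
// would be an org.apache.spark.ml.linalg.Vector rather than an Array.
final case class Point(features: Array[Double])

object SkipZerosSketch {
  /** Emits (featureIndex, value) pairs only for non-zero values, so zero
    * records are never created (instead of creating them and filtering later). */
  def nonZeroPairs(points: Seq[Point], continuousFeatures: Array[Int]): Seq[(Int, Double)] =
    points.flatMap { point =>
      continuousFeatures.iterator.flatMap { idx =>
        val value = point.features(idx) // single lookup per feature
        // Only a non-zero value produces a record.
        if (value != 0.0) Iterator.single((idx, value)) else Iterator.empty
      }
    }

  /** Hypothetical helper: number of skipped zeros per feature, computed as
    * the total point count minus the non-zero values emitted for that feature.
    * Features that were zero everywhere still appear, with count numInput. */
  def zeroCounts(numInput: Long, continuousFeatures: Array[Int],
                 pairs: Seq[(Int, Double)]): Map[Int, Long] = {
    val nonZero: Map[Int, Long] =
      pairs.groupBy(_._1).map { case (idx, vs) => idx -> vs.size.toLong }
    continuousFeatures.map(idx => idx -> (numInput - nonZero.getOrElse(idx, 0L))).toMap
  }

  def main(args: Array[String]): Unit = {
    val points = Seq(
      Point(Array(0.0, 1.5, 0.0)),
      Point(Array(2.0, 0.0, 0.0)),
      Point(Array(0.0, 3.0, 4.0))
    )
    val continuousFeatures = Array(0, 1, 2)
    val pairs = nonZeroPairs(points, continuousFeatures)
    println(pairs)                                                  // only non-zero pairs
    println(zeroCounts(points.size.toLong, continuousFeatures, pairs)) // zeros per feature
  }
}
```

With Spark, the same inner `flatMap` body would go directly inside the RDD's `flatMap`, so the zero entries never enter the shuffle.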

