[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

srowen Wed, 07 Feb 2018 14:16:27 -0800

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20472#discussion_r166770380
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -931,7 +925,8 @@ private[spark] object RandomForest extends Logging {
         val numFeatures = metadata.numFeatures
         val splits: Array[Array[Split]] = Array.tabulate(numFeatures) {
           case i if metadata.isContinuous(i) =>
    -        val split = continuousSplits(i)
    +        // some features may only contains zero, so continuousSplits will not have a record
    +        val split = if (continuousSplits.contains(i)) continuousSplits(i) else Array.empty[Split]
    --- End diff --
    
    Why not just `continuousSplits.getOrElse(i, Array.empty[Split])`? Not that it matters much, but it also avoids searching the map twice.
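
    To illustrate the suggestion, here is a minimal standalone sketch comparing the two forms. The `Split` case class and the sample map contents are hypothetical stand-ins, not Spark's actual types or data; only the lookup pattern is taken from the diff.

```scala
// Hypothetical stand-in for Spark's Split type, used only for illustration.
final case class Split(featureIndex: Int, threshold: Double)

object GetOrElseSketch {
  // Assumed sample data: feature 1 has no entry, mimicking an all-zero feature.
  val continuousSplits: Map[Int, Array[Split]] = Map(
    0 -> Array(Split(0, 0.5), Split(0, 1.5)),
    2 -> Array(Split(2, 3.0))
  )

  // Form from the diff: `contains` followed by `apply`, i.e. two map lookups.
  def splitsTwoLookups(i: Int): Array[Split] =
    if (continuousSplits.contains(i)) continuousSplits(i) else Array.empty[Split]

  // Suggested form: a single lookup with a default for missing keys.
  def splitsOneLookup(i: Int): Array[Split] =
    continuousSplits.getOrElse(i, Array.empty[Split])

  def main(args: Array[String]): Unit = {
    for (i <- 0 until 3) {
      // Both forms should return the same elements for every feature index.
      assert(splitsTwoLookups(i).sameElements(splitsOneLookup(i)))
      println(s"feature $i has ${splitsOneLookup(i).length} split(s)")
    }
  }
}
```

    Both forms behave the same here; `getOrElse` is simply shorter and evaluates its default only when the key is absent.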


