Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20472#discussion_r166770380
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -931,7 +925,8 @@ private[spark] object RandomForest extends Logging {
val numFeatures = metadata.numFeatures
val splits: Array[Array[Split]] = Array.tabulate(numFeatures) {
case i if metadata.isContinuous(i) =>
- val split = continuousSplits(i)
+ // some features may only contains zero, so continuousSplits will not have a record
+ val split = if (continuousSplits.contains(i)) continuousSplits(i) else Array.empty[Split]
--- End diff --
Just `continuousSplits.getOrElse(i, Array.empty[Split])`? Not that it
matters much, but it also avoids searching the map twice.
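
For illustration, a minimal sketch of the two lookups side by side. The `Split` case class here is a hypothetical stand-in for Spark's own `Split` type, and `continuousSplits` is assumed to be a plain `Map[Int, Array[Split]]` that has no entry for a feature without splits; neither is taken from the PR.

```scala
// Hypothetical stand-in for Spark's Split type, used only for illustration.
final case class Split(featureIndex: Int, threshold: Double)

object GetOrElseSketch {
  // Assumed shape: feature index -> candidate splits.
  // Feature 1 has no splits, so it has no entry in the map.
  val continuousSplits: Map[Int, Array[Split]] = Map(
    0 -> Array(Split(0, 0.5), Split(0, 1.5))
  )

  // Version from the diff: searches the map twice (contains, then apply).
  def viaContains(i: Int): Array[Split] =
    if (continuousSplits.contains(i)) continuousSplits(i) else Array.empty[Split]

  // Suggested version: a single lookup; the default is a by-name
  // argument, so it is evaluated only when the key is missing.
  def viaGetOrElse(i: Int): Array[Split] =
    continuousSplits.getOrElse(i, Array.empty[Split])
}
```

Both versions return the stored array for a present key and an empty array for a missing one, so the change should be behavior-preserving under these assumptions.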