Github user codedeft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2607#discussion_r19570062
  
    --- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala
 ---
    @@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
     import org.apache.spark.mllib.tree.{RandomForest, DecisionTree, impurity}
     import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
     import org.apache.spark.mllib.tree.configuration.Algo._
    -import org.apache.spark.mllib.tree.model.{RandomForestModel, 
DecisionTreeModel}
    +import org.apache.spark.mllib.tree.model.{WeightedEnsembleModel, 
DecisionTreeModel}
    --- End diff --
    
    @manishamde Sounds good.
    
    Just a side note: because RF models tend to be much bigger than boosted 
ensembles, we've encountered situations where the model was *too* big to fit in 
a single machine's memory. A RandomForest model is well suited to 
embarrassingly parallel prediction, so the model could potentially be stored in 
a distributed fashion.
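    To illustrate the idea (just a sketch, not a proposal for the actual API): 
since every tree votes independently, a forest could be split into partitions 
of trees, each partition scored locally, and the votes merged afterwards. The 
`Tree` type and `DistributedForestSketch` object below are hypothetical 
stand-ins; in Spark the partitions would live in something like an 
`RDD[DecisionTreeModel]` rather than a plain `Seq`.

    ```scala
    object DistributedForestSketch {
      // Stand-in for a trained tree: features => predicted class label.
      type Tree = Vector[Double] => Double

      // Classification by majority vote. Each inner Seq models one partition
      // of trees that could live on a separate machine; the per-partition
      // votes are independent, so scoring is embarrassingly parallel.
      def predict(partitions: Seq[Seq[Tree]], features: Vector[Double]): Double = {
        val votes = partitions.flatMap(_.map(tree => tree(features)))
        votes.groupBy(identity).maxBy(_._2.size)._1
      }

      def main(args: Array[String]): Unit = {
        // Toy decision stumps thresholding on the first feature.
        val stump: Double => Tree = thr => fs => if (fs(0) > thr) 1.0 else 0.0
        val parts = Seq(Seq(stump(0.2), stump(0.4)), Seq(stump(0.6)))
        // Two of the three stumps vote 1.0 for feature value 0.5.
        println(predict(parts, Vector(0.5)))
      }
    }
    ```

    The merge step only needs per-label vote counts from each partition, so the 
shuffle cost is tiny compared to the model size.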
    
    But we haven't yet decided whether we really want to do this (i.e., are 
humongous models really useful in practice, and do we really expect scenarios 
with gigantic models surpassing dozens of GBs?).


