Github user codedeft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2607#discussion_r19567364
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/DecisionTreeRunner.scala ---
    @@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
     import org.apache.spark.mllib.tree.{RandomForest, DecisionTree, impurity}
     import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
     import org.apache.spark.mllib.tree.configuration.Algo._
    -import org.apache.spark.mllib.tree.model.{RandomForestModel, DecisionTreeModel}
    +import org.apache.spark.mllib.tree.model.{WeightedEnsembleModel, DecisionTreeModel}
    --- End diff --
    
    Yeah, I guess from a design perspective it's tempting to unify these 
under the same umbrella.
    
    IMO, RandomForest is *mostly* a specific instance of a generic ensemble 
model, so this makes sense.
    
    However, I think boosted models have some specifics of their own due to 
their sequential nature (as opposed to the parallel nature of RandomForest). 
E.g., if you have 1000 models, with boosting you can potentially predict using 
only the *first* 100 models, since each stage builds on the previous ones, 
whereas with RandomForest you can pick any 100. You also have to do 
overfitting/underfitting analyses on boosted models sequentially, etc.
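
    To make the distinction concrete, here is a minimal sketch of the two 
prediction styles. It does not use Spark's API: `Stump`, `EnsembleSketch`, and 
all method names are hypothetical stand-ins for illustration only, and the 
forest is assumed to do regression by plain averaging.

```scala
// Hypothetical stand-in for a trained tree; NOT Spark's DecisionTreeModel.
final case class Stump(feature: Int, threshold: Double, left: Double, right: Double) {
  def predict(x: Array[Double]): Double =
    if (x(feature) <= threshold) left else right
}

object EnsembleSketch {
  // Boosted ensemble: each stage corrects the stages before it, so the only
  // valid smaller model is a prefix (the first k stages, in training order).
  def predictBoostedPrefix(
      trees: Seq[Stump],
      weights: Seq[Double],
      k: Int,
      x: Array[Double]): Double =
    trees.zip(weights).take(k).map { case (t, w) => w * t.predict(x) }.sum

  // Random forest (regression assumed): trees are trained independently,
  // so any subset of them can be averaged.
  def predictForestSubset(
      trees: Seq[Stump],
      indices: Seq[Int],
      x: Array[Double]): Double =
    indices.map(i => trees(i).predict(x)).sum / indices.size

  // Sequential analysis for boosting: the prediction after 1, 2, ..., n
  // stages, computed as a running weighted sum.
  def stagedPredictions(
      trees: Seq[Stump],
      weights: Seq[Double],
      x: Array[Double]): Seq[Double] =
    trees.zip(weights)
      .scanLeft(0.0) { case (acc, (t, w)) => acc + w * t.predict(x) }
      .tail
}
```

    Evaluating something like `stagedPredictions` against a validation set 
is one way to do the sequential overfitting/underfitting analysis mentioned 
above; a unified ensemble model would need to expose that ordering somehow.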


