All supervised learning algorithms in Spark ML work the same way. You provide a set of ‘features’ (X) and a corresponding label (y) as part of a pipeline and call the fit method on the pipeline. The output is a fitted model. You can then pass new examples (new Xs) to the transform method on that model, which gives you a prediction for each example. This means that the code for running different algorithms often looks very similar, because the details of each algorithm are hidden behind the fit/transform interface.
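To make the fit/transform idea concrete, here is a toy sketch in plain Java with no Spark dependency. Everything in it (`Estimator`, `Model`, `StumpEstimator`, `ForestEstimator`) is a made-up name for illustration, not a Spark class, and it only handles 0/1 labels. It mimics the shape of the interface: the calling code is identical for a single tiny tree and for a forest of them, and the forest builds and combines its trees internally. Real random forests also randomise which features each split considers, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Random;

public class FitTransformSketch {

    /** A fitted model: turns feature rows into predicted labels (0 or 1). */
    interface Model {
        int[] transform(double[][] features);
    }

    /** A learning algorithm: fit(X, y) returns a fitted Model. */
    interface Estimator {
        Model fit(double[][] features, int[] labels);
    }

    /** A very simple "tree": one threshold test on one feature (a decision stump). */
    static class StumpEstimator implements Estimator {
        @Override
        public Model fit(double[][] x, int[] y) {
            int bestFeature = 0, bestLeft = 0, bestErrors = Integer.MAX_VALUE;
            double bestThreshold = 0.0;
            // Try every (feature, threshold, left-label) combination and keep
            // the one with the fewest training errors.
            for (int f = 0; f < x[0].length; f++) {
                for (double[] row : x) {
                    double t = row[f];
                    for (int left = 0; left <= 1; left++) {
                        int errors = 0;
                        for (int i = 0; i < x.length; i++) {
                            int predicted = x[i][f] <= t ? left : 1 - left;
                            if (predicted != y[i]) errors++;
                        }
                        if (errors < bestErrors) {
                            bestErrors = errors;
                            bestFeature = f;
                            bestThreshold = t;
                            bestLeft = left;
                        }
                    }
                }
            }
            final int feature = bestFeature, leftLabel = bestLeft;
            final double threshold = bestThreshold;
            return rows -> {
                int[] out = new int[rows.length];
                for (int i = 0; i < rows.length; i++) {
                    out[i] = rows[i][feature] <= threshold ? leftLabel : 1 - leftLabel;
                }
                return out;
            };
        }
    }

    /** A toy forest: several stumps trained on bootstrap samples, combined by majority vote. */
    static class ForestEstimator implements Estimator {
        private final int numTrees;
        private final long seed;

        ForestEstimator(int numTrees, long seed) {
            this.numTrees = numTrees;
            this.seed = seed;
        }

        @Override
        public Model fit(double[][] x, int[] y) {
            Random random = new Random(seed);
            Estimator treeLearner = new StumpEstimator();
            Model[] trees = new Model[numTrees];
            for (int t = 0; t < numTrees; t++) {
                // Bootstrap sample: draw as many rows as the input, with replacement.
                double[][] sampleX = new double[x.length][];
                int[] sampleY = new int[y.length];
                for (int i = 0; i < x.length; i++) {
                    int pick = random.nextInt(x.length);
                    sampleX[i] = x[pick];
                    sampleY[i] = y[pick];
                }
                trees[t] = treeLearner.fit(sampleX, sampleY);
            }
            // Ensemble step: every tree votes and the majority label wins.
            return rows -> {
                int[] votesForOne = new int[rows.length];
                for (Model tree : trees) {
                    int[] p = tree.transform(rows);
                    for (int i = 0; i < rows.length; i++) votesForOne[i] += p[i];
                }
                int[] out = new int[rows.length];
                for (int i = 0; i < rows.length; i++) {
                    out[i] = 2 * votesForOne[i] > trees.length ? 1 : 0;
                }
                return out;
            };
        }
    }

    public static void main(String[] args) {
        double[][] x = {{1}, {2}, {3}, {10}, {11}, {12}};
        int[] y = {0, 0, 0, 1, 1, 1};
        double[][] newX = {{0}, {20}};

        // Swapping the algorithm changes only the estimator; fit/transform stay the same.
        Estimator[] algorithms = {new StumpEstimator(), new ForestEstimator(25, 42L)};
        for (Estimator algorithm : algorithms) {
            Model model = algorithm.fit(x, y);
            System.out.println(Arrays.toString(model.transform(newX)));
        }
    }
}
```

In Spark you would not write any of this yourself. The linked JavaRandomForestClassifierExample uses RandomForestClassifier in exactly the same place where the decision tree example uses DecisionTreeClassifier, and the number of trees is just a parameter on the classifier (setNumTrees).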
In the case of Random Forest, the implementation in Spark (i.e. behind the interface) creates a number of different decision tree models (often quite simple models) and then combines (ensembles) the results of the individual trees. You don’t need to ‘create’ the decision trees yourself; that is handled by the implementation.

Hope that helps

Robin
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 4 Aug 2016, at 09:48, 陈哲 <czhenj...@gmail.com> wrote:
>
> Hi all
> I'm trying to use Spark ML to do some prediction with random forest. From
> reading the example code
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaRandomForestClassifierExample.java
> I can only see that it is similar to
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaDecisionTreeClassificationExample.java
> Is the random forest algorithm supposed to use multiple decision trees to work?
> I'm new to Spark and ML. Could anyone help me, maybe by providing an
> example of using multiple decision trees in a random forest in Spark?
>
> Thanks
> Best Regards
> Patrick