Hi Aakash,

First, you will want to get the random forest model stage from the best pipeline model. For example, if RF is the first stage:

rfModel = model.bestModel.stages[0]

Then you can check the values of the params you tuned like this:

rfModel.getNumTrees

On Mon, Apr 16, 2018 at 7:52 AM, Aakash Basu <aakash.spark....@gmail.com> wrote:
> Hi,
>
> I am running a Random Forest model on a dataset using hyper parameter
> tuning with Spark's paramGrid and Train Validation Split.
>
> Can anyone tell me how to get the best set for all the four parameters?
>
> I used:
>
> model.bestModel()
> model.metrics()
>
> But none of them seem to work.
>
> Below is the code chunk:
>
> paramGrid = ParamGridBuilder() \
>     .addGrid(rf.numTrees, [50, 100, 150, 200]) \
>     .addGrid(rf.maxDepth, [5, 10, 15, 20]) \
>     .addGrid(rf.minInfoGain, [0.001, 0.01, 0.1, 0.6]) \
>     .addGrid(rf.minInstancesPerNode, [5, 15, 30, 50, 100]) \
>     .build()
>
> tvs = TrainValidationSplit(estimator=pipeline,
>                            estimatorParamMaps=paramGrid,
>                            evaluator=MulticlassClassificationEvaluator(),
>                            # 80% of the data will be used for training, 20% for validation.
>                            trainRatio=0.8)
>
> model = tvs.fit(trainingData)
>
> predictions = model.transform(testData)
>
> evaluator = MulticlassClassificationEvaluator(
>     labelCol="label", predictionCol="prediction", metricName="accuracy")
> accuracy = evaluator.evaluate(predictions)
> print("Accuracy = %g" % accuracy)
> print("Test Error = %g" % (1.0 - accuracy))
>
> Any help?
>
> Thanks,
> Aakash.
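
As a complement, the winning combination can also be read off the TrainValidationSplitModel itself. In PySpark, `bestModel` is an attribute rather than a method, and there is no `metrics()` method; the per-combination validation scores are in `model.validationMetrics`, in the same order as the `paramGrid` passed to the TrainValidationSplit. Below is a minimal sketch of a hypothetical helper, `best_param_map`, that pairs the two and returns the best combination as a plain dict. It assumes a larger metric is better, which holds for the default `f1` metric of MulticlassClassificationEvaluator.

```python
def best_param_map(param_maps, metrics, larger_is_better=True):
    """Return the best-scoring param map as a {param name: value} dict.

    Hypothetical helper (not part of Spark).
    param_maps: list of dicts mapping Param objects (or plain strings) to values,
        e.g. the paramGrid passed to TrainValidationSplit.
    metrics: validation scores in the same order, e.g. model.validationMetrics.
    larger_is_better: set False for metrics where lower is better (e.g. RMSE).
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    if not metrics:
        raise ValueError("no metrics to compare")
    pick = max if larger_is_better else min
    # Index of the best validation score (first one wins on ties)
    best_idx = pick(range(len(metrics)), key=lambda i: metrics[i])
    # PySpark Param objects expose a .name attribute; plain string keys are kept as-is
    return {getattr(p, "name", p): v for p, v in param_maps[best_idx].items()}


# Usage with the objects from the code above (needs the fitted `model`):
#   best = best_param_map(paramGrid, model.validationMetrics)
#   print(best)  # keys: numTrees, maxDepth, minInfoGain, minInstancesPerNode
```

Passing `tvs.getEvaluator().isLargerBetter()` as `larger_is_better` avoids hard-coding the direction of the metric.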