Hi, I am running a Random Forest model on a dataset, using hyperparameter tuning with Spark's `ParamGridBuilder` and `TrainValidationSplit`.
Can anyone tell me how to get the best value for each of the four parameters? I tried `model.bestModel()` and `model.metrics()`, but neither seems to work. Below is the code:

    from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    # rf (a RandomForestClassifier) and pipeline are defined elsewhere
    paramGrid = ParamGridBuilder() \
        .addGrid(rf.numTrees, [50, 100, 150, 200]) \
        .addGrid(rf.maxDepth, [5, 10, 15, 20]) \
        .addGrid(rf.minInfoGain, [0.001, 0.01, 0.1, 0.6]) \
        .addGrid(rf.minInstancesPerNode, [5, 15, 30, 50, 100]) \
        .build()

    tvs = TrainValidationSplit(estimator=pipeline,
                               estimatorParamMaps=paramGrid,
                               evaluator=MulticlassClassificationEvaluator(),
                               # 80% of the data will be used for training, 20% for validation.
                               trainRatio=0.8)

    model = tvs.fit(trainingData)
    predictions = model.transform(testData)

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy")
    accuracy = evaluator.evaluate(predictions)
    print("Accuracy = %g" % accuracy)
    print("Test Error = %g" % (1.0 - accuracy))

Any help? Thanks,
Aakash
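For reference, in PySpark `bestModel` on a fitted `TrainValidationSplitModel` is an attribute, not a method, so it is read without parentheses. The per-candidate scores are in `model.validationMetrics`, a list in the same order as the `estimatorParamMaps` (here `paramGrid`) that was passed in. The sketch below pairs each candidate ParamMap with its score and returns the winner as a plain `{name: value}` dict. `best_params` is a hypothetical helper, not a Spark API, and the Spark calls are left as comments so the helper runs without a cluster. Also note that the `MulticlassClassificationEvaluator()` inside `tvs` sets no `metricName`, so its default (`f1`) is what `validationMetrics` holds, not accuracy.

```python
def best_params(param_maps, metrics, larger_is_better=True):
    """Return (best_params_dict, best_metric) for a tuning run.

    param_maps: list of ParamMaps (dicts keyed by Spark Param objects),
                or plain dicts keyed by strings.
    metrics:    list of floats in the same order as param_maps.
    """
    if len(param_maps) != len(metrics):
        raise ValueError("param_maps and metrics must have the same length")
    pick = max if larger_is_better else min
    # Index of the best-scoring candidate (first one wins on ties).
    best_idx = pick(range(len(metrics)), key=lambda i: metrics[i])
    # Spark Param objects expose .name; plain string keys are used as-is.
    best = {getattr(p, "name", p): v for p, v in param_maps[best_idx].items()}
    return best, metrics[best_idx]


# Hypothetical usage with the objects from the question (requires Spark):
# params, score = best_params(paramGrid, model.validationMetrics)
# print(params)  # e.g. {'numTrees': ..., 'maxDepth': ..., ...}
# Assumes the random forest is the last stage of the pipeline:
# rf_model = model.bestModel.stages[-1]
```

Both f1 and accuracy are "larger is better" metrics, so the default `larger_is_better=True` fits this case; pass `False` for error-style metrics such as RMSE.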