Hi there, I was comparing RandomForest in spark.ml (org.apache.spark.ml.classification) and spark.mllib (org.apache.spark.mllib.tree) using the same datasets and the same parameter settings, and spark.mllib always gives me better results on the test datasets. I was wondering:
1. Has anyone else noticed a similar performance difference?
2. How do I output the parameters of a PipelineModel? For example, I want to output what was trained for the RandomForestClassifier. None of model.params.toString, model.explainParams(), or model.extractParamMap() outputs meaningful information such as totalNumNodes.

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.StringIndexer

val rf = new RandomForestClassifier()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setNumTrees(100)
  .setFeatureSubsetStrategy("auto")
  .setImpurity("entropy")
  .setMaxDepth(4)
  .setMaxBins(32)

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("label")

val pipeline = new Pipeline().setStages(Array(indexer, rf))
val model: PipelineModel = pipeline.fit(trainingData)

thanks,
pengcheng
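Regarding question 2, here is a minimal sketch, assuming Spark 2.x and that the forest is the last stage of the pipeline as in Array(indexer, rf). Values like totalNumNodes are not Params, so explainParams() and extractParamMap() on the PipelineModel will not show them. They are members of the fitted RandomForestClassificationModel, which can be pulled out of PipelineModel.stages. The object name ForestSummary and its methods are hypothetical helpers for illustration, not part of Spark.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

// Hypothetical helper object, not a Spark API.
object ForestSummary {
  // Assumes the RandomForestClassifier is the LAST pipeline stage,
  // as in Array(indexer, rf); adjust the index for other layouts.
  def forestOf(model: PipelineModel): RandomForestClassificationModel =
    model.stages.last.asInstanceOf[RandomForestClassificationModel]

  // Short report of the learned structure (not the hyperparameters).
  def summary(model: PipelineModel): String = {
    val rf = forestOf(model)
    Seq(
      s"trees         = ${rf.trees.length}",
      s"totalNumNodes = ${rf.totalNumNodes}",
      s"importances   = ${rf.featureImportances}"
    ).mkString("\n")
  }
}
```

For example, println(ForestSummary.summary(model)) prints the tree count, total node count and feature importances, and ForestSummary.forestOf(model).toDebugString dumps the splits of every tree. Calling explainParams() on the extracted stage, rather than on the PipelineModel, should list the hyperparameters such as maxDepth and numTrees.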