Waleed Esmail created SPARK-23414:
-------------------------------------

             Summary: Plotting using matplotlib in MLlib pyspark
                 Key: SPARK-23414
                 URL: https://issues.apache.org/jira/browse/SPARK-23414
             Project: Spark
          Issue Type: Question
          Components: MLlib
    Affects Versions: 2.2.1
            Reporter: Waleed Esmail

Dear MLlib experts,

I want to plot a confusion matrix (true values vs. predicted values) like the one produced by the seaborn module in Python, so I did the following:

{code:python}
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from sklearn.metrics import confusion_matrix

# Index the string labels; `output` is the input DataFrame with "label" and "features" columns.
labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(output)

# Automatically identify categorical features, and index them.
# maxCategories is not set, so the default (20) applies: features with
# > 20 distinct values are treated as continuous.
featureIndexer = VectorIndexer(inputCol="features", outputCol="indexedFeatures").fit(output)

# Split the data into training and test sets (30% held out for testing)
(trainingData, testData) = output.randomSplit([0.7, 0.3])

dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", maxDepth=15)

# Chain indexers and tree in a Pipeline
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])

# Train model. This also runs the indexers.
model = pipeline.fit(trainingData)

# Make predictions.
predictions = model.transform(testData)

# Collect both columns in one pass so predictions and labels stay aligned,
# and flatten the Row objects into 1-D arrays.
rows = predictions.select("prediction", "indexedLabel").collect()
y_predicted = np.array([row["prediction"] for row in rows])
y_test = np.array([row["indexedLabel"] for row in rows])

figcm, ax = plt.subplots()
cm = confusion_matrix(y_test, y_predicted)
# Normalize each row so entries are fractions of the true class.
cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm, square=True, annot=True, cbar=False, ax=ax)
plt.xlabel("prediction")
plt.ylabel("true value")
plt.show()
{code}

Is this the right way to do it? Please note that I am new to Spark and MLlib.

Thank you in advance,
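
For comparison, here is a minimal sketch of one possible alternative that avoids collecting every test row to the driver. It assumes the (label, prediction, count) triples were aggregated on the cluster first, for example with `predictions.groupBy("indexedLabel", "prediction").count().collect()`. The helper name `confusion_from_counts` and the example counts are hypothetical, used only for illustration.

```python
def confusion_from_counts(pair_counts, num_classes):
    """Build a row-normalized confusion matrix from (label, prediction, count) triples.

    Rows are true labels, columns are predictions. Rows with no samples are
    left as zeros instead of dividing by zero.
    """
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for label, prediction, count in pair_counts:
        # Indexed labels and predictions are doubles such as 0.0, 1.0, ...
        matrix[int(label)][int(prediction)] += count
    for row in matrix:
        total = sum(row)
        if total > 0:
            for j in range(num_classes):
                row[j] /= total
    return matrix


# Hypothetical aggregated counts standing in for the collected groupBy result.
example_counts = [(0.0, 0.0, 8), (0.0, 1.0, 2), (1.0, 0.0, 1), (1.0, 1.0, 9)]
cm = confusion_from_counts(example_counts, num_classes=2)
```

The resulting nested list can be passed to `sns.heatmap` in the same way as the normalized array above. Only one small row per (label, prediction) pair reaches the driver, which may matter when the test set is large.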