[MLLib] Is the order of the coefficients in a LogisticRegressionModel kept ?
Hi everyone,

This may sound like a stupid question, but I need to be sure of this.

Given a DataFrame with « n » features f1, f2, …, fn, I create a labeled point for each row:

    val row_i = LabeledPoint(label, Vectors.dense(v1_i, v2_i, …, vn_i))

where v1_i, v2_i, …, vn_i are the values of the features f1, f2, …, fn in the i-th row.

Then I fit a pipeline composed of a StandardScaler and a LogisticRegression model. Once I get my LogisticRegressionModel and StandardScalerModel back from the pipeline, I call the getters LogisticRegressionModel.coefficients, StandardScalerModel.mean and StandardScalerModel.std. This gives me three vectors of length « n ».

My question is the following: am I assured that the element at index « j » of each vector corresponds to feature « j »? Is the *order* of the features kept? For example, is StandardScalerModel.mean(j) the mean of feature « j » of my DataFrame?

Thanks for your time.

Regards,
J.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Is-the-order-of-the-coefficients-in-a-LogisticRegresionModel-kept-tp26137.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
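Spark vectors are positional, so index j in the coefficient, mean and std vectors refers to position j of the feature vector the stage received. The sketch below shows one way to line the three vectors up by index for inspection. It assumes that no stage between the labeled points and the model reorders, drops or adds features, and that the arrays come from calling .toArray on the three vectors. FeatureReport, FeatureStats and align are hypothetical names used for illustration only; they are not part of Spark.

```scala
object FeatureReport {
  // One record per feature, built purely from the shared index j.
  final case class FeatureStats(name: String, coefficient: Double, mean: Double, std: Double)

  /** Pairs each feature name with the entries found at the same index.
    *
    * `names` must list the features in the order used in Vectors.dense(...).
    * The arrays are assumed to come from coefficients.toArray, mean.toArray
    * and std.toArray on the fitted models.
    */
  def align(
      names: Seq[String],
      coefficients: Array[Double],
      mean: Array[Double],
      std: Array[Double]): Seq[FeatureStats] = {
    require(
      names.length == coefficients.length &&
        names.length == mean.length &&
        names.length == std.length,
      "all inputs must have one entry per feature")
    names.indices.map(j => FeatureStats(names(j), coefficients(j), mean(j), std(j)))
  }
}
```

Note that, with the scaler placed before the logistic regression, the coefficients apply to the scaled features rather than to the raw column values.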
Re: Is it relevant to use BinaryClassificationMetrics.aucROC / aucPR with LogisticRegressionModel ?
Hi filthysocks,

Thanks for the answer. Indeed, using the clearThreshold() function solved my problem :).

Regards,
Jean.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-relevant-to-use-BinaryClassificationMetrics-aucROC-aucPR-with-LogisticRegressionModel-tp25465p25475.html
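For readers landing on this reply, here is a minimal sketch of how clearThreshold() is typically combined with BinaryClassificationMetrics in the RDD-based MLlib API. It assumes an already trained org.apache.spark.mllib.classification.LogisticRegressionModel and a test RDD[LabeledPoint]. The object name AucWithRawScores and the method evaluate are hypothetical, chosen for this example only.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object AucWithRawScores {
  /** Returns (areaUnderROC, areaUnderPR) computed from raw scores
    * rather than from thresholded 0/1 predictions.
    */
  def evaluate(model: LogisticRegressionModel, test: RDD[LabeledPoint]): (Double, Double) = {
    // Once the threshold is cleared, predict() returns the raw score
    // instead of the predicted class. Note that this mutates the model.
    model.clearThreshold()

    val scoreAndLabels = test.map(p => (model.predict(p.features), p.label))
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    (metrics.areaUnderROC(), metrics.areaUnderPR())
  }
}
```

If class predictions are needed again afterwards, the threshold can be restored with setThreshold, for example model.setThreshold(0.5).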
Is it relevant to use BinaryClassificationMetrics.aucROC / aucPR with LogisticRegressionModel ?
Hi guys,

This may be a stupid question, but I'm facing an issue here.

I found the class BinaryClassificationMetrics and wanted to compute the aucROC or aucPR of my model. The thing is that the predict method of a LogisticRegressionModel only returns the predicted class, not the probability of belonging to the positive class. So I will get:

    val metrics = new BinaryClassificationMetrics(predictionAndLabels)
    val aucROC = metrics.areaUnderROC

with predictionAndLabels being an RDD[(predictedClass, label)].

Here, because the predicted class will always be 0 or 1, there is no way to vary the threshold to get the aucROC, right? Or am I totally wrong?

So, is it relevant to use BinaryClassificationMetrics.areaUnderROC with MLlib's classification models, which in many cases only return the predicted class and not the probability?

Nevertheless, an easy solution for LogisticRegression would be to write my own method that takes the model's weight vector as a parameter and computes a predictionAndLabels with the actual membership probabilities. But is this the only solution?

Thanks in advance.

Regards,
Jean.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-relevant-to-use-BinaryClassificationMetrics-aucROC-aucPR-with-LogisticRegressionModel-tp25465.html
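The manual workaround mentioned in this message (computing the probability from the weight vector) could look roughly like the following. This is a plain Scala sketch with no Spark dependency. It assumes a binary logistic model of the form sigmoid(w·x + b), and ManualScores, probability and auc are hypothetical names for this example. The auc helper uses the rank-based definition of the area under the ROC curve; it is not Spark's implementation and is only meant to show why real-valued scores carry more ranking information than 0/1 classes.

```scala
object ManualScores {
  /** Logistic function. */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Estimated probability of the positive class for one feature vector. */
  def probability(weights: Array[Double], intercept: Double, x: Array[Double]): Double = {
    require(weights.length == x.length, "weights and features must have the same length")
    val margin = weights.zip(x).map { case (w, v) => w * v }.sum + intercept
    sigmoid(margin)
  }

  /** AUC as the fraction of (positive, negative) pairs in which the positive
    * example gets the higher score; ties count one half.
    * Quadratic in the number of examples, so for illustration only.
    */
  def auc(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one example of each class")
    val wins = pos.flatMap { p =>
      neg.map(n => if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
    }
    wins.sum / (pos.size.toLong * neg.size)
  }
}
```

Usage would be along the lines of mapping each (features, label) pair to (ManualScores.probability(w, b, features), label) and passing the result to ManualScores.auc, or to BinaryClassificationMetrics when working on an RDD.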