Xusen Yin created SPARK-13641:
---------------------------------

Summary: getModelFeatures of ml.api.r.SparkRWrapper cannot (always) reveal the original column names
Key: SPARK-13641
URL: https://issues.apache.org/jira/browse/SPARK-13641
Project: Spark
Issue Type: Bug
Components: ML, SparkR
Reporter: Xusen Yin
Priority: Minor

getModelFeatures of ml.api.r.SparkRWrapper cannot always reveal the original column names. Take the HouseVotes84 data set as an example:

{code}
case m: XXXModel =>
  // Read the ML attribute metadata attached to the features column.
  val attrs = AttributeGroup.fromStructField(
    m.summary.predictions.schema(m.summary.featuresCol))
  // Return the name of each feature attribute.
  attrs.attributes.get.map(_.name.get)
{code}

The code above reads the feature names from the features column. Usually the features column is generated by RFormula, which contains a VectorAssembler, so the output attribute names are not equal to the original column names. E.g., we want to get the HouseVotes84 feature names "V1, V2, ..., V16", but with RFormula we can only get "V1_n, V2_y, ..., V16_y", because [the transform function of VectorAssembler|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala#L75] appends a suffix to the column names.
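
To illustrate the mismatch, here is a minimal sketch of how the suffixed attribute names could be mapped back to the original columns. It is not a proposed fix and not part of Spark: the object FeatureNames and the method originalColumns are hypothetical names, and the sketch assumes that each attribute name is either the column name itself or the column name followed by "_" and a suffix, as in the "V1_n" example above.

```scala
// Hypothetical helper, not part of Spark.
object FeatureNames {
  // attrNames: names read from the features column, e.g. "V1_n".
  // inputCols: the original column names, e.g. "V1".
  def originalColumns(attrNames: Seq[String], inputCols: Seq[String]): Seq[String] = {
    // Try longer column names first so a column such as "V1_x"
    // is preferred over "V1" when both could match.
    val byLength = inputCols.sortBy(-_.length)
    attrNames.map { attr =>
      byLength
        .find(col => attr == col || attr.startsWith(col + "_"))
        .getOrElse(attr) // Unknown attribute: keep the name unchanged.
    }.distinct // Several attributes can come from one column.
  }
}

object FeatureNamesExample extends App {
  val attrs = Seq("V1_n", "V2_y", "V16_y")
  val cols = Seq("V1", "V2", "V16")
  println(FeatureNames.originalColumns(attrs, cols).mkString(", "))
}
```

This only works when the original column names are known to the caller, which is the information getModelFeatures currently cannot recover from the features column alone.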