Xusen Yin created SPARK-13641:
---------------------------------

             Summary: getModelFeatures of ml.api.r.SparkRWrapper cannot 
(always) reveal the original column names
                 Key: SPARK-13641
                 URL: https://issues.apache.org/jira/browse/SPARK-13641
             Project: Spark
          Issue Type: Bug
          Components: ML, SparkR
            Reporter: Xusen Yin
            Priority: Minor


getModelFeatures of ml.api.r.SparkRWrapper cannot always reveal the original 
column names. Take the HouseVotes84 data set as an example. The current 
implementation reads the feature names like this:

{code}
case m: XXXModel =>
  val attrs = AttributeGroup.fromStructField(
    m.summary.predictions.schema(m.summary.featuresCol))
  attrs.attributes.get.map(_.name.get)
{code}

The code above gets the feature names from the features column. Usually, the 
features column is generated by RFormula, which contains a VectorAssembler, so 
the output attribute names can differ from the original column names.

E.g., for HouseVotes84 we want the feature names "V1, V2, ..., V16". 
But with RFormula we only get "V1_n, V2_y, ..., V16_y", because [the 
transform function of 
VectorAssembler|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala#L75]
appends suffixes to the column names.
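
To make the naming problem concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the assembler names each attribute "<column>_<attribute>" and leaves a plain numeric column under its own name. InputColumn, assembledNames and originalNames are hypothetical names introduced only for illustration; they are not part of SparkRWrapper or VectorAssembler, and the prefix-matching recovery is one possible approach, not a proposed patch.

```scala
// Sketch only: models the naming rule in plain Scala, without Spark.
object FeatureNameSketch {
  // Hypothetical stand-in for an assembler input column: its name plus the
  // names of the attributes it carries (empty for a plain numeric column).
  final case class InputColumn(name: String, attrNames: Seq[String])

  // Assumed naming rule: a column with named attributes yields
  // "<column>_<attribute>"; a plain numeric column keeps its own name.
  def assembledNames(cols: Seq[InputColumn]): Seq[String] =
    cols.flatMap { c =>
      if (c.attrNames.isEmpty) Seq(c.name)
      else c.attrNames.map(a => s"${c.name}_$a")
    }

  // Hypothetical recovery: map each assembled name back to the longest
  // original column name that equals it or is followed by "_" in it.
  def originalNames(assembled: Seq[String], columns: Seq[String]): Seq[String] = {
    val byLength = columns.sortBy(-_.length) // try longer names first
    assembled.flatMap { n =>
      byLength.find(c => n == c || n.startsWith(c + "_"))
    }.distinct
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq(
      InputColumn("V1", Seq("n")),
      InputColumn("V2", Seq("y")),
      InputColumn("V16", Seq("y")))
    val names = assembledNames(cols)
    println(names.mkString(", "))
    println(originalNames(names, cols.map(_.name)).mkString(", "))
  }
}
```

The recovery step needs the list of original input columns, which the features column alone does not carry, so a real fix would probably have to obtain them from elsewhere, e.g. from the formula itself.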



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

