GitHub user rekhajoshm opened a pull request:

    https://github.com/apache/spark/pull/9440

    [SPARK-11478] [ML] ML StringIndexer returns inconsistent schema

    ```
    import org.apache.spark.ml.feature.StringIndexer

    // Small DataFrame with a string column to index.
    val data = sc.parallelize(
      Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
    val df = sqlContext.createDataFrame(data).toDF("id", "label")
    val indexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .fit(df)
    val transformed = indexer.transform(df)

    // Schema of the actual output vs. the schema reported by transformSchema.
    println(transformed.schema.toString())
    println(indexer.transformSchema(df.schema))
    ```
    Verified that the two printed schemas differ in the nullable flag of
    labelIndex (true in the first, false in the second):

    StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))

    StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
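
The mismatch can also be checked programmatically rather than by reading the
printed output. The following is a minimal sketch, not part of the patch: the
object name SchemaNullabilityDiff and the helper nullableMismatches are
hypothetical, and the two schemas are rebuilt by hand from the output quoted
above, so it only needs spark-sql on the classpath and no SparkContext.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper for illustration; not part of Spark or of this patch.
object SchemaNullabilityDiff {

  // Schema as printed from transformed.schema in the report above.
  val fromTransform: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("label", StringType, nullable = true),
    StructField("labelIndex", DoubleType, nullable = true)))

  // Schema as printed from indexer.transformSchema(df.schema) in the report above.
  val fromTransformSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("label", StringType, nullable = true),
    StructField("labelIndex", DoubleType, nullable = false)))

  // Names of fields present in both schemas whose nullable flag differs.
  def nullableMismatches(a: StructType, b: StructType): Seq[String] = {
    val bByName = b.fields.map(f => f.name -> f).toMap
    a.fields.toSeq.collect {
      case f if bByName.get(f.name).exists(_.nullable != f.nullable) => f.name
    }
  }

  def main(args: Array[String]): Unit = {
    // Prints the names of the mismatched fields, comma-separated.
    println(nullableMismatches(fromTransform, fromTransformSchema).mkString(", "))
  }
}
```

In a spark-shell session the same helper could presumably be called with
transformed.schema and indexer.transformSchema(df.schema) directly; an empty
result would then indicate that the two schemas agree on nullability.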

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rekhajoshm/spark SPARK-11478

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9440
    
----
commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi <[email protected]>
Date:   2015-05-05T23:10:08Z

    Merge pull request #1 from apache/master
    
    Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi <[email protected]>
Date:   2015-05-08T21:49:09Z

    Merge pull request #2 from apache/master
    
    pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi <[email protected]>
Date:   2015-06-22T00:08:08Z

    Merge pull request #3 from apache/master
    
    Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi <[email protected]>
Date:   2015-09-17T01:03:09Z

    Merge pull request #4 from apache/master
    
    Pulling functionality from apache spark

commit eae53fb16dccdb4eb072466cae2429083461e406
Author: Joshi <[email protected]>
Date:   2015-11-03T18:34:12Z

    fix for ML StringIndexer inconsistent schema

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
