[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

jkbradley Mon, 07 Dec 2015 17:54:22 -0800

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10166#discussion_r46906261
  
    --- Diff: docs/ml-features.md ---
    @@ -951,9 +951,157 @@ indexed.show()
     </div>
     </div>
     
    +
    +## IndexToString
    +
    +Symmetrically to `StringIndexer`, `IndexToString` maps a column of label indices
    +back to a column containing the original labels as strings. The common use case
    +is to produce indices from labels with `StringIndexer`, train a model with those
    +indices and retrieve the original labels from the column of predicted indices
    +with `IndexToString`. However, you are free to supply your own labels.
    +
    +**Examples**
    +
    +Building on the `StringIndexer` example, let's assume we have the following
    +DataFrame with columns `id` and `categoryIndex`:
    +
    +~~~~
    + id | categoryIndex
    +----|---------------
    + 0  | 0.0
    + 1  | 2.0
    + 2  | 1.0
    + 3  | 0.0
    + 4  | 0.0
    + 5  | 1.0
    +~~~~
    +
    +Applying `IndexToString` with `categoryIndex` as the input column,
    +`originalCategory` as the output column and the previous `StringIndexer`'s
    +labels as labels, we are able to retrieve our original labels:
    +
    +~~~~
    + id | categoryIndex | originalCategory
    +----|---------------|-----------------
    + 0  | 0.0           | a
    + 1  | 2.0           | b
    + 2  | 1.0           | c
    + 3  | 0.0           | a
    + 4  | 0.0           | a
    + 5  | 1.0           | c
    +~~~~
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +
    +Refer to the [IndexToString Scala docs](api/scala/index.html#org.apache.spark.ml.feature.IndexToString)
    +for more details on the API.
    +
    +{% highlight scala %}
    +import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
    +
    +val df = sqlContext.createDataFrame(Seq(
    +  (0, "a"),
    +  (1, "b"),
    +  (2, "c"),
    +  (3, "a"),
    +  (4, "a"),
    +  (5, "c")
    +)).toDF("id", "category")
    +
    +val indexer = new StringIndexer()
    +  .setInputCol("category")
    +  .setOutputCol("categoryIndex")
    +  .fit(df)
    +val indexed = indexer.transform(df)
    +
    +val converter = new IndexToString()
    +  .setInputCol("categoryIndex")
    +  .setOutputCol("originalCategory")
    +  .setLabels(indexer.labels)
    --- End diff --
    
    You probably don't need to specify labels; they should be pulled from column metadata.
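The review point depends on how the two stages share one label array. A fitted `StringIndexer` orders labels by descending frequency, so the most frequent label gets index 0, and `IndexToString` looks each index up in that array. Below is a minimal plain-Scala sketch of that round trip, written without Spark. Because there is no column metadata here, the labels are passed explicitly, whereas the review comment notes that Spark should carry them in the indexed column's metadata. `LabelIndexSketch` and its methods are hypothetical names for illustration, and the alphabetical tie-breaking is an assumption of this sketch, not a documented Spark guarantee.

```scala
// Plain-Scala sketch (no Spark dependency) of the StringIndexer / IndexToString
// round trip. All names here are hypothetical and for illustration only.
object LabelIndexSketch {
  // Order labels by descending frequency so the most frequent label gets index 0.
  // Ties are broken alphabetically: an assumption of this sketch, not Spark behavior.
  def fitLabels(values: Seq[String]): Array[String] =
    values.groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .map(_._1)
      .toArray

  // StringIndexer-like forward mapping: label -> index, as a Double like Spark's output column.
  def index(values: Seq[String], labels: Array[String]): Seq[Double] = {
    val lookup = labels.zipWithIndex.toMap
    values.map(v => lookup(v).toDouble)
  }

  // IndexToString-like inverse mapping: index -> original label string.
  def toStrings(indices: Seq[Double], labels: Array[String]): Seq[String] =
    indices.map(i => labels(i.toInt))
}
```

Fitting on the six example categories yields the label order `a, c, b`, so `b` maps to 2.0 and `c` to 1.0, which matches the `categoryIndex` values in the example tables; mapping those indices back recovers the original `category` column.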


