Since you use two steps (StringIndexer and OneHotEncoder) to encode
categories as a Vector, I guess you want to decode the resulting vector back
into the original categories.
Suppose you have a DataFrame with a single column named "name" that holds
three categories: "b", "a", "c" (ranked by frequency). Y
or would it be common practice to just retain the original categories in
another df?
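
To make the decoding idea concrete, here is a minimal plain-Python sketch (no
Spark needed) using the "b", "a", "c" example above. It assumes the labels are
ordered by descending frequency, as StringIndexer orders them, and that the
encoder drops the last category, which is OneHotEncoder's default
(dropLast=True). The helper name decode_one_hot is hypothetical. In Spark
itself, the fitted StringIndexerModel exposes its labels, and IndexToString
maps indices back to strings.

```python
# Plain-Python sketch of decoding a one-hot vector back to its category.
# Assumption: labels are ordered by descending frequency (as StringIndexer
# does) and the last category is dropped (OneHotEncoder's default dropLast).
labels = ["b", "a", "c"]  # index 0 -> "b", 1 -> "a", 2 -> "c"


def decode_one_hot(vector, labels):
    """Map a one-hot vector of length len(labels) - 1 back to its label."""
    if len(vector) != len(labels) - 1:
        raise ValueError("expected a vector of length len(labels) - 1")
    for index, value in enumerate(vector):
        if value == 1.0:
            return labels[index]
    # An all-zero vector stands for the dropped last category.
    return labels[-1]


decoded = [decode_one_hot(v, labels) for v in ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])]
print(decoded)  # ['b', 'a', 'c']
```

Note that the all-zero vector is only decodable because the label list is kept
around, so retaining the index-to-label mapping is needed either way.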
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Dense-Vectors-outputs-in-feature-engineering-tp27331p27337.html
Sent from the Apache Spark User List mailing list archive at
Thanks Disha, that worked out well. Can you point me to an example of how to
decode my feature vectors in the dataframe, back into their categories?
--
Hi Ian,
You can create a dense vector of your features as follows:
- String Index your features
- Invoke One Hot Encoding on them, which generates a sparse vector
- If you wish to merge these features, use VectorAssembler
(optional)
- After transforming the dataframe to return spa
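
One way to read the steps above, as a plain-Python sketch rather than Spark
code: index the strings by frequency, one-hot encode each index into a sparse
(size, indices, values) triple with the last category dropped, then expand the
sparse form into a dense list. All function names here are hypothetical, and
the alphabetical tie-breaking is an assumption of this sketch, not a statement
about Spark's behaviour.

```python
from collections import Counter


def string_index(values):
    """Assign indices by descending frequency (ties broken alphabetically here)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot_sparse(index, num_labels):
    """Return a sparse (size, indices, values) triple, dropping the last category."""
    size = num_labels - 1
    if index < size:
        return (size, [index], [1.0])
    return (size, [], [])  # the last category is encoded as all zeros


def to_dense(sparse):
    """Expand a sparse (size, indices, values) triple into a dense list."""
    size, indices, values = sparse
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


names = ["b", "b", "b", "a", "a", "c"]
mapping = string_index(names)
dense_rows = [to_dense(one_hot_sparse(mapping[n], len(mapping))) for n in names]
print(mapping)        # {'b': 0, 'a': 1, 'c': 2}
print(dense_rows[0])  # [1.0, 0.0]
```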