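Hi Vishnu,

The string-to-index map is generated at the model training step and reused at the prediction step. This means the map does not change at prediction time. When you predict, you apply the originally fitted model (the StringIndexerModel returned by fit()), not a newly fitted StringIndexer, so A, B and C keep the indices they were given during training, regardless of how frequent they are in the new data.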
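
To illustrate the fit-once, reuse-at-prediction idea, here is a minimal pure-Python sketch. It is not Spark code: `FrequencyIndexer` is a made-up class that only mimics the behaviour described above, with the most frequent label getting index 0, and the alphabetical tie-breaking is an assumption of the sketch. In Spark the same pattern is to call fit() on the StringIndexer with the training DataFrame and then call transform() on the new data with the resulting model.

```python
from collections import Counter


class FrequencyIndexer:
    """Toy stand-in for a fitted string indexer (hypothetical, not Spark's implementation)."""

    def fit(self, values):
        counts = Counter(values)
        # Most frequent label gets index 0; ties are broken alphabetically
        # so the result is deterministic (an assumption of this sketch).
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
        self.label_to_index = {label: float(i) for i, label in enumerate(ordered)}
        return self

    def transform(self, values):
        # Only looks up the map stored by fit(); the label frequencies in
        # `values` play no role. An unseen label raises KeyError here.
        return [self.label_to_index[v] for v in values]


# Training data: A is most frequent, then B, then C.
train = ["A", "A", "A", "B", "B", "C"]
# New data: C is most frequent, then B, then A.
new_data = ["C", "C", "C", "B", "B", "A"]

model = FrequencyIndexer().fit(train)      # map is built once, from training data
print(model.label_to_index)                # {'A': 0.0, 'B': 1.0, 'C': 2.0}
print(model.transform(new_data))           # [2.0, 2.0, 2.0, 1.0, 1.0, 0.0]
```

Even though C dominates the new data, it is still mapped to 2.0, because transform() never recomputes the frequencies. The indices would only change if you called fit() again on the new data, which is what you should avoid at prediction time.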

2015-11-29 4:33 GMT+08:00 Vishnu Viswanath <vishnu.viswanat...@gmail.com>:
> Hi All,
>
> I have a general question on using StringIndexer.
> StringIndexer gives an index to each label in the feature, starting from 0
> (0 for the most frequent word).
>
> Suppose I am building a model, and I use StringIndexer to transform one
> of my columns.
> E.g., suppose A was the most frequent word, followed by B and C.
>
> So the StringIndexer will generate
>
> A 0.0
> B 1.0
> C 2.0
>
> After building the model, I am going to do some prediction using this model,
> so I do the same transformation on the new data that I need to predict on. And
> suppose the new dataset has C as the most frequent word, followed by B and
> A. So the StringIndexer will assign indices as
>
> C 0.0
> B 1.0
> A 2.0
>
> These indices are different from what we used for modeling. So won’t this
> give me a wrong prediction if I use StringIndexer?
>
> --
> Thanks and Regards,
> Vishnu Viswanath,
> www.vishnuviswanath.com