I read the following links:

  http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
  http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html

It seems that I should use DictVectorizer, but

  http://www.mail-archive.com/[email protected]/msg07994.html

mentions that RandomForestClassifier doesn't support the sparse matrices that
one-hot encoding produces, and recommends

  np.unique(column, return_inverse=True)

instead.

With this function, how should I encode values in the test set that were 
not seen in the training data?
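
To make the question concrete, here is a minimal sketch of what I have in
mind. It is only one possible approach, not something taken from the linked
thread: the helper names (fit_categories, encode_with_unseen), the sample
colour values, and the choice of -1 as the code for unseen values are all my
own assumptions. The categories are learned from the training column with
np.unique, and test values are looked up in that sorted array with
np.searchsorted.

```python
import numpy as np

def fit_categories(train_column):
    # Sorted unique training values, plus the integer code of each training row.
    categories, train_codes = np.unique(train_column, return_inverse=True)
    return categories, train_codes

def encode_with_unseen(column, categories, unseen_code=-1):
    # Hypothetical helper: unseen_code=-1 is an assumed convention, not a
    # scikit-learn or NumPy default.
    column = np.asarray(column)
    # Position at which each value would be inserted into the sorted categories.
    idx = np.searchsorted(categories, column)
    # Clip so values that sort after every category still index safely.
    idx = np.clip(idx, 0, len(categories) - 1)
    # A value is known only if the category at that position equals it.
    known = categories[idx] == column
    return np.where(known, idx, unseen_code)

# Made-up example data.
train = np.array(["red", "green", "blue", "green"])
test = np.array(["green", "purple", "blue", "zebra"])

categories, train_codes = fit_categories(train)
test_codes = encode_with_unseen(test, categories)
```

Here "purple" and "zebra" both end up with the code -1. Whether a single
sentinel code like this is a sensible input for RandomForestClassifier is
exactly the part I am unsure about.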

Thanks 

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general