[Scikit-learn-general] What is the right way to convert unseen categorical value into numeric?

ChungHung Liu Tue, 22 Oct 2013 21:48:00 -0700

I read the following links:

  
http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
 
  
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html


It seems that I should use DictVectorizer, but 

  
http://www.mail-archive.com/[email protected]/msg07994.html

mentions that RandomForestClassifier doesn't support sparse matrices of 
one-hot encoded values, and 

  np.unique(column, return_inverse=True)

is recommended. 
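
For concreteness, here is a small sketch of what that call returns on a 
single column. The column values are made up for illustration; only 
np.unique itself comes from the recommendation above.

```python
import numpy as np

# Hypothetical training column of categorical strings.
column = np.array(["red", "green", "blue", "green", "red"])

# categories: the sorted unique values found in the column.
# codes: for each row, the index of its value within categories.
categories, codes = np.unique(column, return_inverse=True)

print(categories)  # sorted unique labels
print(codes)       # integer code per row

# The original column can be rebuilt from the two outputs.
assert (categories[codes] == column).all()
```

The integer codes depend only on the values present in the column that was 
passed in, so calling np.unique separately on another data set can assign 
different codes to the same label.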

With this function, how should I encode values in the test set that did not 
appear in the training set?
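
To make the question concrete, here is one possible approach, offered only 
as a sketch: learn the categories from the training column, look up test 
values with np.searchsorted, and give anything unseen a reserved code. The 
helper name encode_with_unknown, the sample data, and the choice of -1 as 
the "unknown" code are all assumptions for illustration, and I don't know 
whether a reserved code is the right treatment for a given model.

```python
import numpy as np

def encode_with_unknown(train_column, test_column, unknown_code=-1):
    """Hypothetical helper: encode test values using training categories only.

    Values not present in the training column get unknown_code (an assumed
    convention, not something scikit-learn or NumPy prescribes).
    """
    # Learn the sorted categories and the training codes in one call.
    categories, train_codes = np.unique(train_column, return_inverse=True)

    test_column = np.asarray(test_column)
    # Position where each test value would be inserted to keep categories sorted.
    positions = np.searchsorted(categories, test_column)
    # Values sorting after every category land past the end; clip to stay in range.
    positions = np.clip(positions, 0, len(categories) - 1)
    # A test value was seen in training only if the category at that position matches.
    seen = categories[positions] == test_column
    test_codes = np.where(seen, positions, unknown_code)
    return categories, train_codes, test_codes

train = np.array(["red", "green", "blue", "green"])
test = np.array(["green", "purple", "blue", "zebra"])

categories, train_codes, test_codes = encode_with_unknown(train, test)
print(train_codes)
print(test_codes)
```

With this, labels that occur in both sets get the same code in each, and 
"purple" and "zebra" fall into the reserved code instead of silently 
receiving the code of some training category.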

Thanks 

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
