I have read the following links:
http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
It seems that I should use DictVectorizer, but http://www.mail-archive.com/[email protected]/msg07994.html mentions that RandomForestClassifier doesn't support the sparse matrices produced by one-hot encoding, and recommends np.unique(column, return_inverse=True) instead. With this function, how should I encode values in the test set that were not seen in the training set?

Thanks
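
To make the question concrete, here is a minimal sketch of one possible approach, not necessarily what the linked thread intends. It assumes the categories are learned from the training column with np.unique, and that any test value missing from training is mapped to a reserved code (-1 here, an arbitrary choice). The helper names fit_codes and transform_codes and the unseen_code parameter are hypothetical, made up for this illustration.

```python
import numpy as np

def fit_codes(train_column):
    """Learn the sorted unique categories and integer codes from the training column."""
    categories, codes = np.unique(train_column, return_inverse=True)
    return categories, codes

def transform_codes(categories, column, unseen_code=-1):
    """Map values to the training codes; values unseen in training get unseen_code."""
    column = np.asarray(column)
    # np.unique returns sorted categories, so searchsorted finds candidate positions.
    positions = np.searchsorted(categories, column)
    # A position equal to len(categories) is out of range, so clip before indexing.
    clipped = np.clip(positions, 0, len(categories) - 1)
    # A value was seen in training only if the category at its position matches exactly.
    seen = categories[clipped] == column
    return np.where(seen, clipped, unseen_code)

train = np.array(["red", "green", "blue", "green"])
test = np.array(["blue", "purple", "red"])  # "purple" does not occur in train

categories, train_codes = fit_codes(train)
test_codes = transform_codes(categories, test)
print(train_codes.tolist())  # [2, 1, 0, 1]
print(test_codes.tolist())   # [0, -1, 2]
```

Whether a single reserved code is a sensible input for RandomForestClassifier is part of what I am asking; the sketch only shows that the mapping itself can be kept consistent between the training and test sets.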
