This is a very common use case, but it is currently not that easy to do in scikit-learn.
There is a pull request that addresses it:
https://github.com/scikit-learn/scikit-learn/pull/6559
In the meantime, I think the easiest way is to use pandas' get_dummies
function.
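
A minimal sketch of the get_dummies route, assuming a small made-up DataFrame (the column names "color", "size" and "price" are hypothetical and only there for illustration). The `columns` argument restricts the encoding to the listed string columns. Since get_dummies keeps no fitted state, the second half shows one possible way to align new data to the training columns with `reindex`:

```python
import pandas as pd

# Hypothetical toy data; column names are invented for this example.
df = pd.DataFrame({
    "color": ["category_A", "category_B", "category_A"],
    "size": ["small", "large", "small"],
    "price": [1.0, 2.5, 3.0],
})

# Encode only the listed string columns; "price" is passed through unchanged.
dummies = pd.get_dummies(df, columns=["color", "size"])
print(dummies.columns.tolist())

# get_dummies does not remember the categories it saw, so new data that
# lacks some category must be aligned to the training columns by hand.
new = pd.DataFrame({"color": ["category_B"], "size": ["small"], "price": [4.0]})
new_dummies = pd.get_dummies(new, columns=["color", "size"]).reindex(
    columns=dummies.columns, fill_value=0
)
print(new_dummies)
```

Categories that appear only in the new data would be dropped silently by the `reindex` step, so this is a stopgap rather than a replacement for a fitted transformer.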
On 03/19/2016 02:17 PM, Алексей Драль wrote:
Hi there,
I have a data set which contains string categorical variables (like
"category_A", "category_B"). I would like to generate dummy variables from
them, but I can't use OneHotEncoder, as it expects a matrix of integers. I
cannot use LabelEncoder either, because I cannot specify which columns to
process. I wrote a simple class for this that applies DictVectorizer
per column and stores the fitted processors. This use case looks so common
that I would expect sklearn to contain some functionality for it. Could you
please tell me whether I am missing a standard preprocessor that generates
dummy variables from strings for specified columns?
--
Yours sincerely,
Alexey A. Dral
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general