Hi.
In general, please stay on the mailing list.
We could make the check_array in FunctionTransformer optional via a
parameter.
Cheers,
Andy
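A rough sketch of what skipping the check could look like. It assumes a scikit-learn release in which FunctionTransformer accepts a validate argument; whether that argument exists in the version discussed in this thread is an assumption, and the toy DataFrame is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy frame with one string-valued categorical column.
X = pd.DataFrame({"color": ["red", "green", "red"], "size": [1.0, 2.0, 3.0]})

# validate=False asks FunctionTransformer to hand X to the function as-is,
# skipping the check_array conversion that trips over string data.
to_dummies = FunctionTransformer(pd.get_dummies, validate=False)
X_dummies = to_dummies.fit_transform(X)

print(sorted(X_dummies.columns))
```

With validation off, the transformer does no sanity checking of its own, so the wrapped function has to cope with whatever it is given.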
On 03/28/2016 01:34 PM, Алексей Драль wrote:
Hi Andreas,
Nice, I didn't know about make_pipeline before, thank you. I have
exactly the situation that you pointed out: the categories are strings,
and some of them frequently show up only in the test split. I'll keep
this approach in mind for next time.
P.S. Testing revealed that FunctionTransformer calls check_array,
which can lead to problems when the data has object dtype and holds strings.
P.P.S. At first, I was wondering whether it would be worth making a
pull request, but CategoricalEncoder should fix the problem.
2016-03-28 18:58 GMT+03:00 Andreas Mueller <t3k...@gmail.com>:
Untested code:
make_pipeline(FunctionTransformer(lambda X: pd.get_dummies(X)),
SomeClassifier())
Giant caveat: that will only work if the categories are exactly
the same in every X that you pass.
Otherwise the output columns will differ between calls and weird stuff will happen.
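To make the caveat concrete, here is a small pandas-only sketch with made-up data. It shows the column mismatch between two splits and one possible workaround, reindexing the test dummies against the training columns. This workaround is an illustration and was not proposed in the thread.

```python
import pandas as pd

# Hypothetical splits: "purple" appears only in test, "red" and "blue" only in train.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})

train_dummies = pd.get_dummies(train)
# Encoding the test split on its own yields a different set of columns.
test_naive = pd.get_dummies(test)

# Reindex against the training columns: categories missing from test become
# all-zero columns, and categories unseen in training are dropped.
test_aligned = test_naive.reindex(columns=train_dummies.columns, fill_value=0)

print(list(train_dummies.columns))
print(list(test_naive.columns))
print(list(test_aligned.columns))
```

The aligned frame has the same columns as the training frame, so a downstream classifier sees a consistent layout, at the cost of silently discarding unseen categories.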
On 03/26/2016 07:21 AM, Алексей Драль wrote:
Hi Andreas,
Sadly, get_dummies cannot be used in pipelines. Thank
you for the link to the fix.
2016-03-25 18:57 GMT+03:00 Andreas Mueller <t3k...@gmail.com>:
This is very common but currently not that easy.
There is a fix here:
https://github.com/scikit-learn/scikit-learn/pull/6559
In the meantime, I think the easiest way is to use pandas'
get_dummies function.
On 03/19/2016 02:17 PM, Алексей Драль wrote:
Hi there,
I have a data set which contains string categorical variables (like
"category_A", "category_B"). I would like to generate dummy variables
from them, but I can't use OneHotEncoder, as it expects a matrix of
integers. I cannot use LabelEncoder either, because I cannot specify
which columns to process. I wrote a simple class that applies
DictVectorizer per column and stores the fitted processors. This use
case looks so common that I would expect sklearn to contain some
functionality for it. Could you please tell me whether I am missing a
standard preprocessor that generates dummy variables from strings for
specified columns?
--
Yours sincerely,
Alexey A. Dral
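For readers wondering what such a per-column class might look like, here is a minimal sketch. The class name PerColumnDummies, its columns parameter, and the sample data are hypothetical. The original class was not posted, so this is only one plausible reading of "DictVectorizer per column, storing the fitted processors".

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer


class PerColumnDummies(TransformerMixin, BaseEstimator):
    """Hypothetical helper: one DictVectorizer per selected string column."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Fit and store one vectorizer per column so transform can reuse it.
        self.vectorizers_ = {}
        for col in self.columns:
            vec = DictVectorizer(sparse=False)
            vec.fit([{col: value} for value in X[col]])
            self.vectorizers_[col] = vec
        return self

    def transform(self, X):
        # DictVectorizer ignores feature values it did not see during fit,
        # so unseen categories come out as all-zero rows for that column.
        blocks = [
            self.vectorizers_[col].transform([{col: value} for value in X[col]])
            for col in self.columns
        ]
        return np.hstack(blocks)


# Made-up data: "category_C" appears only in the test frame.
train = pd.DataFrame({"kind": ["category_A", "category_B"], "n": [1, 2]})
test = pd.DataFrame({"kind": ["category_B", "category_C"], "n": [3, 4]})

encoder = PerColumnDummies(columns=["kind"]).fit(train)
encoded = encoder.transform(test)
print(encoded)
```

Because the vectorizers are fitted once and reused, the output layout stays fixed across splits, which sidesteps the column mismatch that a bare get_dummies call runs into.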
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general