Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-11 Thread Liam Geron
That's a great tip actually; I was unaware of the MultiOutputClassifier
option. I'll give it a try!

Thanks,
Liam



Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Joel Nothman
I think it's a bit weird if we're returning sparse output from
OneVsRestClassifier.predict if it wasn't fit on sparse Y.

Actually, I would be in favour of deprecating multilabel support in
OneVsRestClassifier, since it is performing "binary relevance method" for
multilabel, not actually OvR. MultiOutputClassifier duplicates this
functionality (more or less), outputs a dense array (indeed it doesn't
support sparse Y and perhaps it should) and lives closer to functional
alternatives to binary relevance, such as ClassifierChain.
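Joel's suggestion can be sketched as below, assuming scikit-learn is installed. LogisticRegression stands in for XGBClassifier so the sketch has no xgboost dependency, and the toy texts and labels are made up for illustration. MultiOutputClassifier fits one binary classifier per label column (binary relevance), and its predict() returns a dense numpy array of shape (n_samples, n_labels).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus with two binary labels per document.
texts = [
    "cheap flights to paris",
    "paris hotel deals",
    "python machine learning tutorial",
    "learn python fast",
    "cheap python course",
    "paris travel guide",
]
# Column 0: "travel" label, column 1: "programming" label (dense indicator matrix).
Y = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
    [1, 0],
])

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # One independent LogisticRegression per label column.
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(texts, Y)

pred = model.predict(["cheap paris flights"])
print(type(pred), pred.shape)  # a dense numpy.ndarray of shape (1, 2)
```

Because the output is already a plain ndarray, nothing needs to be densified after predict().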

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Unfortunately, I don't believe you get that level of freedom: it's an
API call that automatically calls the model's predict method, so I don't
think I get to specify something like model.predict(X).toarray(). I
could be wrong, however; I don't pretend to be an expert on Cloud ML by any
stretch.
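If the serving layer only ever calls predict(), one option is to make predict() itself return a dense array. Below is a minimal sketch, assuming scikit-learn and SciPy; DenseOneVsRestClassifier is a hypothetical name, and LogisticRegression stands in for XGBClassifier. Depending on the scikit-learn version and on whether Y was sparse at fit time, OneVsRestClassifier.predict may return a scipy.sparse matrix for multilabel targets, so the override densifies only when needed.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class DenseOneVsRestClassifier(OneVsRestClassifier):
    """Hypothetical subclass whose predict() never returns a sparse matrix."""

    def predict(self, X):
        y = super().predict(X)
        # Multilabel output may be scipy.sparse; convert it to a numpy array.
        return y.toarray() if sp.issparse(y) else np.asarray(y)


# Toy multilabel data: 4 samples, 2 labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])

clf = DenseOneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict(X)
print(type(pred), pred.shape)
```

As with any custom estimator, the module defining this class has to be importable wherever the pickled model is loaded.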

Thanks,
Liam



Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Sebastian Raschka
Hm, weird that their platform seems to be so picky about it. Have you tried
just making the output of the pipeline dense? I.e.,

model.predict(X).toarray()

Best,
Sebastian



Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Hi Sebastian,

Thanks for the advice! Luckily, the model actually works fine on its own in
Python, so I don't think that is the issue exactly. I have tried
rolling my own estimator that wraps the pipeline and calls the
predict_proba method to return a dense array; however, I then ran into
the problem that I would have to have that custom estimator defined on the
Cloud ML end, which I'm unsure how to do.
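The wrapper approach described above might look like the sketch below, assuming scikit-learn is installed; ProbaAsPredict is a hypothetical name and LogisticRegression stands in for XGBClassifier. For multilabel targets, OneVsRestClassifier.predict_proba returns a dense (n_samples, n_labels) array of per-label probabilities. The pickle round trip illustrates the server-side problem: pickle records the class by module and name rather than storing its code, so loading only succeeds where the defining module can be imported.

```python
import pickle

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class ProbaAsPredict(BaseEstimator):
    """Hypothetical wrapper exposing predict_proba() through predict()."""

    def __init__(self, model=None):
        self.model = model

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        # Serve dense per-label probabilities via the predict() entry point.
        return self.model.predict_proba(X)


X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
wrapped = ProbaAsPredict(OneVsRestClassifier(LogisticRegression())).fit(X, Y)

# Serialize and reload as a model file would be; this works here only because
# ProbaAsPredict is defined in the module doing the loading.
restored = pickle.loads(pickle.dumps(wrapped))
proba = restored.predict(X)
print(type(proba), proba.shape)
```

Note that this changes what predict() means (probabilities instead of 0/1 labels), which may or may not be what the consumer of the endpoint expects.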

Thanks,
Liam



Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Sebastian Raschka
Hi Liam,

Not sure what your exact error message is, but it may also be that the
XGBClassifier only accepts dense arrays? I think the TfidfVectorizer returns
sparse arrays. You could probably fix your issue by inserting a
"DenseTransformer" into your pipeline (a simple class that just transforms an
array from a sparse to a dense format). I've implemented something like that;
you can import or copy it from here:

https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py

The usage would then basically be

model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense', DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
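A minimal stand-in for such a step is sketched below, assuming scikit-learn and SciPy; it is an illustrative sketch, not the mlxtend code at the link above. It densifies the features flowing into the classifier, not the output of predict().

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Illustrative sparse-to-dense pipeline step."""

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn.
        return self

    def transform(self, X):
        # TfidfVectorizer emits a scipy.sparse matrix; convert it to ndarray.
        return X.toarray() if sp.issparse(X) else np.asarray(X)


features = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
])
X_dense = features.fit_transform(["cheap flights to paris", "learn python fast"])
print(type(X_dense), X_dense.shape)  # numpy.ndarray of shape (2, 7)
```

A dense TF-IDF matrix stores n_samples * n_features values, so with a large vocabulary it can use far more memory than the sparse form.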

Best,
Sebastian




> On Apr 10, 2019, at 12:25 PM, Liam Geron  wrote:
> 
> Hi all,
> 
> I was hoping to get some guidance on changing the result of the predict
> method of the OneVsRestClassifier to return a dense array rather than a
> sparse array, given that Google Cloud ML only accepts dense numpy arrays as
> the result of a model's predict method. Right now my model architecture
> looks like:
> 
> model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
> 
> This returns a sparse array from the predict method. I saw the Stack
> Overflow post here:
> https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
> 
> which recommends overwriting the predict method with the predict_proba
> method; however, I found that I can't serialize the model after doing so. I
> also have a Stack Overflow post here:
> https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
> which details the specific pickling error.
> 
> Is this a known issue? Is there an accepted way to convert this into a dense
> array?
> 
> Thanks,
> Liam Geron