Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-11 Thread Liam Geron
That's a great tip actually, I was unaware of the MultiOutputClassifier
option. I'll give it a try!

Thanks,
Liam
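
For anyone following along, here is a minimal sketch of the MultiOutputClassifier route suggested below. It assumes the labels are a dense 0/1 indicator matrix. LogisticRegression stands in for XGBClassifier only so that the snippet runs without xgboost, and the toy documents and labels are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration: 4 documents, 2 labels.
docs = [
    "cheap flights to rome",
    "python pickling error",
    "cheap python course in rome",
    "weather report",
]
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])

# Same pipeline shape as in the thread, with MultiOutputClassifier in place
# of OneVsRestClassifier (LogisticRegression is a stand-in base estimator).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(docs, Y)

pred = model.predict(docs)
print(type(pred), pred.shape)  # a dense numpy array, one column per label
```

Swapping XGBClassifier back in should only require changing the base estimator, though I have not checked that combination here.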


On Wed, Apr 10, 2019 at 11:03 PM Joel Nothman wrote:

> I think it's a bit weird if we're returning sparse output from
> OneVsRestClassifier.predict if it wasn't fit on sparse Y.
>
> Actually, I would be in favour of deprecating multilabel support in
> OneVsRestClassifier, since it is performing "binary relevance method" for
> multilabel, not actually OvR. MultiOutputClassifier duplicates this
> functionality (more or less), outputs a dense array (indeed it doesn't
> support sparse Y and perhaps it should) and lives closer to functional
> alternatives to binary relevance, such as ClassifierChain.

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Unfortunately, I don't believe that you get that level of freedom: it's an
API call that automatically calls the model's predict method, so I don't
think that I get to specify something like model.predict(X).toarray(). I
could be wrong, however; I don't pretend to be an expert on Cloud ML by any
stretch.

Thanks,
Liam

On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka wrote:

> Hm, weird that their platform seems to be so picky about it. Have you
> tried to just make the output of the pipeline dense? I.e.,
>
> (model.predict(X)).toarray()
>
> Best,
> Sebastian
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Hi Sebastian,

Thanks for the advice! Luckily, the model actually works fine on its own in
Python, so I don't think that is the issue exactly. I have tried
rolling my own estimator to wrap the pipeline and have it call the
predict_proba method to return a dense array; however, I then ran into
the problem that I would have to have that custom estimator defined on the
Cloud ML end, which I'm unsure how to do.

Thanks,
Liam
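
A rough sketch of the kind of wrapper described above, for reference. DensePredictWrapper is a hypothetical name, not something from scikit-learn or Cloud ML, and the code only assumes that a sparse result exposes a toarray() method (as scipy.sparse matrices do). Because it is an ordinary module-level class rather than a patched method, it should pickle normally, but the class still has to be importable wherever the model is unpickled, which is exactly the difficulty on the Cloud ML end.

```python
class DensePredictWrapper:
    """Hypothetical wrapper: delegates to a fitted model, densifies predict()."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        out = self.model.predict(X)
        # scipy.sparse matrices expose .toarray(); dense results pass through.
        return out.toarray() if hasattr(out, "toarray") else out

    def predict_proba(self, X):
        # Probabilities are forwarded unchanged.
        return self.model.predict_proba(X)
```

Usage would be something like wrapped = DensePredictWrapper(fitted_pipeline), followed by wrapped.predict(X).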

On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka wrote:

> Hi Liam,
>
> not sure what your exact error message is, but it may also be that the
> XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
> returns sparse arrays. You could probably fix your issues by inserting a
> "DenseTransformer" into your pipeline (a simple class that just transforms
> an array from a sparse to a dense format). I've implemented something like
> that, which you can import or copy from here:
>
>
> https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
>
> The usage would then basically be
>
> model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense',
> DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
>
> Best,
> Sebastian
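
To make the "DenseTransformer" idea above concrete, here is a minimal sketch of such a step. It is written from scratch for illustration and is not the mlxtend implementation linked above; the class name ToDense is an assumption.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class ToDense(BaseEstimator, TransformerMixin):
    """Illustrative transformer: turns a sparse matrix into a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer API.
        return self

    def transform(self, X):
        # Densify sparse input, pass dense input through as an ndarray.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


dense = ToDense().fit_transform(sparse.csr_matrix([[0, 1], [2, 0]]))
print(type(dense), dense.tolist())
```

Note that this only densifies the features passed between pipeline steps; it does not change what the final classifier's predict method returns.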


[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Hi all,

I was hoping to get some guidance on changing the result of the predict
method of the OneVsRestClassifier to return a dense array rather than a
sparse array, given that Google Cloud ML only accepts dense numpy arrays as
the result of a given model's predict method. Right now my model architecture
looks like:

model = Pipeline([('tfidf', TfidfVectorizer()), ('clf',
OneVsRestClassifier(XGBClassifier()))])

This returns a sparse array from the predict method. I saw the Stack
Overflow post here:
https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba

which recommends overwriting the predict method with the predict_proba
method; however, I found that I can't serialize the model after doing so. I
also have a Stack Overflow post here:
https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
which details the specific pickling error.

Is this a known issue? Is there an accepted way to convert this into a
dense array?

Thanks,
Liam Geron
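
On the question of getting a dense array: as far as I can tell from how OneVsRestClassifier inverts its label binarizer, the format of the multilabel predictions follows the format of the Y passed to fit, so fitting on a dense indicator matrix may be enough. The sketch below checks this on toy data. The numbers are invented, LogisticRegression is a stand-in for XGBClassifier, and the behaviour should be confirmed on the scikit-learn version actually deployed.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy features and a 0/1 label indicator matrix (2 labels), for illustration.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [0.0, 0.0], [0.2, 0.9], [0.9, 0.1]])
Y_dense = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])

# Same estimator, fit once on dense Y and once on sparse Y.
dense_fit = OneVsRestClassifier(LogisticRegression()).fit(X, Y_dense)
sparse_fit = OneVsRestClassifier(LogisticRegression()).fit(
    X, sparse.csr_matrix(Y_dense)
)

pred_dense = dense_fit.predict(X)
pred_sparse = sparse_fit.predict(X)

print(type(pred_dense))             # numpy array when Y was dense
print(sparse.issparse(pred_sparse)) # sparse matrix when Y was sparse
```

If the labels come from MultiLabelBinarizer, its sparse_output parameter is one place where a sparse Y can be introduced.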


[scikit-learn] Google Cloud ML Engine Error with Sklearn

2019-01-28 Thread Liam Geron
Hi scikit-learn contributors,

I am currently attempting to transfer our preexisting models into Cloud ML
for scalability; however, I am encountering errors while running through some
tutorial code found here:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/sklearn/notebooks/Online%20Prediction%20with%20scikit-learn.ipynb

On both my local machine (in a virtual environment) and the cloud shell,
I'm encountering errors when it comes to version creation and online
prediction. For version creation, both environments give this error:

"ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error:
"Failed to load model: Could not load the model:
/tmp/model/0001/model.joblib. 32. (Error code: 0)""

This happens with Python 3.6.4 (local) and Python 3.5.6 (cloud shell) when
running the command:

gcloud ml-engine versions create $VERSION_NAME \
--model $MODEL_NAME \
--config config.yaml

This is running with joblib version "0.13.1" and sklearn version "0.19.1".

Any help would be greatly appreciated.

Thank you,
Liam Geron
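
One thing that may be worth ruling out is a mismatch between the environment that wrote the file and the one that loads it. The sketch below is only a local sanity check, not a diagnosis of the error above: it prints the versions in use and confirms that a freshly exported model.joblib loads back. The LogisticRegression model and the temporary directory are placeholders, and using the standalone joblib package (rather than whatever the tutorial imports) is an assumption.

```python
import os
import sys
import tempfile

import joblib
import sklearn
from sklearn.linear_model import LogisticRegression

# Record the versions that produced the exported file.
print("python :", sys.version.split()[0])
print("sklearn:", sklearn.__version__)
print("joblib :", joblib.__version__)

# Placeholder model, trained on toy data for illustration only.
clf = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

# Export to a temporary directory using the file name from the error message.
export_dir = tempfile.mkdtemp()
model_path = os.path.join(export_dir, "model.joblib")
joblib.dump(clf, model_path)

# Round trip: the reloaded model should predict the same as the original.
reloaded = joblib.load(model_path)
same = bool((reloaded.predict([[0.0], [3.0]]) == clf.predict([[0.0], [3.0]])).all())
print("round trip ok:", same)
```

Comparing the printed versions with those of the runtime selected in config.yaml would be the natural next step.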


Re: [scikit-learn] Google Cloud ML Error

2019-01-25 Thread Liam Geron
As in updated the sklearn module or the joblib module? I'm currently
running sklearn on 0.19.1 and joblib on 0.13.1. Do I need to be running
them on a specific version?

On Fri, Jan 25, 2019 at 2:35 PM Bill Ross  wrote:

> Have you updated the project since this:
>
> Since joblib is involved here as well, I'd look at that checkin. Joblib
> expects there to be a model; maybe it is just configured to look in the
> wrong place.


Re: [scikit-learn] Google Cloud ML Error

2019-01-25 Thread Liam Geron
No such luck; the file doesn't seem to exist. Here's the output on my local
machine:

"ls: /tmp/model/0001/model.joblib: No such file or directory"

and

"/tmp/model/0001/model.joblib: cannot open
`/tmp/model/0001/model.joblib' (No such file or directory)"

and on the cloud shell:

"ls: cannot access '/tmp/model/0001/model.joblib': No such file or directory"

and

"/bin/sh: 1: file: not found".
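
Since the file utility is not installed on the cloud shell, the same existence check can be done from Python instead. A small sketch using only the standard library; describe_path is a hypothetical helper name.

```python
import os


def describe_path(path):
    """Return basic facts about a path without raising if it is missing."""
    info = {"path": path, "exists": os.path.exists(path)}
    if info["exists"]:
        info["is_file"] = os.path.isfile(path)
        info["size_bytes"] = os.path.getsize(path)
    return info


# Path taken from the error message in this thread.
print(describe_path("/tmp/model/0001/model.joblib"))
```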

On Fri, Jan 25, 2019 at 1:29 PM Bill Ross  wrote:

> Dumb generic cross-check from supporting compchem code in the day: What do
> these give? Might yield a clue, e.g. all model files seeing this got
> corrupted somehow.
>
> $ file /tmp/model/0001/model.joblib
>
> $ ls -l /tmp/model/0001/model.joblib


[scikit-learn] Google Cloud ML Error

2019-01-25 Thread Liam Geron
Hi scikit-learn contributors,

I am currently attempting to transfer our preexisting models into Cloud ML
for scalability; however, I am encountering errors while running through some
tutorial code found here:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/sklearn/notebooks/Online%20Prediction%20with%20scikit-learn.ipynb

On both my local machine (in a virtual environment) and the cloud shell,
I'm encountering errors when it comes to version creation and online
prediction. For version creation, both environments give this error:

"ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error:
"Failed to load model: Could not load the model:
/tmp/model/0001/model.joblib. 32. (Error code: 0)""

This happens with Python 3.6.4 (local) and Python 3.5.6 (cloud shell) when
running the command:

gcloud ml-engine versions create $VERSION_NAME \
--model $MODEL_NAME \
--config config.yaml

Any help would be greatly appreciated.

Thank you,
Liam Geron