Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
That's a great tip, actually; I was unaware of the MultiOutputClassifier option. I'll give it a try!

Thanks,
Liam

On Wed, Apr 10, 2019 at 11:03 PM Joel Nothman wrote:
> I think it's a bit weird if we're returning sparse output from
> OneVsRestClassifier.predict if it wasn't fit on sparse Y.
>
> Actually, I would be in favour of deprecating multilabel support in
> OneVsRestClassifier, since it is performing "binary relevance method" for
> multilabel, not actually OvR. MultiOutputClassifier duplicates this
> functionality (more or less), outputs a dense array (indeed it doesn't
> support sparse Y and perhaps it should) and lives closer to functional
> alternatives to binary relevance, such as ClassifierChain.
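For readers following along, here is a minimal sketch of the MultiOutputClassifier route Joel mentions. The toy texts and labels are made up, and LogisticRegression stands in for XGBClassifier so the snippet runs without xgboost; treat it as an illustration under those assumptions, not as the configuration used in this thread.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy multilabel data (invented for illustration): one column per label,
# here "travel" and "programming".
texts = [
    "cheap flights to paris",
    "paris hotel deals",
    "python sparse matrix",
    "numpy dense array",
    "flights and hotel bundle",
    "python numpy tutorial",
]
Y = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 0],
    [0, 1],
])

# Assumption: LogisticRegression replaces XGBClassifier from the thread.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(texts, Y)

# predict() gives one column per label as a dense numpy array.
pred = model.predict(["paris flights", "numpy array"])
print(type(pred), pred.shape)
```

Because this uses only classes that ship with scikit-learn, the pickled pipeline should not need any custom class on the serving side, which is the obstacle described elsewhere in the thread.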
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Unfortunately, I don't believe you get that level of freedom: it's an API call that automatically calls the model's predict method, so I don't think I get to specify something like model.predict(X).toarray(). I could be wrong, however; I don't pretend to be an expert on Cloud ML by any stretch.

Thanks,
Liam

On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka wrote:
> Hm, weird that their platform seems to be so picky about it. Have you
> tried to just make the output of the pipeline dense? I.e.,
>
> (model.predict(X)).toarray()
>
> Best,
> Sebastian

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
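Since the serving API calls predict directly, the densifying step would have to live inside the object being served. A rough sketch of that idea follows; DensePredictWrapper is a hypothetical name, and the two underscore-prefixed classes are stand-ins so the snippet runs without scikit-learn or scipy.

```python
class DensePredictWrapper:
    """Hypothetical wrapper: delegates to a fitted model and densifies predict()."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        result = self.model.predict(X)
        # scipy sparse matrices expose toarray(); dense results pass through.
        if hasattr(result, "toarray"):
            return result.toarray()
        return result


# Stand-ins for illustration only (not real scikit-learn or scipy objects).
class _FakeSparse:
    def __init__(self, rows):
        self._rows = rows

    def toarray(self):
        return [list(row) for row in self._rows]


class _FakeModel:
    def predict(self, X):
        return _FakeSparse([(1, 0)] * len(X))


wrapped = DensePredictWrapper(_FakeModel())
dense = wrapped.predict(["first document", "second document"])
print(dense)
```

Note the catch Liam raises elsewhere in the thread: pickle stores a class by reference (module and name), so a wrapper like this can only be loaded where the same class is importable.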
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Hi Sebastian,

Thanks for the advice! Luckily, the model works fine on its own in Python, so I don't think that is exactly the issue. I have tried rolling my own estimator that wraps the pipeline and calls the predict_proba method to return a dense array; however, I then ran into the problem that the custom estimator would also have to be defined on the Cloud ML end, which I'm unsure how to do.

Thanks,
Liam

On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka wrote:
> Hi Liam,
>
> not sure what your exact error message is, but it may also be that the
> XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
> returns sparse arrays. You could probably fix your issues by inserting a
> "DenseTransformer" into your pipeline (a simple class that just transforms
> an array from a sparse to a dense format). I've implemented something like
> that, which you can import or copy from here:
>
> https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
>
> The usage would then basically be
>
> model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense',
> DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
>
> Best,
> Sebastian
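To make the "DenseTransformer" idea concrete, here is a minimal sketch of such a pipeline step. It is written from the description above ("a simple class that just transforms an array from a sparse to a dense format") and is not a copy of the mlxtend implementation; the two example strings are made up.

```python
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Minimal sketch: convert sparse input features to a dense numpy array."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        # Pass dense input through unchanged.
        return X.toarray() if sparse.issparse(X) else X


features = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
])
dense = features.fit_transform(["sparse text features", "dense numpy arrays"])
print(type(dense), dense.shape)
```

This step densifies the features going into the classifier. It does not change what the final estimator's predict method returns, which is the part Cloud ML objects to in this thread.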
[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Hi all,

I was hoping to get some guidance on changing the result of the predict method of the OneVsRestClassifier so that it returns a dense array rather than a sparse array, given that Google Cloud ML only accepts dense numpy arrays as the result of a given model's predict method. Right now my model architecture looks like:

model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', OneVsRestClassifier(XGBClassifier()))])

which returns a sparse array from the predict method. I saw the Stack Overflow post here:
https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba

which recommends overwriting the predict method with the predict_proba method; however, I found that I can't serialize the model after doing so. I also have a Stack Overflow post here:
https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
which details the specific pickling error.

Is this a known issue? Is there an accepted way to convert this into a dense array?

Thanks,
Liam Geron
[scikit-learn] Google Cloud ML Engine Error with Sklearn
Hi scikit-learn contributors,

I am currently attempting to transfer our preexisting models into Cloud ML for scalability; however, I am encountering bugs while running through some tutorial code found here:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/sklearn/notebooks/Online%20Prediction%20with%20scikit-learn.ipynb

On both my local machine (in a virtual environment) and the cloud shell, I'm encountering errors when it comes to version creation and online prediction. For version creation, both environments give this error:

ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. 32. (Error code: 0)"

with Python 3.6.4 (local) and Python 3.5.6 (cloud shell) when running the command:

gcloud ml-engine versions create $VERSION_NAME \
    --model $MODEL_NAME \
    --config config.yaml

This is running with joblib version 0.13.1 and sklearn version 0.19.1.

Any help would be greatly appreciated.

Thank you,
Liam Geron
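When reporting versions as above, it can help to print them from the exact environment that exported the model. A small sketch, assuming Python 3.8 or later (importlib.metadata is not available on the 3.5 and 3.6 interpreters mentioned in this message, where checking sklearn.__version__ and joblib.__version__ is the usual alternative); installed_version is a hypothetical helper name.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("scikit-learn", "joblib"):
    print(name, installed_version(name))
```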
Re: [scikit-learn] Google Cloud ML Error
As in, updated the sklearn module or the joblib module? I'm currently running sklearn 0.19.1 and joblib 0.13.1. Do I need to be running specific versions?

On Fri, Jan 25, 2019 at 2:35 PM Bill Ross wrote:
> Have you updated the project since this:
>
> Since joblib is involved here as well, I'd look at that checkin. Joblib
> expects there to be a model, maybe it is just configured to look in the
> wrong place.
Re: [scikit-learn] Google Cloud ML Error
No such luck, the file doesn't seem to exist. Here's the output on my local machine:

ls: /tmp/model/0001/model.joblib: No such file or directory

and

/tmp/model/0001/model.joblib: cannot open `/tmp/model/0001/model.joblib' (No such file or directory)

and on the cloud shell:

ls: cannot access '/tmp/model/0001/model.joblib': No such file or directory

and

/bin/sh: 1: file: not found

On Fri, Jan 25, 2019 at 1:29 PM Bill Ross wrote:
> Dumb generic cross-check from supporting compchem code in the day: What do
> these give? Might yield a clue, e.g. all model files seeing this got
> corrupted somehow.
>
> $ file /tmp/model/0001/model.joblib
>
> $ ls -l /tmp/model/0001/model.joblib
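Where the `file` utility is missing, as on the cloud shell above, roughly the same cross-check can be done from Python. A minimal sketch using only the standard library; describe_model_file is a hypothetical helper, and the path is simply the one quoted in the error message.

```python
from pathlib import Path


def describe_model_file(path):
    """Report whether `path` exists, its size, and its first few bytes."""
    p = Path(path)
    if not p.is_file():
        return "{}: no such file".format(p)
    size = p.stat().st_size
    with p.open("rb") as fh:
        head = fh.read(8)  # leading bytes hint at the file format
    return "{}: {} bytes, starts with {!r}".format(p, size, head)


print(describe_model_file("/tmp/model/0001/model.joblib"))
```

Pointing the same function at the model file that was actually exported would show whether it is present and non-empty.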
[scikit-learn] Google Cloud ML Error
Hi scikit-learn contributors,

I am currently attempting to transfer our preexisting models into Cloud ML for scalability; however, I am encountering bugs while running through some tutorial code found here:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/sklearn/notebooks/Online%20Prediction%20with%20scikit-learn.ipynb

On both my local machine (in a virtual environment) and the cloud shell, I'm encountering errors when it comes to version creation and online prediction. For version creation, both environments give this error:

ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. 32. (Error code: 0)"

with Python 3.6.4 (local) and Python 3.5.6 (cloud shell) when running the command:

gcloud ml-engine versions create $VERSION_NAME \
    --model $MODEL_NAME \
    --config config.yaml

Any help would be greatly appreciated.

Thank you,
Liam Geron