Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
That's a great tip, actually; I was unaware of the MultiOutputClassifier option. I'll give it a try!

Thanks,
Liam

On Wed, Apr 10, 2019 at 11:03 PM Joel Nothman wrote:
> I think it's a bit weird if we're returning sparse output from
> OneVsRestClassifier.predict if it wasn't fit on sparse Y.
>
> Actually, I would be in favour of deprecating multilabel support in
> OneVsRestClassifier, since it is performing "binary relevance method" for
> multilabel, not actually OvR. MultiOutputClassifier duplicates this
> functionality (more or less), outputs a dense array (indeed it doesn't
> support sparse Y and perhaps it should) and lives closer to functional
> alternatives to binary relevance, such as ClassifierChain.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
I think it's a bit weird if we're returning sparse output from OneVsRestClassifier.predict if it wasn't fit on sparse Y.

Actually, I would be in favour of deprecating multilabel support in OneVsRestClassifier, since what it performs for multilabel is the "binary relevance" method, not actually OvR. MultiOutputClassifier duplicates this functionality (more or less), outputs a dense array (indeed it doesn't support sparse Y, and perhaps it should) and lives closer to functional alternatives to binary relevance, such as ClassifierChain.

On Thu, 11 Apr 2019 at 05:32, Liam Geron wrote:
> Unfortunately, I don't believe that you get that level of freedom. It's an
> API call that automatically calls the model's predict method, so I don't
> think that I get to specify something like model.predict(X).toarray(). I
> could be wrong, however; I don't pretend to be an expert on Cloud ML by any
> stretch.
>
> Thanks,
> Liam
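To make the MultiOutputClassifier suggestion above concrete, here is a minimal sketch. It is not the original poster's model: LogisticRegression stands in for XGBClassifier so the example needs only scikit-learn, and the toy documents and label matrix are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data (hypothetical): six short documents, two binary labels each.
docs = [
    "cats purr and sleep",
    "cats chase mice",
    "dogs bark loudly",
    "dogs fetch sticks",
    "cats and dogs play",
    "birds sing at dawn",
]
# Dense indicator matrix; MultiOutputClassifier expects a dense 2-D y.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]])

# Same pipeline shape as in the thread, with MultiOutputClassifier
# in place of OneVsRestClassifier and a stand-in base estimator.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(docs, Y)

# predict returns a plain NumPy array of shape (n_samples, n_labels).
pred = model.predict(docs)
print(type(pred), pred.shape)
```

Because the result is already a NumPy array, no .toarray() call is needed on the serving side. Whether XGBClassifier behaves the same way inside MultiOutputClassifier is not verified here.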
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Unfortunately, I don't believe that you get that level of freedom. It's an API call that automatically calls the model's predict method, so I don't think that I get to specify something like model.predict(X).toarray(). I could be wrong, however; I don't pretend to be an expert on Cloud ML by any stretch.

Thanks,
Liam

On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka wrote:
> Hm, weird that their platform seems to be so picky about it. Have you
> tried to just make the output of the pipeline dense? I.e.,
>
>     (model.predict(X)).toarray()
>
> Best,
> Sebastian
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Hm, weird that their platform seems to be so picky about it. Have you tried to just make the output of the pipeline dense? I.e.,

    (model.predict(X)).toarray()

Best,
Sebastian

> On Apr 10, 2019, at 1:10 PM, Liam Geron wrote:
>
> Hi Sebastian,
>
> Thanks for the advice! The model actually works fine on its own in Python,
> luckily, so I don't think that is exactly the issue. I have tried
> rolling my own estimator to wrap the pipeline and have it call the
> predict_proba method to return a dense array. However, I then came across
> the problem that I would have to have that custom estimator defined on the
> Cloud ML end, which I'm unsure how to do.
>
> Thanks,
> Liam
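The .toarray() call suggested above only exists on SciPy sparse matrices, so a defensive version checks the type first. This is a small sketch; the helper name to_dense is made up for illustration and is not part of scikit-learn or SciPy.

```python
import numpy as np
from scipy import sparse


def to_dense(pred):
    """Return predictions as a dense NumPy array.

    Sparse matrices are converted with .toarray(); anything else is
    passed through np.asarray and otherwise left unchanged.
    """
    if sparse.issparse(pred):
        return pred.toarray()
    return np.asarray(pred)


# A sparse label-indicator matrix standing in for a sparse predict() result.
sparse_pred = sparse.csr_matrix(np.array([[1, 0], [0, 1]]))
dense_pred = to_dense(sparse_pred)
print(type(dense_pred), dense_pred.shape)
```

This only helps when you control the code that calls predict, which may not be the case on a hosted prediction service.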
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Hi Sebastian,

Thanks for the advice! The model actually works fine on its own in Python, luckily, so I don't think that is exactly the issue. I have tried rolling my own estimator to wrap the pipeline and have it call the predict_proba method to return a dense array. However, I then came across the problem that I would have to have that custom estimator defined on the Cloud ML end, which I'm unsure how to do.

Thanks,
Liam

On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka wrote:
> Hi Liam,
>
> not sure what your exact error message is, but it may also be that the
> XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
> returns sparse arrays. You could probably fix your issues by inserting a
> "DenseTransformer" into your pipeline (a simple class that just transforms
> an array from a sparse to a dense format). I've implemented something like
> that, which you can import or copy from here:
>
> https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
>
> The usage would then basically be
>
>     model = Pipeline([('tfidf', TfidfVectorizer()),
>                       ('to_dense', DenseTransformer()),
>                       ('clf', OneVsRestClassifier(XGBClassifier()))])
>
> Best,
> Sebastian
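For readers wondering what such a wrapping estimator might look like, here is a rough sketch. The class name DensePredictWrapper is hypothetical, LogisticRegression stands in for XGBClassifier, and the toy data is invented. It densifies predict rather than swapping in predict_proba, which is a simplification of what the message describes.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


class DensePredictWrapper(BaseEstimator):
    """Hypothetical wrapper: delegates to an estimator, densifies predict."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fits the wrapped estimator in place (no cloning, for brevity).
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        pred = self.estimator.predict(X)
        # Convert only if the wrapped estimator returned a sparse matrix.
        return pred.toarray() if sparse.issparse(pred) else np.asarray(pred)


# Toy multilabel data (hypothetical).
docs = ["cats purr", "cats nap", "dogs bark", "dogs run",
        "cats and dogs", "birds sing"]
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]])

inner = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
model = DensePredictWrapper(inner).fit(docs, Y)
pred = model.predict(docs)
print(type(pred), pred.shape)
```

Defining the behaviour in a class, rather than reassigning predict on a fitted instance, keeps the object picklable locally. It does not remove the problem raised above: pickle stores only a reference to the class, so the class must still be importable wherever the model is loaded.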
Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML
Hi Liam,

not sure what your exact error message is, but it may also be that the XGBClassifier only accepts dense arrays? I think the TfidfVectorizer returns sparse arrays. You could probably fix your issues by inserting a "DenseTransformer" into your pipeline (a simple class that just transforms an array from a sparse to a dense format). I've implemented something like that, which you can import or copy from here:

https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py

The usage would then basically be

    model = Pipeline([('tfidf', TfidfVectorizer()),
                      ('to_dense', DenseTransformer()),
                      ('clf', OneVsRestClassifier(XGBClassifier()))])

Best,
Sebastian

> On Apr 10, 2019, at 12:25 PM, Liam Geron wrote:
>
> Hi all,
>
> I was hoping to get some guidance re: changing the result of the predict
> method of the OneVsRestClassifier to return a dense array rather than a
> sparse array, given that Google Cloud ML only accepts dense numpy arrays as
> the result of a given model's predict method. Right now my model
> architecture looks like:
>
>     model = Pipeline([('tfidf', TfidfVectorizer()),
>                       ('clf', OneVsRestClassifier(XGBClassifier()))])
>
> which returns a sparse array from the predict method. I saw the Stack
> Overflow post here:
> https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
>
> which recommends overwriting the predict method with the predict_proba
> method; however, I found that I can't serialize the model after doing so. I
> also have a Stack Overflow post here:
> https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
> which details the specific pickling error.
>
> Is this a known issue? Is there an accepted way to convert this into a
> dense array?
>
> Thanks,
> Liam Geron
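For reference, a "DenseTransformer" of the kind described above can be sketched in a few lines. This is an illustrative re-implementation under the usual scikit-learn transformer conventions, not the mlxtend code linked in the message.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Sketch: convert sparse feature matrices to dense NumPy arrays."""

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        # Densify sparse input; pass dense input through as an array.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Quick check on a small sparse matrix.
X_sparse = sparse.csr_matrix(np.eye(3))
X_dense = DenseTransformer().fit_transform(X_sparse)
print(type(X_dense), X_dense.shape)
```

Note that a step like this changes what the classifier receives as input. It does not change the type of what the final estimator's predict method returns.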