Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Joel Nothman
I think it's a bit weird if we're returning sparse output from OneVsRestClassifier.predict if it wasn't fit on sparse Y. Actually, I would be in favour of deprecating multilabel support in OneVsRestClassifier, since it is performing "binary relevance method" for multilabel, not actually OvR.

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Unfortunately, I don't believe that you get that level of freedom. It's an API call that automatically calls the model's predict method, so I don't think that I get to specify something like model.predict(X).toarray(). I could be wrong, however; I don't pretend to be an expert on Cloud ML by any

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Sebastian Raschka
Hm, weird that their platform seems to be so picky about it. Have you tried to just make the output of the pipeline dense? I.e., (model.predict(X)).toarray()
Best,
Sebastian
> On Apr 10, 2019, at 1:10 PM, Liam Geron wrote:
>
> Hi Sebastian,
>
> Thanks for the advice! The model actually

Re: [scikit-learn] Feature engineering functionality - new package

2019-04-10 Thread Sole Galli
Hi Nicolas,
You are right, I am just checking this in the source code. Sorry for the confusion, and thanks for the quick response.
Cheers,
Sole
On Wed, 10 Apr 2019 at 18:43, Nicolas Goix wrote:
> Hi Sole,
>
> I'm not sure the 2 limitations you mentioned are correct.
> 1) in your example, using

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Hi Sebastian, Thanks for the advice! The model actually works fine on its own in Python, luckily, so I don't think that that is the issue exactly. I have tried rolling my own estimator to wrap the pipeline to have it call the predict_proba method to return a dense array; however, I then came
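For readers following the thread, a wrapper of the general kind discussed here might look like the sketch below. It is only an illustration under stated assumptions: the class name DensePredictWrapper is hypothetical, LogisticRegression stands in for the actual base estimator, and the wrapper densifies the result of predict rather than calling predict_proba as the message describes. Whether a hosted prediction service can load such a custom class is a separate question that this sketch does not address.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class DensePredictWrapper(BaseEstimator):
    """Hypothetical wrapper whose predict() always returns a dense ndarray."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        y_pred = self.model.predict(X)
        # Densify only when the wrapped model handed back a sparse matrix.
        if sparse.issparse(y_pred):
            y_pred = y_pred.toarray()
        return np.asarray(y_pred)


# Toy multilabel data; Y is passed as a sparse label-indicator matrix.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = sparse.csr_matrix(np.array([[0, 1], [1, 0], [1, 1], [0, 0]]))

wrapped = DensePredictWrapper(OneVsRestClassifier(LogisticRegression()))
wrapped.fit(X, Y)
dense_pred = wrapped.predict(X)  # a NumPy array of shape (4, 2)
```

The sparsity check keeps the wrapper harmless when the inner model already returns a dense array.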

Re: [scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Sebastian Raschka
Hi Liam, not sure what your exact error message is, but it may also be that the XGBClassifier only accepts dense arrays? I think the TfidfVectorizer returns sparse arrays. You could probably fix your issues by inserting a "DenseTransformer" into your pipeline (a simple class that just
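The message is cut off here, so the exact class Sebastian had in mind is not shown. A minimal sketch of one possible "DenseTransformer" follows, assuming it does nothing more than convert a sparse matrix to a dense array between pipeline steps. The class body is hypothetical, and LogisticRegression is used as a stand-in so that the example does not depend on xgboost.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: turn sparse matrices into dense NumPy arrays."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer API.
        return self

    def transform(self, X):
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


docs = ["red apple", "green apple", "fast car", "slow car"]
labels = [0, 0, 1, 1]

# The densifying step sits between the vectorizer and the classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
    ("clf", LogisticRegression()),  # stand-in for XGBClassifier
])
pipeline.fit(docs, labels)
predictions = pipeline.predict(docs)

# Checking the transformer on its own: sparse in, dense out.
tfidf_features = TfidfVectorizer().fit_transform(docs)
dense_features = DenseTransformer().fit_transform(tfidf_features)
```

Densifying a large TF-IDF matrix can use a lot of memory, so this is only reasonable when the vocabulary is small enough.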

Re: [scikit-learn] Feature engineering functionality - new package

2019-04-10 Thread Nicolas Goix
Hi Sole, I'm not sure the 2 limitations you mentioned are correct. 1) in your example, using the ColumnTransformer you can impute different values for different columns. 2) the sklearn transformers do learn on the training set and are able to perpetuate the values learnt from the train set to
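Both points can be illustrated with a small sketch. The data, the column choices, and the fill value below are made up for illustration; the sketch only assumes scikit-learn's ColumnTransformer and SimpleImputer.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data with missing values in both columns.
X_train = np.array([[1.0, 10.0],
                    [np.nan, 20.0],
                    [3.0, np.nan]])
X_test = np.array([[np.nan, np.nan]])

# 1) A different imputation rule per column.
imputer = ColumnTransformer([
    ("mean_col0", SimpleImputer(strategy="mean"), [0]),
    ("const_col1", SimpleImputer(strategy="constant", fill_value=-1.0), [1]),
])

# 2) Statistics are learnt on the training set only ...
imputer.fit(X_train)

# ... and reused unchanged on new data: column 0 receives the training
# mean (1 + 3) / 2 = 2.0, column 1 receives the constant -1.0.
X_test_imputed = imputer.transform(X_test)
```

Because the fitted imputer stores the training-set mean, transforming the test set never looks at test-set statistics.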

[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

2019-04-10 Thread Liam Geron
Hi all, I was hoping to get some guidance re: changing the result of the predict method of the OneVsRestClassifier to return a dense array rather than a sparse array, given that Google Cloud ML only accepts dense numpy arrays as a result of a given model's predict method. Right now my model

Re: [scikit-learn] Feature engineering functionality - new package

2019-04-10 Thread Sole Galli
> Dear Scikit-Learn team,
>
> Feature engineering is a big task ahead of building machine learning
> models. It involves imputation of missing values, encoding of categorical
> variables, discretisation, variable transformation etc.
>
> Sklearn includes some functionality for feature