Hi all,
Hope you are doing well.
I was able to successfully complete multi-label classification using
SGDClassifier inside OneVsRestClassifier.
Something peculiar is happening:
When I use the classifier to predict on new data, the predicted
probability is always 1 for two particular columns and always zero for
everything else.
Though this is theoretically possible, it wasn't the case before.
The input to the classifier is a sparse matrix.
The only difference from the previous implementation is the use of
DictVectorizer for encoding instead of one-hot encoding.
Let me know how this can be resolved. Should I make
any upstream changes?
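
In case it helps narrow this down, here is a minimal sketch of one thing
that may differ between the two encodings. It rests on an assumption, not
a confirmed cause: that the integer-coded categorical columns are now
treated as plain numbers. DictVectorizer one-hot encodes string values but
passes numeric values through as a single numeric column. The rows below
are made up for illustration; only the column names come from my data.

```python
from sklearn.feature_extraction import DictVectorizer

# Made-up rows for illustration only; the column names come from the
# thread, the values do not.
rows = [
    {"locationid": 101, "departureairport_or_point": "BLR"},
    {"locationid": 205, "departureairport_or_point": "DEL"},
]

# Integer values are kept as one numeric column, while string values are
# expanded into one indicator column per distinct value.
vec = DictVectorizer()
X = vec.fit_transform(rows)
print(vec.feature_names_)
print(X.toarray())

# Casting the integer codes to str makes DictVectorizer one-hot encode
# them as well, which is closer to what one-hot encoding produced.
rows_as_str = [{k: str(v) for k, v in row.items()} for row in rows]
vec_str = DictVectorizer()
X_str = vec_str.fit_transform(rows_as_str)
print(vec_str.feature_names_)
print(X_str.toarray())
```

If the first output shows a single large-valued "locationid" column, the
unscaled magnitudes could be pushing the SGD decision values far from
zero, which would give probabilities of exactly 0 or 1. Casting to str or
scaling the numeric columns would be the things to try first.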
Regards,
Sanant
On Wed, Dec 2, 2015 at 12:29 PM, Startup Hire <blrstartuph...@gmail.com>
wrote:
> Hi,
>
> I guess the error was due to the fact that I was using one-hot encoding
> on a data frame which includes strings.
>
> I have now started using DictVectorizer to encode both my categorical
> variables stored as integers and my categorical variables stored as strings.
>
> It seems to be working fine.
>
> My Y is as follows
>
> import scipy.sparse as sps
> from sklearn.feature_extraction import DictVectorizer
>
> vec = DictVectorizer()
>
> # Select the feature columns from the pandas DataFrame
> train_df = df_modified[['locationid',
>                         'dep_departtime',
>                         'arr_arrivetime',
>                         'arr_departtime',
>                         'dep_arrivetime',
>                         'departureairport_or_point',
>                         'destinationairport_or_point',
>                         'bookeddate',
>                         'departuredate']]
>
> # Convert the DataFrame to a list with one dict per row
> train_dict = train_df.to_dict(orient='records')
>
> # Fit the vectorizer, then transform the rows into a sparse matrix
>
> b = vec.fit(train_dict)
> a = b.transform(train_dict)
>
>
>
> I hope I am working in the right direction. Let me know your thoughts
>
> Regards,
> Sanant
>
>
>
>
> Subject: Re: [Scikit-learn-general] Multi Label classification using
> OneVsRest Classifier
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <565e7223.3090...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Please provide the full traceback.
> What is the type of y here, and what are its entries?
>
>
> On 11/30/2015 07:45 PM, Startup Hire wrote:
> > Hi Pypers,
> >
> > Hope you are doing well.
> >
> > I am doing multi label classification in which my X and Y are sparse
> > matrices with Y properly binarized.
> >
> > Though my Y has multi-labels properly binarized, I am getting the
> > following error:
> >
> > ValueError: Multioutput target data is not supported with label binarization
> >
> >
> > The Classifier I am using is as follows:
> >
> > Classifier = OneVsRestClassifier(
> >     SGDClassifier(random_state=0, loss='log', alpha=0.00001,
> >                   penalty='elasticnet')).fit(Finaldata, y)
> >
> > Let me know how this can be resolved. Should I make
> > any upstream changes?
> > Regards,
> > Sanant
>
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general