Hello, I would like to implement a hyperellipsoidal one-class SVM using scikit-learn. Is it possible? If yes, how would I do it? Thanks to everyone who can answer.
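
A hyperellipsoidal one-class SVM is not available as a ready-made estimator in scikit-learn, so what follows is only a sketch of two possible approximations, not the algorithm itself: whiten the data first so that a standard OneClassSVM treats all covariance-scaled directions equally, or use EllipticEnvelope, which fits a robust ellipsoid directly. The toy data and the nu/contamination values are purely illustrative.

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import OneClassSVM

# Illustrative 2-D data with correlated features.
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.0], [1.0, 2.0]],
                            size=200)

# Option 1: whiten the features, then fit a standard one-class SVM.
# Whitening makes Euclidean distance in the transformed space match the
# Mahalanobis distance of the training data, so a boundary learned there
# corresponds to a covariance-aligned (ellipsoid-like) region in the
# original feature space.
model = Pipeline([
    ('whiten', PCA(whiten=True)),
    ('ocsvm', OneClassSVM(nu=0.1))])
model.fit(X)
print(model.predict(X[:5]))      # +1 for inliers, -1 for outliers

# Option 2: fit a robust ellipsoid directly (minimum covariance determinant).
envelope = EllipticEnvelope(contamination=0.1)
envelope.fit(X)
print(envelope.predict(X[:5]))   # also +1 / -1
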
2015-12-11 18:56 GMT+01:00 <scikit-learn-general-requ...@lists.sourceforge.net>:

Date: Fri, 11 Dec 2015 09:56:36 -0800
From: Sujit Pal <sujitatgt...@gmail.com>
Subject: Re: [Scikit-learn-general] (no subject)

Hi Mukesh,

I think you are looking for multi-label classifiers, where a record can belong to multiple classes. According to this page:
http://scikit-learn.org/stable/modules/multiclass.html

"The following classifiers support multilabel - Decision Tree, Random Forest, Nearest Neighbor and Ridge Regression."

By changing the binarizer to MultiLabelBinarizer, and the LinearSVC reference to one of the supported classifiers, I was able to get this to run to completion. The predict(X) method returns only a single class, so I used predict_proba(X) to get a vector of probabilities for each class. You probably need some sort of cutoff to determine whether something is in a class or not. My changes are as follows. Replacing the binarizer:

#lb = preprocessing.LabelBinarizer()
lb = preprocessing.MultiLabelBinarizer()
Y = lb.fit_transform(y_train_text)

Replacing the classifier with one of the supported ones in the pipeline:

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(RandomForestClassifier()))])
#   ('clf', OneVsRestClassifier(KNeighborsClassifier()))])
#   ('clf', OneVsRestClassifier(LinearSVC()))])

Finally, replacing the call to predict(X_test) with predict_proba(X_test):

classifier.fit(X_train, Y)
#predicted = classifier.predict(X_test)
predicted = classifier.predict_proba(X_test)
#all_labels = lb.inverse_transform(predicted)

I just printed out the predicted matrix, and this is what I get with KNeighborsClassifier and RandomForestClassifier.

KNeighborsClassifier:

[[ 0.6  0.4]
 [ 1.   0. ]
 [ 1.   0.2]
 [ 1.   0.2]
 [ 0.6  0.6]
 [ 0.8  0.4]
 [ 0.6  0.8]]

RandomForestClassifier:

[[ 0.3  0.3]
 [ 0.9  0.3]
 [ 1.   0.4]
 [ 0.7  0.2]
 [ 0.4  0.3]
 [ 0.4  0.2]
 [ 0.5  0.5]]

If you threshold at 0.5 you will get reasonable results with KNeighborsClassifier, though not as accurate as hoped. Maybe it needs more input data or some experimentation with hyperparameters.
Something like this:

#for item, labels in zip(X_test, predicted):
#    print '%s => %s' % (item, ', '.join(str(labels)))
for item, preds in zip(X_test, predicted):
    norm_preds = [(0 if x < 0.5 else 1) for x in preds.tolist()]
    pred_targets = ["" if x[1] == 0 else target_names[x[0]]
                    for x in enumerate(norm_preds)]
    print item, filter(lambda x: len(x.strip()) > 0, pred_targets)

returns these results:

nice day in nyc ['New York']
welcome to london ['New York']
london is rainy ['New York']
it is raining in britian ['New York']
it is raining in britian and the big apple ['New York', 'London']
it is raining in britian and nyc ['New York']
hello welcome to new york. enjoy it here and london too ['New York', 'London']

-sujit

On Thu, Dec 10, 2015 at 10:29 PM, mukesh tiwari <mukeshtiwari.ii...@gmail.com> wrote:

Dear Sujit,

Thank you for the reply and the solution. It works well, but with it I can only determine one label at a time. The last line, "hello welcome to new york. enjoy it here and london too", should output "london, new york", but it only gives "new york".

I am trying to do sentiment analysis of hotel reviews based on 6 aspects (Restaurant, Frontdesk, Room Amenities & Experience, Washrooms, Hotel and Internet), and all these categories have sub-categories (almost 90). I am tagging each review with a sentiment about the sub-category. Since I can tag a review with multiple sub-categories, my requirement is multi-label. Examples are given below.

The location was excellent for this hotel as it's super close to the airport and the wifi connection was relatively okay but those were the only perks => *Positive Location, Positive Wifi*

The room was filthy, we had to call reception twice to ask for toilet paper as we didn't have any and there were stains on the walls, the toilet seat, balls of hair on the floor, need I carry on => *Negative Room, Negative Walls, Negative Toilet seat*

The picture of the "breakfast buffet" says it all really. => *Negative Breakfast*

All in all we won't be coming back no matter how close it is. => *Negative Experience*

I am building my term matrix simply with term frequency-inverse document frequency. In short, I have an n x m matrix (n samples and m features):

>>> data
array([[1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
        0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
       [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
        0, 1, 0, 1, 0, 1, 1, 0, 2, 1, 1, 0, 0],
       [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
        1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]])

and the output is

>>> y
[[1, 1, 0, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]

and now I need a classifier for this purpose.

Best regards,
Mukesh Tiwari
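
A minimal sketch for data already in this form: since the y above is already a binary label-indicator matrix, no binarizer is needed and OneVsRestClassifier can be fitted on it directly. The random count matrix below only stands in for the real term matrix, and RandomForestClassifier is just one of the multilabel-capable choices.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Stand-in for the n x m term matrix shown above; any array of shape
# (n_samples, n_terms) works the same way.
rng = np.random.RandomState(42)
X = rng.randint(0, 2, size=(3, 35))

# The label-indicator matrix from the message above (3 samples, 5 labels).
# Some columns contain only one class, so scikit-learn will warn that those
# labels are present in all training examples.
Y = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, Y)

print(clf.predict(X))        # binary indicator matrix, one column per label
print(clf.predict_proba(X))  # per-label probabilities; threshold as needed
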
On Thu, Dec 10, 2015 at 11:08 PM, Sujit Pal <sujitatgt...@gmail.com> wrote:

Hi Mukesh,

I was getting the following error from your code in my environment (Python 2.7.11 - Anaconda 2.4.1, scikit-learn 0.17) on Mac OS X 10.9 for the following line:

Y = lb.fit_transform(y_train_text)
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

To fix it, I did this:

y_train_text0 = [["new york"],["new york"],["new york"],["new york"],["new york"],
                 ["new york"],["london"],["london"],["london"],["london"],
                 ["london"],["london"],["new york","london"],["new york","london"]]
y_train_text = [x[0] for x in y_train_text0]

and a cosmetic fix here:

for item, labels in zip(X_test, all_labels):
    print '%s => %s' % (item, labels)

and am now getting the following result:

nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => new york
hello welcome to new york. enjoy it here and london too => new york

-sujit

On Thu, Dec 10, 2015 at 4:08 AM, mukesh tiwari <mukeshtiwari.ii...@gmail.com> wrote:

Hello Everyone,

I am trying to learn scikit-learn and my problem is somewhat related to this one [1]. When I try to run the code

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn import preprocessing

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
                ["new york"],["london"],["london"],["london"],["london"],
                ["london"],["london"],["new york","london"],["new york","london"]]

X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])
target_names = ['New York', 'London']

lb = preprocessing.LabelBinarizer()
Y = lb.fit_transform(y_train_text)

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
all_labels = lb.inverse_transform(predicted)
for item, labels in zip(X_test, all_labels):
    print '%s => %s' % (item, ', '.join(labels))

I am getting

Traceback (most recent call last):
  File "phrase.py", line 37, in <module>
    Y = lb.fit_transform(y_train_text)
  File "/Library/Python/2.7/site-packages/sklearn/base.py", line 455, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Library/Python/2.7/site-packages/sklearn/preprocessing/label.py", line 300, in fit
    self.y_type_ = type_of_target(y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/multiclass.py", line 251, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

I tried to change

y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
                ["new york"],["london"],["london"],["london"],["london"],
                ["london"],["london"],["new york","london"],["new york","london"]]

to

y_train_text = [[1,0], [1,0], [1,0], [1,0], [1,0],
                [1,0], [0,1], [0,1], [0,1], [0,1],
                [0,1], [0,1], [1,1], [1,1]]

but then I get

ValueError: Multioutput target data is not supported with label binarization

Could someone please tell me how to resolve this?

Best regards,
Mukesh Tiwari

[1] http://stackoverflow.com/questions/10526579/use-scikit-learn-to-classify-into-multiple-categories
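
Putting the changes suggested earlier in this digest (MultiLabelBinarizer in place of LabelBinarizer, a multilabel-capable classifier, and a probability cutoff) into this script gives a sketch along the following lines. It continues from the definitions of X_train, y_train_text and X_test above; the choice of RandomForestClassifier and the 0.5 threshold are illustrative rather than required.

# Continuation of the script above; assumes X_train, y_train_text and X_test
# are already defined, along with the imports at the top of the script.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()                  # replaces LabelBinarizer
Y = mlb.fit_transform(y_train_text)          # binary label-indicator matrix

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(RandomForestClassifier()))])

classifier.fit(X_train, Y)
probas = classifier.predict_proba(X_test)    # one probability per label
Y_pred = (probas >= 0.5).astype(int)         # cut off to a 0/1 indicator matrix
for item, labels in zip(X_test, mlb.inverse_transform(Y_pred)):
    print('%s => %s' % (item, ', '.join(labels)))
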
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general