Hello, I would like to implement a hyperellipsoidal one-class SVM using scikit-learn. Is it possible? If yes, how would I do it? Thanks to everyone who can answer.
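
A hyperellipsoidal one-class SVM is not available as a ready-made estimator in scikit-learn, so what follows is only a sketch of two possible approximations, not the algorithm itself: whiten the data first so that a standard OneClassSVM treats all covariance-scaled directions equally, or use EllipticEnvelope, which fits a robust ellipsoid directly. The toy data and the nu/contamination values are purely illustrative.

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import OneClassSVM

# Illustrative 2-D data with correlated features.
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.0], [1.0, 2.0]],
                            size=200)

# Option 1: whiten the features, then fit a standard one-class SVM.
# Whitening makes Euclidean distance in the transformed space match the
# Mahalanobis distance of the training data, so a boundary learned there
# corresponds to a covariance-aligned (ellipsoid-like) region in the
# original feature space.
model = Pipeline([
    ('whiten', PCA(whiten=True)),
    ('ocsvm', OneClassSVM(nu=0.1))])
model.fit(X)
print(model.predict(X[:5]))      # +1 for inliers, -1 for outliers

# Option 2: fit a robust ellipsoid directly (minimum covariance determinant).
envelope = EllipticEnvelope(contamination=0.1)
envelope.fit(X)
print(envelope.predict(X[:5]))   # also +1 / -1
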
2015-12-11 18:56 GMT+01:00 <scikit-learn-general-requ...@lists.sourceforge.net>:

Date: Fri, 11 Dec 2015 09:56:36 -0800
From: Sujit Pal <sujitatgt...@gmail.com>
Subject: Re: [Scikit-learn-general] (no subject)

Hi Mukesh,

I think you are looking for multi-label classifiers, where a record can belong to multiple classes. According to this page:
http://scikit-learn.org/stable/modules/multiclass.html

"The following classifiers support multilabel - Decision Tree, Random Forest, Nearest Neighbor and Ridge Regression."

By changing the binarizer to MultiLabelBinarizer, and the LinearSVC reference to one of the supported classifiers, I was able to get this to run to completion. The predict(X) method returns only a single class, so I used predict_proba(X) to get a vector of probabilities for each class. You probably need some sort of cutoff to determine whether something is in a class or not. My changes are as follows. Replacing the binarizer:

#lb = preprocessing.LabelBinarizer()
lb = preprocessing.MultiLabelBinarizer()
Y = lb.fit_transform(y_train_text)

Replacing the classifier with one of the supported ones in the pipeline:

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(RandomForestClassifier()))])
#   ('clf', OneVsRestClassifier(KNeighborsClassifier()))])
#   ('clf', OneVsRestClassifier(LinearSVC()))])

Finally, replacing the call to predict(X_test) with predict_proba(X_test):

classifier.fit(X_train, Y)
#predicted = classifier.predict(X_test)
predicted = classifier.predict_proba(X_test)
#all_labels = lb.inverse_transform(predicted)

I just printed out the predicted matrix, and this is what I get with KNeighborsClassifier and RandomForestClassifier.

KNeighborsClassifier:

[[ 0.6  0.4]
 [ 1.   0. ]
 [ 1.   0.2]
 [ 1.   0.2]
 [ 0.6  0.6]
 [ 0.8  0.4]
 [ 0.6  0.8]]

RandomForestClassifier:

[[ 0.3  0.3]
 [ 0.9  0.3]
 [ 1.   0.4]
 [ 0.7  0.2]
 [ 0.4  0.3]
 [ 0.4  0.2]
 [ 0.5  0.5]]

If you threshold at 0.5 you will get reasonable results with KNeighborsClassifier, though not as accurate as hoped. Maybe it needs more input data or some experimentation with hyperparameters.
Something like this:

#for item, labels in zip(X_test, predicted):
#    print '%s => %s' % (item, ', '.join(str(labels)))
for item, preds in zip(X_test, predicted):
    norm_preds = [(0 if x < 0.5 else 1) for x in preds.tolist()]
    pred_targets = ["" if x[1] == 0 else target_names[x[0]]
                    for x in enumerate(norm_preds)]
    print item, filter(lambda x: len(x.strip()) > 0, pred_targets)

returns these results:

nice day in nyc ['New York']
welcome to london ['New York']
london is rainy ['New York']
it is raining in britian ['New York']
it is raining in britian and the big apple ['New York', 'London']
it is raining in britian and nyc ['New York']
hello welcome to new york. enjoy it here and london too ['New York', 'London']

-sujit

On Thu, Dec 10, 2015 at 10:29 PM, mukesh tiwari <mukeshtiwari.ii...@gmail.com> wrote:

Dear Sujit,

Thank you for the reply and the solution. It works well, but with it I can only determine one label at a time. The last line, "hello welcome to new york. enjoy it here and london too", should output "london, new york", but it only gives "new york".

I am trying to do sentiment analysis of hotel reviews based on 6 aspects (Restaurant, Frontdesk, Room Amenities & Experience, Washrooms, Hotel and Internet), and all these categories have sub-categories (almost 90). I am tagging each review with a sentiment about the sub-category. Since I can tag a review with multiple sub-categories, my requirement is multi-label. Examples are given below.

The location was excellent for this hotel as it's super close to the airport and the wifi connection was relatively okay but those were the only perks => *Positive Location, Positive Wifi*

The room was filthy, we had to call reception twice to ask for toilet paper as we didn't have any and there were stains on the walls, the toilet seat, balls of hair on the floor, need I carry on => *Negative Room, Negative Walls, Negative Toilet seat*

The picture of the "breakfast buffet" says it all really. => *Negative Breakfast*

All in all we won't be coming back no matter how close it is. => *Negative Experience*

I am building my term matrix simply with term frequency-inverse document frequency. In short, I have an n x m matrix (n samples and m features):

>>> data
array([[1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
        0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
       [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
        0, 1, 0, 1, 0, 1, 1, 0, 2, 1, 1, 0, 0],
       [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
        1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]])

and the output is

>>> y
[[1, 1, 0, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]

and now I need a classifier for this purpose.

Best regards,
Mukesh Tiwari
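
A minimal sketch for data already in this form: since the y above is already a binary label-indicator matrix, no binarizer is needed and OneVsRestClassifier can be fitted on it directly. The random count matrix below only stands in for the real term matrix, and RandomForestClassifier is just one of the multilabel-capable choices.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Stand-in for the n x m term matrix shown above; any array of shape
# (n_samples, n_terms) works the same way.
rng = np.random.RandomState(42)
X = rng.randint(0, 2, size=(3, 35))

# The label-indicator matrix from the message above (3 samples, 5 labels).
# Some columns contain only one class, so scikit-learn will warn that those
# labels are present in all training examples.
Y = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, Y)

print(clf.predict(X))        # binary indicator matrix, one column per label
print(clf.predict_proba(X))  # per-label probabilities; threshold as needed
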
On Thu, Dec 10, 2015 at 11:08 PM, Sujit Pal <sujitatgt...@gmail.com> wrote:

Hi Mukesh,

I was getting the following error from your code in my environment (Python 2.7.11 - Anaconda 2.4.1, scikit-learn 0.17) on Mac OS X 10.9 for the following line:

Y = lb.fit_transform(y_train_text)
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

To fix it, I did this:

y_train_text0 = [["new york"],["new york"],["new york"],["new york"],["new york"],
                 ["new york"],["london"],["london"],["london"],["london"],
                 ["london"],["london"],["new york","london"],["new york","london"]]
y_train_text = [x[0] for x in y_train_text0]

and a cosmetic fix here:

for item, labels in zip(X_test, all_labels):
    print '%s => %s' % (item, labels)

and am now getting the following result:

nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => new york
hello welcome to new york. enjoy it here and london too => new york

-sujit

On Thu, Dec 10, 2015 at 4:08 AM, mukesh tiwari <mukeshtiwari.ii...@gmail.com> wrote:

Hello Everyone,

I am trying to learn scikit-learn and my problem is somewhat related to this one [1]. When I try to run the code

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn import preprocessing

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
                ["new york"],["london"],["london"],["london"],["london"],
                ["london"],["london"],["new york","london"],["new york","london"]]

X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])
target_names = ['New York', 'London']

lb = preprocessing.LabelBinarizer()
Y = lb.fit_transform(y_train_text)

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
all_labels = lb.inverse_transform(predicted)
for item, labels in zip(X_test, all_labels):
    print '%s => %s' % (item, ', '.join(labels))

I am getting

Traceback (most recent call last):
  File "phrase.py", line 37, in <module>
    Y = lb.fit_transform(y_train_text)
  File "/Library/Python/2.7/site-packages/sklearn/base.py", line 455, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Library/Python/2.7/site-packages/sklearn/preprocessing/label.py", line 300, in fit
    self.y_type_ = type_of_target(y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/multiclass.py", line 251, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

I tried to change

y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
                ["new york"],["london"],["london"],["london"],["london"],
                ["london"],["london"],["new york","london"],["new york","london"]]

to

y_train_text = [[1,0], [1,0], [1,0], [1,0], [1,0],
                [1,0], [0,1], [0,1], [0,1], [0,1],
                [0,1], [0,1], [1,1], [1,1]]

but then I get

ValueError: Multioutput target data is not supported with label binarization

Could someone please tell me how to resolve this?

Best regards,
Mukesh Tiwari

[1] http://stackoverflow.com/questions/10526579/use-scikit-learn-to-classify-into-multiple-categories
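
Putting the changes suggested earlier in this digest (MultiLabelBinarizer in place of LabelBinarizer, a multilabel-capable classifier, and a probability cutoff) into this script gives a sketch along the following lines. It continues from the definitions of X_train, y_train_text and X_test above; the choice of RandomForestClassifier and the 0.5 threshold are illustrative rather than required.

# Continuation of the script above; assumes X_train, y_train_text and X_test
# are already defined, along with the imports at the top of the script.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()                  # replaces LabelBinarizer
Y = mlb.fit_transform(y_train_text)          # binary label-indicator matrix

classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(RandomForestClassifier()))])

classifier.fit(X_train, Y)
probas = classifier.predict_proba(X_test)    # one probability per label
Y_pred = (probas >= 0.5).astype(int)         # cut off to a 0/1 indicator matrix
for item, labels in zip(X_test, mlb.inverse_transform(Y_pred)):
    print('%s => %s' % (item, ', '.join(labels)))
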
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general