Sorry for the red herring, but I've realized it's not an issue with Pipeline. The code below has the same behavior:
import datetime as dat
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# `categories` is the list of newsgroup names defined earlier in my script

# seed derived from the current time of day, so it changes between runs
nw = dat.datetime.now()
rndstat = nw.hour*3600 + nw.minute*60 + nw.second

# get the data
twenty_train = fetch_20newsgroups(subset='train', categories=categories,
    random_state=rndstat, shuffle=True, download_if_missing=False)
twenty_test = fetch_20newsgroups(subset='test', categories=categories,
    random_state=rndstat, shuffle=True, download_if_missing=False)

# raw counts + naive Bayes, without Pipeline
cv = CountVectorizer()
X_train = cv.fit_transform(twenty_train.data)
clf = MultinomialNB().fit(X_train, twenty_train.target)
pred = clf.predict(cv.transform(twenty_test.data))
print(sum(pred == twenty_test.target)/len(twenty_test.target))

Andrew

On Thu, Aug 27, 2015 at 3:45 PM, <scikit-learn-general-requ...@lists.sourceforge.net> wrote:
> Message: 6
> Date: Thu, 27 Aug 2015 15:44:38 +0300
> From: Andrew Howe <ahow...@gmail.com>
> Subject: [Scikit-learn-general] issue with pipeline always giving same
>         results
> To: scikit-learn-general@lists.sourceforge.net
>
> I'm working through the tutorial, and also experimenting kind of on my
> own. I'm on the text analysis example, and am curious about the relative
> merits of analyzing by word frequency, relative frequency, and adjusted
> relative frequency. Using the 20 newsgroups data, I've built a set of
> pipelines within a cross_validation loop; the important part of the code is
> here:
>
> # get the data
> nw = dat.datetime.now()
> rndstat = nw.hour*3600+nw.minute*60+nw.second
> twenty_train = fetch_20newsgroups(subset='train', categories=categories,
>     random_state = rndstat, shuffle=True, download_if_missing=False)
> twenty_test = fetch_20newsgroups(subset='test', categories=categories,
>     random_state = rndstat, shuffle=True, download_if_missing=False)
>
> # first with raw counts
> text_clf = Pipeline([('vect', CountVectorizer()), ('clf',
>     MultinomialNB())])
> text_clf.fit(twenty_train.data,twenty_train.target)
> pred = text_clf.predict(twenty_test.data)
> test_ccrs[mccnt,0] = sum(pred ==
>     twenty_test.target)/len(twenty_test.target)
>
> The issue is that every time I run this, though I've confirmed the data
> sampled is different, the value in test_ccrs is *always* the same. Am I
> missing something?
>
> Thanks!
> Andrew
>
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> J. Andrew Howe, PhD
> Editor-in-Chief, European Journal of Mathematical Sciences
> Executive Editor, European Journal of Pure and Applied Mathematics
> www.andrewhowe.com
> http://www.linkedin.com/in/ahowe42
> https://www.researchgate.net/profile/John_Howe12/
> I live to learn, so I can learn to live. - me
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
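A possible explanation for the identical scores, offered as a guess rather than a confirmed diagnosis: random_state in fetch_20newsgroups only seeds the shuffle, so each run returns the same documents in a different order, and raw counts fed to MultinomialNB are summed per class, which does not depend on row order. The sketch below checks that idea on a tiny made-up corpus; train_docs, test_docs, and the helper accuracy_after_shuffle are hypothetical stand-ins, not part of the code in this thread.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus standing in for the 20 newsgroups data (hypothetical).
train_docs = [
    "the gpu renders the image", "pixel shader and image buffer",
    "the doctor prescribed medicine", "patient symptoms and medicine dose",
    "render a 3d image on the gpu", "the patient saw the doctor",
]
train_labels = np.array([0, 0, 1, 1, 0, 1])
test_docs = ["gpu image buffer", "doctor and patient", "medicine for the image"]
test_labels = np.array([0, 1, 1])


def accuracy_after_shuffle(seed):
    """Shuffle the training rows with `seed`, fit, and score on the fixed test set."""
    order = np.random.RandomState(seed).permutation(len(train_docs))
    docs = [train_docs[i] for i in order]
    labels = train_labels[order]
    cv = CountVectorizer()
    clf = MultinomialNB().fit(cv.fit_transform(docs), labels)
    pred = clf.predict(cv.transform(test_docs))
    return np.mean(pred == test_labels)


# Same documents, different order each time: the scores should all agree.
scores = [accuracy_after_shuffle(seed) for seed in (0, 1, 2, 3)]
print(scores)
```

If that is what is happening here, the score would only vary between runs when the set of training or test documents itself changes, for example by drawing a different random subset each time, not by reshuffling the same set.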
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general