Re: [Scikit-learn-general] issue with pipeline always giving same results (Andrew Howe)

Andrew Howe Thu, 27 Aug 2015 06:01:30 -0700

Sorry for the red herring, but I've realized it's not an issue with
Pipeline.  The code below has the same behavior:


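import datetime as dat

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# `categories` is defined earlier in my script (not shown here).
# Seed the shuffle from the current time of day.
nw = dat.datetime.now()
rndstat = nw.hour*3600 + nw.minute*60 + nw.second
twenty_train = fetch_20newsgroups(subset='train', categories=categories,
    random_state=rndstat, shuffle=True, download_if_missing=False)
twenty_test = fetch_20newsgroups(subset='test', categories=categories,
    random_state=rndstat, shuffle=True, download_if_missing=False)

# Raw word counts, no Pipeline.
cv = CountVectorizer()
X_train = cv.fit_transform(twenty_train.data)
clf = MultinomialNB().fit(X_train, twenty_train.target)
pred = clf.predict(cv.transform(twenty_test.data))
# Fraction of test documents classified correctly; .mean() also avoids
# integer division under Python 2.
print((pred == twenty_test.target).mean())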
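
One possible explanation, offered as a sketch and not a confirmed diagnosis: as far as I can tell, random_state in fetch_20newsgroups only controls the order in which documents are returned, not which documents land in the 'train' and 'test' subsets. If so, a count-based model sees the same totals on every run, and accuracy does not depend on the order of the test documents either. The toy classifier below is a hypothetical pure-Python stand-in for CountVectorizer plus MultinomialNB (it is not scikit-learn's implementation, and the documents and labels are made up). It shows that shuffling the training documents with different seeds leaves the accuracy unchanged.

```python
import math
import random
from collections import Counter, defaultdict


def fit_counts(docs, labels):
    """Accumulate per-class word counts and per-class document counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in zip(docs, labels):
        word_counts[label].update(text.split())
        class_counts[label] += 1
    return word_counts, class_counts


def predict(model, docs, alpha=1.0):
    """Multinomial naive Bayes prediction with add-alpha smoothing."""
    word_counts, class_counts = model
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    n_docs = sum(class_counts.values())
    preds = []
    for text in docs:
        scores = {}
        for label in sorted(class_counts):
            total = sum(word_counts[label].values()) + alpha * len(vocab)
            score = math.log(class_counts[label] / n_docs)
            for word in text.split():
                if word in vocab:  # ignore words never seen in training
                    score += math.log((word_counts[label][word] + alpha) / total)
            scores[label] = score
        preds.append(max(sorted(scores), key=scores.get))
    return preds


# Made-up toy corpus standing in for the newsgroup data.
train_docs = ["puck goal ice", "goal skate ice",
              "orbit rocket launch", "rocket moon orbit"]
train_labels = ["hockey", "hockey", "space", "space"]
test_docs = ["ice goal", "moon launch", "rocket ice orbit"]
test_labels = ["hockey", "space", "space"]


def accuracy_for_seed(seed):
    """Shuffle the training set with the given seed, then fit and score."""
    order = list(range(len(train_docs)))
    random.Random(seed).shuffle(order)
    docs = [train_docs[i] for i in order]
    labels = [train_labels[i] for i in order]
    preds = predict(fit_counts(docs, labels), test_docs)
    return sum(p == t for p, t in zip(preds, test_labels)) / len(test_labels)


accuracies = [accuracy_for_seed(seed) for seed in (1, 2, 3)]
print(accuracies)  # the same value for every seed
```

If that is what is happening, the score would only vary if the documents themselves changed between runs, for example by drawing a fresh train/test split from the combined data.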

Andrew

On Thu, Aug 27, 2015 at 3:45 PM, <
scikit-learn-general-requ...@lists.sourceforge.net> wrote:

> Send Scikit-learn-general mailing list submissions to
>         scikit-learn-general@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> or, via email, send a message with subject or body 'help' to
>         scikit-learn-general-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
>         scikit-learn-general-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scikit-learn-general digest..."
>
>
> Today's Topics:
>
>    1. Tests against reference implementations, speed regression
>       tests (Andreas Mueller)
>    2. Turning on sample weights for linear_model.LogisticRegression
>       (Valentin Stolbunov)
>    3. Re: Turning on sample weights for
>       linear_model.LogisticRegression (Joel Nothman)
>    4. Re: Turning on sample weights for
>       linear_model.LogisticRegression (Andy)
>    5. Re: K-SVD implementation (Alexey Umnov)
>    6. issue with pipeline always giving same results (Andrew Howe)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 25 Aug 2015 13:06:11 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: [Scikit-learn-general] Tests against reference
>         implementations, speed regression tests
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <55dca083.6020...@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hey all.
>
> I will soon have some student dev resources and I'm pondering how to
> best use them.
> Apart from the hundreds of issues, one thing I was thinking about adding
> is more tests against reference implementations,
> and having speed regression tests.
>
> For the reference implementations, we could hard-code the results of
> algorithms into the tests. That is done for some
> algorithms, but only very few. It would guard us against "obvious"
> functionality bugs, which still show up from time to time.
>
> For speed regression tests, it has happened that things got slower, in
> particular with innocent looking things like input validation.
> I think it would be good to have some tests that ensure that we don't
> get too much slower.
> I'm not entirely sure how to do that, though.
> I know Vlad put some effort into a continuous benchmarking suite, but I
> think since then there have been several
> efforts to log speed of implementations in a consistent way, and we
> might want to look into these.
>
> Do you think that these are interesting issues to work on, or do you
> think there are more pressing ones?
>
> We still have a lot to do on the API side, though I'm a bit hesitant to
> give that to new devs.
>
> Cheers,
> Andy
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 26 Aug 2015 19:15:53 -0500
> From: Valentin Stolbunov <valentin.stolbu...@gmail.com>
> Subject: [Scikit-learn-general] Turning on sample weights for
>         linear_model.LogisticRegression
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
>         <CAM5iThP3YExbMt8HFXvkRA5uZSY-1p1qRCJrefR0=Kf=
> cyz...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello everyone,
>
> I noticed that two of the three solvers in the logistic regression module
> (newton-cg and lbfgs) accept sample weights, but this feature is hidden
> away from users by not recognizing sample_weight as a parameter in .fit().
> Instead, sample_weight is set to ones (line 555 of logistic.py). To the
> best of my knowledge this is because the default solver (liblinear) does
> not support them?
>
> Could we instead allow sample_weight as a parameter (default None) and set
> them to ones only if the chosen solver is liblinear (with appropriate
> documentation notes - similar to the way the L1 penalty is supported only
> by liblinear)?
>
> I realize that SGDClassifier's .fit() accepts sample weights and the loss
> can be set to 'log', however this isn't exactly the same.
>
> What do you think?
>
> Valentin
> -------------- next part --------------
> An HTML attachment was scrubbed...
>
> ------------------------------
>
> Message: 3
> Date: Thu, 27 Aug 2015 11:29:40 +1000
> From: Joel Nothman <joel.noth...@gmail.com>
> Subject: Re: [Scikit-learn-general] Turning on sample weights for
>         linear_model.LogisticRegression
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
>         <CAAkaFLU2=CV8kBBOWJz1-=
> rt+nen56g7k5jfxb-3nygln-o...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I agree. I suspect this was an unintentional omission, in fact.
>
> Apart from which, sample_weight support in liblinear could be merged from
> https://github.com/scikit-learn/scikit-learn/pull/2784 which is dormant,
> and merely needs some core contributors to show interest in merging it...
>
> On 27 August 2015 at 10:15, Valentin Stolbunov <
> valentin.stolbu...@gmail.com
> > wrote:
>
> > Hello everyone,
> >
> > I noticed that two of the three solvers in the logistic regression module
> > (newton-cg and lbfgs) accept sample weights, but this feature is hidden
> > away from users by not recognizing sample_weight as a parameter in .fit().
> > Instead, sample_weight is set to ones (line 555 of logistic.py). To the
> > best of my knowledge this is because the default solver (liblinear) does
> > not support them?
> >
> > Could we instead allow sample_weight as a parameter (default None) and set
> > them to ones only if the chosen solver is liblinear (with appropriate
> > documentation notes - similar to the way the L1 penalty is supported only
> > by liblinear)?
> >
> > I realize that SGDClassifier's .fit() accepts sample weights and the loss
> > can be set to 'log', however this isn't exactly the same.
> >
> > What do you think?
> >
> > Valentin
>
> ------------------------------
>
> Message: 4
> Date: Wed, 26 Aug 2015 22:59:44 -0400
> From: Andy <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] Turning on sample weights for
>         linear_model.LogisticRegression
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <55de7d20.5060...@gmail.com>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> On 08/26/2015 09:29 PM, Joel Nothman wrote:
> > I agree. I suspect this was an unintentional omission, in fact.
> >
> > Apart from which, sample_weight support in liblinear could be merged
> > from https://github.com/scikit-learn/scikit-learn/pull/2784 which is
> > dormant, and merely needs some core contributors to show interest in
> > merging it...
> >
> "merely" ;)
>
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 27 Aug 2015 15:28:08 +0300
> From: Alexey Umnov <alexe...@yandex.ru>
> Subject: Re: [Scikit-learn-general] K-SVD implementation
> To: "scikit-learn-general@lists.sourceforge.net"
>         <scikit-learn-general@lists.sourceforge.net>
> Message-ID: <699781440678...@web24h.yandex.ru>
> Content-Type: text/plain; charset="us-ascii"
>
> An HTML attachment was scrubbed...
>
> ------------------------------
>
> Message: 6
> Date: Thu, 27 Aug 2015 15:44:38 +0300
> From: Andrew Howe <ahow...@gmail.com>
> Subject: [Scikit-learn-general] issue with pipeline always giving same
>         results
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
>         <
> cannyi3rv7zp3k5jqufo3+v4eysccxxfurn15w6brzky02a_...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I'm working through the tutorial, and also experimenting kind of on my
> own.  I'm on the text analysis example, and am curious about the relative
> merits of analyzing by word frequency, relative frequency, and adjusted
> relative frequency.  Using the 20 newsgroups data, I've built a set of
> pipelines within a cross_validation loop; the important part of the code is
> here:
>
> # get the data
> nw = dat.datetime.now()
> rndstat = nw.hour*3600+nw.minute*60+nw.second
> twenty_train = fetch_20newsgroups(subset='train', categories=categories,
>     random_state=rndstat, shuffle=True, download_if_missing=False)
> twenty_test = fetch_20newsgroups(subset='test', categories=categories,
>     random_state=rndstat, shuffle=True, download_if_missing=False)
>
> # first with raw counts
> text_clf = Pipeline([('vect', CountVectorizer()),
>                      ('clf', MultinomialNB())])
> text_clf.fit(twenty_train.data, twenty_train.target)
> pred = text_clf.predict(twenty_test.data)
> test_ccrs[mccnt,0] = sum(pred == twenty_test.target)/len(twenty_test.target)
>
> The issue is that every time I run this, though I've confirmed the data
> sampled is different, the value in test_ccrs is *always* the same.  Am I
> missing something?
>
> Thanks!
> Andrew
>
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> J. Andrew Howe, PhD
> Editor-in-Chief, European Journal of Mathematical Sciences
> Executive Editor, European Journal of Pure and Applied Mathematics
> www.andrewhowe.com
> http://www.linkedin.com/in/ahowe42
> https://www.researchgate.net/profile/John_Howe12/
> I live to learn, so I can learn to live. - me
> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>
> ------------------------------
>
> End of Scikit-learn-general Digest, Vol 67, Issue 44
> ****************************************************
>
