>
> I have various corpora still but they're on drives that I would have to
> dig out of storage. If the existing offers fall through let me know and I
> can pull them out.
>
I think I'm good, at least for now. Between what I had already and what
Matt had, I should have enough. Of course, if I nee
>
> Your current task is much clearer:
>
> > ... I am looking to ensure that a Py3 port of SpamBayes
> > works the same as the Py2 code.
>
> For _that_ purpose, you can take any pile of email at all; split it into
> "ham" and "spam" at random, and "just" ensure you get the same results from
> the o
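
A minimal sketch of the idea quoted above, assuming Python: label each message "ham" or "spam" from a hash of its content rather than with the random module, whose sequences are not guaranteed to match across major versions, so that a Py2 run and a Py3 run see the same split. The names assign_label, split_corpus and diff_scores are hypothetical and are not part of SpamBayes; the score dictionaries are assumed to be produced separately by running each port over the same split.

```python
import hashlib


def assign_label(message_bytes):
    """Label a message 'ham' or 'spam' from its content hash (arbitrary but repeatable)."""
    digest = hashlib.sha256(message_bytes).digest()
    # bytearray() yields integers on both Python 2 and Python 3.
    first_byte = bytearray(digest)[0]
    return "ham" if first_byte % 2 == 0 else "spam"


def split_corpus(messages):
    """Partition an iterable of raw messages (bytes) into (ham, spam) lists."""
    ham, spam = [], []
    for msg in messages:
        if assign_label(msg) == "ham":
            ham.append(msg)
        else:
            spam.append(msg)
    return ham, spam


def diff_scores(old_scores, new_scores, tol=1e-9):
    """Return the sorted message ids whose scores are missing or differ by more than tol.

    Both arguments are assumed to map a message id to a float score.
    """
    ids = set(old_scores) | set(new_scores)
    return sorted(
        i for i in ids
        if i not in old_scores
        or i not in new_scores
        or abs(old_scores[i] - new_scores[i]) > tol
    )
```

An empty list from diff_scores would mean the two runs agree on every message to within tol; the labels themselves are meaningless, which is fine when the goal is only to compare two implementations.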
Hi Skip,
I have various corpora still but they're on drives that I would have to dig
out of storage. If the existing offers fall through let me know and I can
pull them out.
I have a very large amount of mail that I can't share but can definitely
run tests against, that's quite varied so should m
: Tim Peters
Cc: spambayes-dev@python.org
Subject: Re: [spambayes-dev] Anybody still have a test ham/spam database?
> Sorry, Skip - I don't. And I was surprised just now to see that we
> apparently never checked test data files into the Sourceforge source tree
> either!
>
> But it sh
[Skip Montanaro]
> Sure, but constructing a suitable ham/spam corpus
> from scratch is a non-trivial task, as you no doubt
> remember.
Ah - but we had a much subtler task then: trying to construct a classifier
that was _useful_. Your current task is much clearer:
> ... I am looking to ensure
> Sorry, Skip - I don't. And I was surprised just now to see that we
> apparently never checked test data files into the Sourceforge source tree
> either!
>
> But it shouldn't matter. SB learns pretty quickly, and it would be better to
> use _current_ examples of spam and ham anyway (their cha
> I have both the Ham and Spam directories that I think were Tim's
> original data and also a corpus of (as of today) 38,509 spam emails.
> I would be happy to give you any of that. Though you'll have to
> explain to me how to use this newfangled Google Drive.
Thanks, Matt. Sharing email sent. Let
Sorry, Skip - I don't. And I was surprised just now to see that we
apparently never checked test data files into the Sourceforge source tree
either!
But it shouldn't matter. SB learns pretty quickly, and it would be better
to use _current_ examples of spam and ham anyway (their characteristics
c
Skip,
> Does anyone still have their setup? If so, let me know. I can
> provide a writable folder on my Google Drive you can upload to.
I have both the Ham and Spam directories that I think were Tim's
original data and also a corpus of (as of today) 38,509 spam emails.
I would be happy to give yo
I'm going to take a crack at porting SpamBayes to Python 3. For that I
should probably have some test data. My goal is to replicate existing
behavior, not improve the breed.
I long ago deleted what I used BITD. Does anyone still have their
setup? If so, let me know. I can provide a writable folder