Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-12 Thread Skip Montanaro
> I have various corpora still but they're on drives that I would have to dig out of storage. If the existing offers fall through let me know and I can pull them out.
I think I'm good, at least for now. Between what I had already and what Matt had, I should have enough. Of course, if I nee

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-12 Thread Skip Montanaro
> Your current task is much clearer:
>
> > ... I am looking to insure that a Py3 port of SpamBayes works the same as the Py2 code.
>
> For _that_ purpose, you can take any pile of email at all; split it into "ham" and "spam" at random, and "just" ensure you get the same results from the o
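The random-split idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not SpamBayes code: the names assign_label, split_corpus and diff_scores are hypothetical, and the scores are assumed to be plain floats keyed by message name. The label is derived from a content hash instead of the random module, since seeded shuffles are not guaranteed to produce the same order under Python 2 and Python 3, and both runs need the identical split.

```python
import hashlib


def assign_label(message_bytes):
    """Pick 'ham' or 'spam' from the message's SHA-256 hash.

    The label depends only on the bytes, so Python 2 and Python 3
    runs agree on it. The labels are arbitrary, not real judgments.
    """
    value = int(hashlib.sha256(message_bytes).hexdigest(), 16)
    return "ham" if value % 2 == 0 else "spam"


def split_corpus(messages):
    """Split (name, bytes) pairs into sorted 'ham' and 'spam' name lists."""
    groups = {"ham": [], "spam": []}
    for name, data in messages:
        groups[assign_label(data)].append(name)
    for names in groups.values():
        names.sort()
    return groups


def diff_scores(old_scores, new_scores, tolerance=1e-9):
    """Return the names whose scores differ between two runs.

    old_scores and new_scores are hypothetical dicts mapping a message
    name to the score produced by the old and the ported code.
    """
    names = set(old_scores) | set(new_scores)
    return sorted(
        name for name in names
        if name not in old_scores
        or name not in new_scores
        or abs(old_scores[name] - new_scores[name]) > tolerance
    )
```

An empty list from diff_scores would suggest the two runs agree on every message within the chosen tolerance; any names it returns are the places to start looking for porting differences.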

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-12 Thread Tony Meyer
Hi Skip, I have various corpora still but they're on drives that I would have to dig out of storage. If the existing offers fall through let me know and I can pull them out. I have a very large amount of mail that I can't share but can definitely run tests against, that's quite varied so should m

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-11 Thread Erik M. Brown via spambayes-dev
: Tim Peters
Cc: spambayes-dev@python.org
Subject: Re: [spambayes-dev] Anybody still have a test ham/spam database?
> Sorry, Skip - I don't. And I was surprised just now to see that we apparently never checked test data files into the Sourceforge source tree either!
>
> But it sh

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Tim Peters
[Skip Montanaro]
> Sure, but constructing a suitable ham/spam corpus from scratch is a non-trivial task, as you no doubt remember.
Ah - but we had a much subtler task then: trying to construct a classifier that was _useful_. Your current task is much clearer:
> ... I am looking to insure

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Skip Montanaro
> Sorry, Skip - I don't. And I was surprised just now to see that we apparently never checked test data files into the Sourceforge source tree either!
>
> But it shouldn't matter. SB learns pretty quickly, and it would be better to use _current_ examples of spam and ham anyway (their cha

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Skip Montanaro
> I have both the Ham and Spam directories that I think were Tim's original data and also a corpus of (as of today) 38,509 spam emails. I would be happy to give you any of that. Though you'll have to explain to me how to use this newfangled Google Drive.
Thanks, Matt. Sharing email sent. Let

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Tim Peters
Sorry, Skip - I don't. And I was surprised just now to see that we apparently never checked test data files into the Sourceforge source tree either! But it shouldn't matter. SB learns pretty quickly, and it would be better to use _current_ examples of spam and ham anyway (their characteristics c

Re: [spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Matthew Dixon Cowles
Skip,
> Does anyone still have their setup? If so, let me know. I can provide a writable folder on my Google Drive you can upload to.
I have both the Ham and Spam directories that I think were Tim's original data and also a corpus of (as of today) 38,509 spam emails. I would be happy to give yo

[spambayes-dev] Anybody still have a test ham/spam database?

2018-07-10 Thread Skip Montanaro
I'm going to take a crack at porting SpamBayes to Python 3. For that I should probably have some test data. My goal is to replicate existing behavior, not improve the breed. I long ago deleted what I used BITD. Does anyone still have their setup? If so, let me know. I can provide a writable folder