[From Concept To Reality, L.L.C. <[EMAIL PROTECTED]>]
> I've been building up a rather large (in my opinion) SPAM
> collection and have removed all the personal information from them
> and invalidated all the e-mail addresses (changed 'em to
> @domain.com) in the headers.
>
> Just how many e-mails are needed to make a good-sized collection for
> testing purposes and further development? 1000? 5000? 25000?
> 
> And once I hit that number, where should I deposit them? Is there a
> web site, or should I just host it myself?

Collections of all sizes are of use to someone -- it depends on their
resources and goals.

The SpamBayes (SB) project doesn't collect spam, because our goal is
individualized training, and each person's ham/spam mix differs (and
changes over time).  Initial development was done on canned databases,
though.

Here's a good list of spam collections:

    http://www.paulgraham.com/spamarchives.html

If you intend to make a long-term commitment to this, you could ask
Paul Graham to include a link to you there.  Otherwise, you can find an
archive there to contribute to.
_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev
