-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Quinlan writes:
> Sidney Markowitz <[EMAIL PROTECTED]> writes:
>
> > But I think that my numbers show that 40-bit should be ok at 4 million
> > and certainly at 3 million.
>
> I think it's perfectly acceptable to have a small number of collisions
> and that size savings is a much more important factor.

yeah. Don't forget, I was forwarding this as a datapoint for *multi*-word
token use, which produces way more tokens.

> Bear in mind, not only does a collision have to happen, but it has to
> change the actual result *in the wrong direction* before we actually
> start caring.

The bogofilter/CRM-114 forward was pretty clear that collisions in
multiword token use caused FPs: 'the hash collisions quickly caused
outrageously bad classification mistakes'.

- --j.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAhbrEQTcbUG5Y7woRAgmtAJwMENiwuj0p5CJnapkFwmmt53DzCQCg6dnk
fqN7PN4MnnUs4rjVk1ZOJcE=
=TkyY
-----END PGP SIGNATURE-----
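
For anyone who wants to sanity-check the 40-bit question, here is a rough
back-of-envelope sketch. It is not the set of numbers referred to in the
thread. It assumes the token hash is uniformly distributed, which real
tokens and real hash functions only approximate, and the function name is
made up for illustration. It uses the standard birthday estimate for the
expected number of colliding pairs, n(n-1)/2 divided by 2^bits.

```python
def expected_colliding_pairs(num_tokens: int, hash_bits: int) -> float:
    """Birthday-style estimate of colliding token pairs.

    Assumes every token hashes uniformly at random into 2**hash_bits
    buckets (an idealisation, not a property of any particular hash).
    """
    pairs = num_tokens * (num_tokens - 1) / 2  # number of distinct token pairs
    return pairs / (2 ** hash_bits)            # each pair collides with prob 2**-bits


if __name__ == "__main__":
    # Token counts from the thread, plus a larger one to stand in for
    # multi-word token use (the 40 million figure is purely illustrative).
    for n in (3_000_000, 4_000_000, 40_000_000):
        for bits in (32, 40):
            est = expected_colliding_pairs(n, bits)
            print(f"{n:>11,} tokens, {bits}-bit hash: ~{est:,.1f} colliding pairs")
```

Under that assumption, 40 bits gives only a handful of colliding pairs at
3 to 4 million tokens, but the estimate grows with the square of the token
count, which is why multi-word tokens change the picture.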
