-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Quinlan writes:
> Sidney Markowitz <[EMAIL PROTECTED]> writes:
> 
> > But I think that my numbers show that 40-bit should be ok at 4 million 
> > and certainly at 3 million.
> 
> I think it's perfectly acceptable to have a small number of collisions
> and that size savings is a much more important factor.

Yeah. Don't forget, I was forwarding this as a data point for *multi*-word
token use, which produces far more tokens.
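For a rough sense of scale, a standard birthday-bound estimate can be sketched as follows. It assumes the hash spreads tokens uniformly over 2**bits values. The function names are illustrative only, and this is not necessarily how Sidney arrived at his numbers. The key property is that expected collisions grow with the *square* of the token count, which is why multi-word tokenization changes the picture.

```python
import math

def expected_colliding_pairs(n_tokens: int, hash_bits: int) -> float:
    """Birthday-bound estimate: expected number of token pairs that share a
    hash value when n_tokens distinct tokens are hashed into 2**hash_bits
    equally likely values (assumes a uniform hash)."""
    slots = 2 ** hash_bits
    # There are n*(n-1)/2 pairs; each one collides with probability 1/slots.
    return n_tokens * (n_tokens - 1) / (2 * slots)

def prob_any_collision(n_tokens: int, hash_bits: int) -> float:
    """Poisson approximation: P(at least one collision) ~= 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_tokens, hash_bits))

if __name__ == "__main__":
    for n in (3_000_000, 4_000_000, 8_000_000):
        print(f"{n:>10,} tokens @ 40 bits: "
              f"~{expected_colliding_pairs(n, 40):.1f} colliding pairs, "
              f"P(any) = {prob_any_collision(n, 40):.4f}")
```

At 40 bits this gives about 4.1 expected colliding pairs for 3 million tokens and about 7.3 for 4 million. So at least one collision is almost certain, but only a handful occur. Doubling the token count roughly quadruples that handful.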

> Bear in mind, not only does a collision have to happen, but it has to
> change the actual result *in the wrong direction* before we actually
> start caring.

The bogofilter/CRM-114 forward was pretty clear that collisions in
multiword token use caused false positives (FPs): 'the hash collisions
quickly caused outrageously bad classification mistakes'.
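The toy sketch below shows how a collision skews a token database keyed by hash. It is not how bogofilter, CRM-114 or SpamAssassin actually store tokens. The truncated-SHA-1 `token_key`, the synthetic `token{i}` names, the naive `spam_ratio`, and the 12-bit width are all hypothetical. The small width is chosen only so that a collision turns up quickly. Once two tokens share a key, their spam and ham counts are merged. This only does damage when the two tokens lean in opposite directions, as Daniel notes above.

```python
import hashlib
from collections import defaultdict
from itertools import count

def token_key(token: str, bits: int) -> int:
    """Hypothetical token ID: the top `bits` bits (bits <= 64) of the token's SHA-1 digest."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def find_collision(bits: int) -> tuple[str, str, int]:
    """Brute-force two distinct synthetic tokens that map to the same key.
    By the pigeonhole principle this succeeds within 2**bits + 1 tries."""
    seen: dict[int, str] = {}
    for i in count():
        tok = f"token{i}"
        k = token_key(tok, bits)
        if k in seen:
            return seen[k], tok, k
        seen[k] = tok

def train(db, token: str, bits: int, spam: int = 0, ham: int = 0) -> None:
    """Add spam/ham counts under the token's hashed key, not the token itself."""
    k = token_key(token, bits)
    s, h = db[k]
    db[k] = (s + spam, h + ham)

def spam_ratio(db, token: str, bits: int) -> float:
    """Naive per-token spam ratio; 0.5 (neutral) for an unseen key."""
    s, h = db.get(token_key(token, bits), (0, 0))
    return 0.5 if s + h == 0 else s / (s + h)

if __name__ == "__main__":
    BITS = 12  # deliberately tiny so a collision is easy to find
    spammy, hammy, key = find_collision(BITS)
    db = defaultdict(lambda: (0, 0))
    train(db, spammy, BITS, spam=50)
    print(f"{spammy!r} alone: ratio {spam_ratio(db, spammy, BITS):.2f}")
    train(db, hammy, BITS, ham=50)
    print(f"after training {hammy!r} (same key {key}): "
          f"ratio {spam_ratio(db, spammy, BITS):.2f}")
```

A strongly spammy token drops to a neutral 0.5 because of an unrelated hammy token. With 40-bit keys the same merging happens, just far less often.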

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAhbrEQTcbUG5Y7woRAgmtAJwMENiwuj0p5CJnapkFwmmt53DzCQCg6dnk
fqN7PN4MnnUs4rjVk1ZOJcE=
=TkyY
-----END PGP SIGNATURE-----
