> baseline vs. x-lookup_ip:

[. . .]

>     false negative percentages
>         2.228  1.671  won    -25.00%
>         3.343  3.064  won     -8.35%
>         5.292  4.735  won    -10.53%
>         4.735  4.457  won     -5.87%
>         2.786  2.507  won    -10.01%
> 
>     won   5 times
>     tied  0 times
>     lost  0 times

I'm glad to see that. That's the sort of improvement that I see with
that code, but I think it's the first time that anyone else has
reproduced it.

Still, as people have pointed out before, there's at least one
potential problem in the code: data from DNS isn't necessarily
stable. If someone needed to un-train their database on a message a
day or two later, the tokens generated might easily differ from the
ones generated when the message was first trained on. That could
send a token's count below zero.
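
To make the failure concrete, here is a minimal sketch of it using a
plain Counter as a stand-in for the token database. The "ip:..." token
format and the addresses are made up for illustration; they are not
what the x-lookup_ip code actually emits.

```python
from collections import Counter

# Stand-in for the database: token -> number of trained messages containing it.
spam_counts = Counter()

# Tokens produced when the message was first trained (hypothetical format).
tokens_at_train = {"free", "ip:192.0.2.1"}
spam_counts.update(tokens_at_train)

# A day or two later the same lookup resolves differently, so tokenizing
# the same message again yields a different DNS-derived token.
tokens_at_untrain = {"free", "ip:198.51.100.7"}
spam_counts.subtract(tokens_at_untrain)

print(spam_counts["ip:192.0.2.1"])     # 1: stale count that was never removed
print(spam_counts["ip:198.51.100.7"])  # -1: count for a token never trained
```

Besides the negative count, the original token is left behind with a
count it should no longer have.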

That doesn't affect me in practice, but it would surely affect
someone if the code were used widely. Fixing it in general would
require some rather elaborate persistence mechanism, I think.
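
As a rough idea of what such a mechanism might look like, here is a
sketch that records the exact tokens used at training time, keyed by
message id, and un-trains from that record instead of re-tokenizing.
The TokenLog class, its method names, and the token strings are all
hypothetical; nothing here is existing SpamBayes API, and a real
version would need on-disk storage rather than a dict.

```python
from collections import Counter


class TokenLog:
    """Hypothetical record of the tokens each message was trained with."""

    def __init__(self):
        self.counts = Counter()  # token -> number of trained messages
        self.trained = {}        # message id -> tokens used when training

    def train(self, msg_id, tokens):
        if msg_id in self.trained:
            raise ValueError("message already trained: %r" % (msg_id,))
        tokens = frozenset(tokens)
        self.trained[msg_id] = tokens
        self.counts.update(tokens)

    def untrain(self, msg_id):
        # Use the stored tokens, not a fresh tokenization, so that changed
        # DNS answers cannot alter what gets subtracted.
        tokens = self.trained.pop(msg_id)
        for token in tokens:
            self.counts[token] -= 1
            if self.counts[token] == 0:
                del self.counts[token]


log = TokenLog()
log.train("msg-1", ["free", "ip:192.0.2.1"])
# Even if the lookup would now return a different address, un-training
# removes exactly what was added.
log.untrain("msg-1")
print(dict(log.counts))  # {}
```

The cost is storing a token set per trained message, which is the
part that makes a general fix elaborate.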

Regards,
Matt

_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev
