On Mon, Mar 14, 2005 at 04:46:06PM +0000, Paul Reilly wrote:
> 
> Is it possible to dump the bayesian tokens in
> human readable format still? It was quite useful
> but since 3.0.x they seem to be base64 encoded or
> some other way encoded. I couldn't see any sa-learn
> option, or any FAQ entry about it.

To expand a bit on what Matt said.

In general, no, it's not possible to dump the Bayesian tokens in a
readable format (well, they are readable, it's just hard to read
them :)) unless you do a little work yourself.  It is possible to dump
them by making use of the plugin hooks that allow you to fetch
the "raw" token value and match it to the SHA1 hash for the token.

FYI, the values you can see, via a --dump or --backup, are actually
hex representations of the binary SHA1 data.
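
To make the matching idea concrete, here is a rough sketch in Python
(not SpamAssassin code, and not the proof-of-concept plugin).  It
assumes you have already collected the raw tokens some other way, for
example from a plugin hook, and that the dumped keys are lowercase hex
of the SHA1 digest.  The function names and the keep_bytes parameter
are made up for illustration; whether the stored key is the full
digest or only part of it is something to check against your own dump
output.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


def token_key(raw_token: str, keep_bytes: Optional[int] = None) -> str:
    """Return the hex form of the SHA1 digest of a raw token.

    keep_bytes is an assumption for illustration: if set, only the
    trailing keep_bytes bytes of the digest are kept before hex encoding.
    """
    digest = hashlib.sha1(raw_token.encode("utf-8")).digest()
    if keep_bytes is not None:
        digest = digest[-keep_bytes:]
    return digest.hex()


def build_lookup(raw_tokens: Iterable[str],
                 keep_bytes: Optional[int] = None) -> Dict[str, str]:
    """Map hex key -> raw token for every raw token we know about."""
    return {token_key(tok, keep_bytes): tok for tok in raw_tokens}


def annotate(dumped_keys: Iterable[str],
             lookup: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each dumped hex key with its raw token, if we have seen it."""
    return [(key, lookup.get(key, "<unknown>")) for key in dumped_keys]
```

The point is only that the mapping goes one way: you can't recover a
token from its hash, but if you record raw tokens as they go by you
can hash them yourself and look up whatever appears in the dump.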

The primary motivation for the change was indeed speed, and let me
tell you, the improvement was substantial.  Privacy never really
entered into the picture, although I suppose it is a nice side effect,
except that with a plugin it's pretty easy to map the hashes back to
the token values.

I know, the next thing you're going to ask is how to write a plugin
to do this.  Well, that is left as an exercise for the reader.  I did a
proof of concept back when I added the plugin hooks, and may have sent
it to the mailing list, so check the archives.  For all the juicy details
check out the comments in this bug:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3331

Of course, I have to ask, how do you find the data "quite useful"?  I
asked on the mailing list several times for examples of how people
might use that data, and nothing came along that was very compelling,
at least not enough for me to pursue a better, more integrated fix.

Michael

