On Mon, Mar 14, 2005 at 04:46:06PM +0000, Paul Reilly wrote:
>
> Is it possible to dump the bayesian tokens in
> human readable format still? It was quite useful
> but since 3.0.x they seem to be base64 encoded or
> some other way encoded. I couldn't see any sa-learn
> option, or any FAQ entry about it.

To expand a bit on what Matt said:

In general, no, it's not possible to dump the Bayesian tokens in a readable format (well, they are readable, it's just hard to read them :)) unless you do a little work yourself. It is possible to dump them by making use of the plugin hooks that allow you to fetch the "raw" token value and match it to the SHA1 hash for the token. FYI, the values you see via --dump or --backup are actually hex representations of the binary SHA1 data.

The primary motivation for the change was indeed speed, and let me tell you, it was a big improvement. Privacy never really entered into the picture, although I suppose it is a nice side effect, except that with a plugin it's pretty easy to map the token values.

I know, the next thing you're going to ask is how to write a plugin to do this. Well, that is an exercise for the reader. I did a proof of concept back when I added the plugin hooks, and may have sent it to the mailing list, so check the archives. For all the juicy details, check out the comments in this bug:

http://bugzilla.spamassassin.org/show_bug.cgi?id=3331

Of course, I have to ask: how do you find the data "quite useful"? I asked on the mailing list several times for examples of how people might use that data, and nothing came along that was very compelling, at least not enough for me to pursue a better, more integrated fix.

Michael
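
P.S. To illustrate the mapping idea outside of SpamAssassin, here is a minimal Python sketch. It is not the proof-of-concept plugin mentioned above and it does not use the plugin hooks; it only shows how raw tokens, once you have them, can be matched against hex SHA1 values like those in a dump. The function names and the keep_bytes parameter are hypothetical, and the assumption that only part of the digest might be stored should be checked against the SpamAssassin source before relying on it.

```python
import hashlib


def token_digest(token, keep_bytes=None):
    """Return the hex SHA1 digest of a raw token.

    keep_bytes is an assumption for illustration: if the store keeps only
    the trailing N bytes of the binary digest, pass N here.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    if keep_bytes is not None:
        digest = digest[-keep_bytes:]
    return digest.hex()


def build_lookup(raw_tokens, keep_bytes=None):
    """Map hex digest -> raw token for every raw token we were given."""
    return {token_digest(t, keep_bytes): t for t in raw_tokens}


def annotate_dump(hex_values, lookup):
    """Pair each hex value from a dump with its raw token, if known."""
    return [(h, lookup.get(h, "<unknown>")) for h in hex_values]


if __name__ == "__main__":
    # Raw tokens would come from somewhere that still sees them in the
    # clear (for example, a plugin); these are made-up samples.
    lookup = build_lookup(["abc", "example"])
    for hex_value, raw in annotate_dump([token_digest("abc"), "00" * 20], lookup):
        print(hex_value, raw)
```

The lookup only resolves tokens you have seen in raw form yourself, which is why the hashing still gives a bit of privacy as a side effect: the hex values alone cannot be turned back into words.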
