Currently, in 3.0.0, "sa-learn --dump" no longer outputs readable token data. There's a patch in http://bugzilla.spamassassin.org/show_bug.cgi?id=3331 to restore this capability. However, recording that data slows down bayes scanning and learning quite a bit.
What do people think? Is the removal of this functionality a serious issue?
Well, I think it's fairly important to have this ability available by some means, mostly for debugging a bad bayes DB, but also for understanding how bayes works and looking for bugs in it.
However, it seems a waste to slow down general bayes operations to get it.
Reading the bug, it looks like the patch lets users choose between the two formats, and that looks like the best idea of all.
This way those who want to monitor their bayes DB and can accept the performance hit can use the extended database, and those who want the speed and size gains of the hashed database can do that too.
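To make the trade-off concrete, here's a rough Python sketch of the two modes. This is not SpamAssassin's actual code (which is Perl), and `TokenStore`, `token_key`, and `keep_readable` are names I made up for illustration. The 5-byte truncated SHA-1 key is also just an assumption for the example, not a statement about the real on-disk format.

```python
import hashlib


def token_key(token: str) -> bytes:
    # Hypothetical key function: a truncated SHA-1 digest.
    # The 5-byte length is an illustrative assumption.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


class TokenStore:
    """Toy bayes token store with an optional 'extended' readable mode."""

    def __init__(self, keep_readable: bool = False):
        self.keep_readable = keep_readable
        self.counts = {}    # key -> [spam_count, ham_count]
        self.readable = {}  # key -> original token (extended mode only)

    def learn(self, tokens, is_spam: bool) -> None:
        for tok in tokens:
            key = token_key(tok)
            entry = self.counts.setdefault(key, [0, 0])
            entry[0 if is_spam else 1] += 1
            if self.keep_readable:
                # Extra write per token: this is the cost of the extended mode.
                self.readable[key] = tok

    def dump(self):
        # Returns (label, spam_count, ham_count) rows. The label is the
        # original token in extended mode, otherwise the hex of the hash.
        rows = []
        for key, (spam, ham) in sorted(self.counts.items()):
            rows.append((self.readable.get(key, key.hex()), spam, ham))
        return rows


fast = TokenStore(keep_readable=False)
extended = TokenStore(keep_readable=True)
for store in (fast, extended):
    store.learn(["viagra", "free"], is_spam=True)

print(fast.dump())      # opaque hex labels only
print(extended.dump())  # original tokens recovered
```

In the hashed mode the dump is only useful for counts, since the tokens can't be recovered from the keys. The extended mode pays an extra write and extra storage per token to keep them readable.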
Really, that means the choice of accepting or not accepting the patch is largely a question of how much it adds to developer maintenance. Clearly it's a win-win from the user side, since users can choose which mode they want. The big question is whether it will hinder further bayes development by increasing code complexity.
I think it's a very worthwhile patch, provided it doesn't significantly slow down the sa-dev team when it comes to making bayes fixes and enhancements.
