Hi,

Ted Zlatanov <[EMAIL PROTECTED]> wrote:

> would you consider merging your code with the Gnus spam.el system?

Sorry for the late reply. I was a bit busy and wanted to reread the
"Spam Package Introduction" Info node to avoid giving an uninformed
answer.

Having just read it, I'm not sure the scheme implemented in spam.el fits
well with the way I want to work with Spambayes. One of the reasons is
that I do *not* want to train the filter on every article. Experiments
by Spambayes users and developers have shown that, after an initial
training, it is often a good idea to train the filter only on its
mistakes.

[ Personally, I don't even train the filter on every mistake, because
  some spam articles are, I believe, too well crafted: they mostly
  contain words that are part of my usual ham, and I fear I would
  pollute my Spambayes database by training on them. ]

Therefore, I wouldn't want the "spam and ham processors" to do anything
when I exit a group. I want to carefully select which articles get to
train the filter. As a consequence, the paragraph in the "Spam Package
Introduction" node that reads:

,----
| If the spam filter failed to mark a spam message, you can mark it
| yourself, so that the message is processed as spam when you exit the
| group:
|
| `M-d'
| `M s x'
| `S x'
|      Mark current article as spam, showing it with the `$' mark
|      (`gnus-summary-mark-as-spam').
|
| Similarly, you can unmark an article if it has been erroneously marked
| as spam.  *Note Setting Marks::.
`----

would be misleading to users, because marking articles as ham or spam
wouldn't make any difference if the "spam and ham processors" take no
action.

There's another thing in spam.el that doesn't seem to work the way I
want:

,----
| The second thing that the Spam package does when you exit a group is
| to move ham articles out of spam groups, and spam articles out of ham
| groups.  Ham in a spam group is moved to the group specified by the
| variable `gnus-ham-process-destinations', or the group parameter
| `ham-process-destination'.  Spam in a ham group is moved to the group
| specified by the variable `gnus-spam-process-destinations', or the
| group parameter `spam-process-destination'.
`----

This means that if, e.g., I had a ham that was classified as spam and I
mark it as ham before leaving the group, then the article will be moved
to the group specified by `gnus-ham-process-destinations', regardless of
the specific article.

I prefer my way of doing that. If an article is misclassified, there are
two possibilities:

  - either I don't want to train the filter on the article (for
    instance, because several similar articles were misclassified in a
    row and I already trained the filter on one of them). In this case,
    I usually simply use 'B m' to move the article manually to the right
    group.

    There is another possibility that works well in the example I gave
    in the parentheses: since the filter was trained on a similar
    article, you can expect it to classify the article correctly next
    time; therefore, you can call '(flo-spambayes-gnus-classify t)' in
    order to:

      1. rerun the classifier on the article;
      2. respool it afterwards (this is because of the "t" argument).

    The respooled article will eventually end up in the right group
    according to `nnmail-split-methods'.

  - or I use 'B s' (resp. 'B h') to tell the filter "Dude, this was
    spam!" (resp. "Dude, this was ham!"), i.e., I train the filter on
    the article. These key sequences, which are mapped to lambda
    expressions evaluating '(flo-spambayes-gnus-refile-as-spam t)' and
    '(flo-spambayes-gnus-refile-as-ham t)' respectively, do two things:

      1. train the filter on the article;
      2. respool it afterwards (this is because of the "t" argument).

    As a consequence, the article will (most probably) end up in the
    right group, according to `nnmail-split-methods'.
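
For reference, the 'B s' and 'B h' bindings described above might be set
up along these lines. This is only a sketch under assumptions: it
supposes that the flo-spambayes-gnus-* functions are already loaded,
that the bindings live in `gnus-summary-mode-map', and that
`with-eval-after-load' is available (Emacs 24.4 or later). None of these
details come from the actual configuration.

```elisp
;; Sketch only: the keymap and the loading mechanism are assumptions.
;; The two functions and their "t" argument are the ones named above.
(with-eval-after-load 'gnus-sum
  ;; "B s": train the filter on the article as spam, then respool it.
  (define-key gnus-summary-mode-map (kbd "B s")
    (lambda ()
      (interactive)
      (flo-spambayes-gnus-refile-as-spam t)))
  ;; "B h": train the filter on the article as ham, then respool it.
  (define-key gnus-summary-mode-map (kbd "B h")
    (lambda ()
      (interactive)
      (flo-spambayes-gnus-refile-as-ham t))))
```

Because both commands respool after training, the destination group is
decided by `nnmail-split-methods' rather than by a fixed per-group
destination.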

[ I say "most probably" because the filter might have been so badly
  trained in the past that it still couldn't classify the article
  correctly the second time. This has never happened to me, but I think
  it's possible. ]

The key point here is that in either case, if the article was, e.g.,
something for the ding mailing-list wrongly classified as spam when the
incoming mail was split, it will end up directly in my "ding" group
after the corrective actions I described, not in whichever group is
specified by `gnus-ham-process-destinations'.

Lastly, there's another thing I'm not sure about when reading the Info
node:

,----
| The Spam package divides Gnus groups into three categories: ham
| groups, spam groups, and unclassified groups.
`----

What exactly do unclassified groups contain? With Spambayes, when you
run an article through the classifier, it gets a spam score (between 0
and 1) and a category depending on the spam score. There are three
categories: ham, unsure and spam (from lowest score to highest score).
"unsure" means the article got a score that is not low enough to be
confident it's ham, and not high enough to be confident it's spam. But
it certainly doesn't mean the article wasn't _classified_ (i.e., it did
go through the classifier, whose output was "unsure"). That's why I'm
not sure the "unclassified groups" mentioned in the above sentence are
well suited for articles marked as "unsure" by Spambayes.

To put it differently: you said a spam backend must provide a function
that tells whether a message is ham or spam. But this is not suited to
Spambayes, since by default there are 3 possible outcomes from the
filter, not 2 (unless you tweak it to make the "unsure" score range
vanish, but that would be silly in most cases).

Regards,

-- 
Florent