Is there anyone here with experience working with the MoinMoin code base? I think using SpamBayes to deflect spam, instead of the current BadContent/LocalBadContent approach, would be useful. I wrote a couple of messages to the moin-users mailing list but received no responses. (Scanning the archive, I don't see my message. It must have disappeared into a black hole.) In case someone's interested, here's what I wrote in my second post:
We all know wikis get spammed. I'm not up to speed on the latest versions of MoinMoin, but I think the concept used at least through the 1.3 series (the BadContent and LocalBadContent pages) is fundamentally flawed, since it relies on users to manually update a list of "bad" words. You're always trying to catch up with the spammers.

Instead, let me suggest that you incorporate a SpamBayes-based classifier into MoinMoin. I did this recently for a couple of other websites I manage (Mojam and Musi-Cal - not wikis). It worked marvelously there. I now reject 100% of the spam submissions and also catch submission mistakes by good users that I would never have caught before.

Here's how I envision it working. Whenever a form submission happens, the new page is scored against the current SpamBayes database. If it scores as possible or probable spam, the page is automatically reverted to the last revision that scores as okay, and the full URL of the suspect revision is mailed to everyone in AdminGroup. An admin reviews that URL. If the revision is okay, the URL is added to the HamPages page. If not, it's added to the SpamPages page (both pages suitably protected for AdminGroup write only and not themselves checked by SpamBayes).

Whenever those pages are saved, the entire database is retrained from scratch. This should not generally be a problem: there will probably only be a few pages in the database, so retraining should be quick. It should also be a relatively rare occurrence. If the suspect page was actually ham, score it again after retraining. It should score as ham now. If so, just revert to it. If not, add it to the HamPages page a second time.

I'm not entirely sure how to handle new pages which are spam, but I think you should be able to automatically DeletePage them, then revive them later if they turn out to be good.
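To make the flow above concrete, here is a rough sketch in Python. It does not use the real SpamBayes or MoinMoin APIs: `ToyClassifier` is a small stand-in naive Bayes scorer, and `tokenize`, `retrain`, `classify`, `handle_save` and the two cutoff values are all hypothetical names and numbers chosen for illustration. Mailing AdminGroup and the actual page revert are left out; `handle_save` only decides which text to keep and whether the submission should be flagged.

```python
import math
import re
from collections import Counter

# Assumed thresholds for illustration: below HAM_CUTOFF is ham,
# at or above SPAM_CUTOFF is spam, anything in between is "unsure".
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def tokenize(text):
    """Return the set of lowercased word/URL-like tokens in text."""
    return set(re.findall(r"[a-z0-9][a-z0-9.:/_-]*", text.lower()))


class ToyClassifier:
    """Stand-in naive Bayes scorer (NOT the SpamBayes classifier)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.docs = {"ham": 0, "spam": 0}

    def learn(self, text, is_spam):
        label = "spam" if is_spam else "ham"
        self.docs[label] += 1
        self.counts[label].update(tokenize(text))

    def score(self, text):
        """Return a spam probability in [0, 1], assuming equal priors."""
        log_odds = 0.0
        for tok in tokenize(text):
            # Laplace-smoothed fraction of spam/ham pages containing tok.
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


def retrain(ham_pages, spam_pages):
    """Rebuild the database from scratch, as when HamPages/SpamPages is saved."""
    clf = ToyClassifier()
    for text in ham_pages:
        clf.learn(text, is_spam=False)
    for text in spam_pages:
        clf.learn(text, is_spam=True)
    return clf


def classify(clf, text):
    score = clf.score(text)
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def handle_save(clf, revisions, new_text):
    """Decide what to keep after a submission.

    revisions holds the earlier page texts, oldest first. Returns
    (text_to_keep, flagged). text_to_keep is None when no earlier
    revision scores as ham, e.g. a brand-new page that looks like spam.
    """
    if classify(clf, new_text) == "ham":
        return new_text, False
    for old_text in reversed(revisions):
        if classify(clf, old_text) == "ham":
            return old_text, True  # revert to this, notify admins
    return None, True  # candidate for DeletePage, notify admins
```

A flagged result is where the mail to AdminGroup would go. Once an admin files the URL under HamPages or SpamPages, calling `retrain` again and re-scoring the suspect text covers the "score it again" step.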
All that said, I can help from the SpamBayes side of things (write the tokenizer, suggest some synthetic tokens that might help improve the discrimination of ham and spam), but I'm not familiar with the MoinMoin code base, certainly not the latest versions. It's unlikely that I could implement this quickly on the MoinMoin side. If someone familiar with MoinMoin's code base would like to team up with me on this, let me know. Together we should be able to knock this off very quickly.

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev