Mark Phillips wrote:
I agree with you. Since I found the links to the Bayesian filter in
python, I think it might not be that hard to implement in workflow.

1. User submits document for review
2. Document is scanned, and is sent to either the spam or the review
state (ham). The spam state holds spam that is close to the threshold.
All 100% spam is automatically rejected.
3. Reviewer has 2 worklists - spam and ham
3a. Reviewer can reject, publish, or spam the ham - spam goes to the
filter to train it, published material goes to the filter to train it.
3b. Reviewer can reject or publish the spam - rejected spam goes to the
filter to be trained. Published items go to the filter to train it.
A rough cut off the top of my head. Any suggestions?

You're missing a state: ham can be sent to the spam state, where it gets used to train the spam filter. But that same spam state is also where the suspected spam ends up.

I'd keep the regular review queue in place (and the spam state) but add a spam-to-review state.
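To make the states concrete, here is a rough Python sketch of how I read the setup. All state and transition names are made up for illustration (they are not from an existing workflow definition), and the training labels are my assumption about what should feed the filter.

```python
# Hypothetical state and transition names, for illustration only.
TRANSITIONS = {
    ("review", "publish"): "published",
    ("review", "reject"): "rejected",
    ("review", "mark_spam"): "spam",
    ("review", "suspect"): "spam-to-review",
    ("spam-to-review", "publish"): "published",
    ("spam-to-review", "confirm_spam"): "spam",
}

# Assumption: only these destination states are used to train the filter.
TRAINING_LABEL = {"published": "ham", "spam": "spam"}


def apply_transition(state, action):
    """Return (new_state, training_label); the label is None if nothing is trained."""
    try:
        new_state = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {state!r}")
    return new_state, TRAINING_LABEL.get(new_state)
```

The point is that only the "spam" state trains the filter as spam, while suspected spam waits in "spam-to-review" until a reviewer decides.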

Option: submitted stuff just ends up in the regular review queue. That way there is no messing with two possible destination states for one transition (though that is probably doable), and no possibly expensive processing during the user's request (which might time out or hose the server). Instead, have a script or view (triggered from a cronjob?) go through the review queue once every few minutes to check whether it should transition a few items to the possible-spam state.
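A minimal sketch of that periodic check, in plain Python. The function name, the threshold and the batch size are all assumptions, and spam_score stands in for whatever the Bayesian filter would return (assumed here to be a value between 0 and 1).

```python
def triage(review_queue, spam_score, threshold=0.9, batch_size=20):
    """Return the items from the first batch that look like spam.

    review_queue: list of items waiting for review (oldest first).
    spam_score:   callable giving a spam probability for one item;
                  a stand-in for the real filter.
    """
    suspects = []
    # Only look at a limited batch per run so a single run stays cheap.
    for item in review_queue[:batch_size]:
        if spam_score(item) >= threshold:
            suspects.append(item)
    return suspects
```

The script called from cron would then move each returned item to the possible-spam state; everything else stays in the regular review queue.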

BTW, what the heck does "overgehaalde dekzwabber" mean in English? I
couldn't find a Dutch web translation service that would translate it.
:-)

:-) Hard to translate. Something like "idiotic broom-used-to-sweep-a-ship's-deck". It loses a bit of expressiveness when translated :-)

Reinout



--
Reinout van Rees  - Programmer at http://zestsoftware.nl/
http://vanrees.org/weblog/          reinout @ vanrees.org
"Information overload isn't the problem. If it was, you'd
walk into a library and die." (David Allen)


_______________________________________________
Product-Developers mailing list
[email protected]
http://lists.plone.org/mailman/listinfo/product-developers
