> I have a graduation project and decided to use the SpamBayes technique to
> classify Arabic spam email in a Python environment.
> My questions: how can I run it with the Python programming language, and which
> packages must I use with Python?
> How can I link this technique with another preprocessing technique?
> How can it work with the Arabic language?
> How can I pass messages to the training and testing process?
You will need to download the SpamBayes source distribution so you get the test environment and are able to easily make changes to the code. I recently created a Git repository at GitHub:

https://github.com/smontanaro/spambayes

You can just clone that repository. If you make changes to the code you would like incorporated into SpamBayes, you can create a pull request when you are ready.

Once you've downloaded the code you should familiarize yourself with the tokenizer code in spambayes/spambayes/tokenizer.py. (You can ignore everything in the website directory.) The tokenizer file contains many detailed comments about what did and didn't work when SpamBayes was originally developed.

Arabic text will be full of non-ASCII characters. Search for "highbit" and "8bit" to decide how you want to handle that. I'm pretty sure you will have to modify that code. Also, if Arabic text uses something other than an ASCII space char to separate words you will have to fix that.

It's unlikely you will need to modify the classifier, at least initially, but it will pay to read through that heavily commented code as well. The output of the tokenizer step is the input to the classifier. Knowing how to set its parameters will help when testing.

Familiarize yourself with spambayes/TESTING.txt to learn how to test your changes.

Finally, you will need fairly large collections of spam and ham emails. The TESTING file should describe the requirements there.

Skip Montanaro
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
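
To make the points about non-ASCII characters and word separators more concrete, here is a minimal standalone sketch in Python 3. It is not SpamBayes code and does not use any SpamBayes API: the names `normalize_arabic` and `tokenize_arabic` are hypothetical, and the choice to strip diacritics and the tatweel (elongation) character is only an assumption about what preprocessing might help for Arabic. Whether it actually improves classification would have to be checked with the procedure in TESTING.txt.

```python
import re
import unicodedata

# Hypothetical helpers for illustration only; nothing here comes from SpamBayes.

TATWEEL = "\u0640"  # Arabic elongation character, often used to stretch words

# Runs of Unicode word characters. Splitting this way does not depend on
# which character (ASCII space, Arabic comma, etc.) separates the words.
WORD_RE = re.compile(r"\w+")


def normalize_arabic(text):
    """Remove tatweel and nonspacing marks (e.g. Arabic short-vowel diacritics).

    This is one possible normalization, assumed here for the example.
    """
    text = text.replace(TATWEEL, "")
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def tokenize_arabic(text):
    """Yield one token per word; lowercasing only affects cased scripts."""
    for word in WORD_RE.findall(normalize_arabic(text)):
        yield word.lower()


if __name__ == "__main__":
    sample = "\u0639\u064e\u0631\u0652\u0636 \u0645\u062c\u0640\u0640\u0627\u0646\u064a\u060c Free!"
    for token in tokenize_arabic(sample):
        print(token)
```

A preprocessing step like this could sit in front of whatever tokenizer changes you make, so that variants of the same word (with or without diacritics or elongation) end up as the same token.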