Re: A different approach to scoring spamassassin hits

2007-07-08 Thread Tom Allison
On Jul 2, 2007, at 10:26 AM, Justin Mason wrote: However as you note, you may be able to use the *absence* of a rule hit as a ham token. Also, you could add some informational rules matching common innocent traits of nonspam mail, for the purpose of serving as good ham rules in this
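A minimal sketch of Justin's suggestion in plain Perl: emit one token per known rule, marked as hit or missed, so a Bayes engine can learn ham weight from a rule's absence as well as from its presence. The `rule_tokens` helper and the `HIT:`/`MISS:` prefixes are hypothetical, not SpamAssassin API. With Mail::SpamAssassin, the hit list for a message can be taken from the PerMsgStatus method `get_names_of_tests_hit`, which returns a comma-separated string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): turn one message's rule
# results into Bayes tokens. Every known rule yields exactly one token, so a
# rule that did NOT fire can still collect ham (or spam) evidence.
sub rule_tokens {
    my ($all_rules, $hits) = @_;    # array refs of rule names
    my %hit = map { $_ => 1 } @$hits;
    return map { ($hit{$_} ? 'HIT:' : 'MISS:') . $_ } sort @$all_rules;
}

my @tokens = rule_tokens(
    [qw(ADVANCE_FEE_1 ADVANCE_FEE_2 HTML_MESSAGE)],
    [qw(HTML_MESSAGE)],
);
print join(' ', @tokens), "\n";
# prints: MISS:ADVANCE_FEE_1 MISS:ADVANCE_FEE_2 HIT:HTML_MESSAGE
```

The tokens can then be learned and scored like any word token; the Bayes engine decides how much a given hit or miss is worth.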

Re: A different approach to scoring spamassassin hits

2007-07-01 Thread Tom Allison
On Jun 30, 2007, at 11:55 PM, Loren Wilton wrote: Unfortunately I'm not on the SpamAssassin Bayes modules -- I wrote my own Bayes Engine because I wanted to do that and then thought about including the Rules results from SpamAssassin. I don't know where this might be going, but it

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote: Tom Allison wrote: For some years now there has been a lot of effective spam filtering using statistical approaches with variations on Bayesian theory; some of these are inverse Chi Square modifications to Naive Bayes or even CRM114

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 4:46 AM, John Andersen wrote: On Friday 29 June 2007, Tom Allison wrote: It would be the Bayes process that determines the effective number of points you assign for each HIT based on what it's learned about it from you. So the tags of: ADVANCE_FEE_1, ADVANCE_FEE_2 would
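To make "effective points per hit" concrete, here is one way a Bayes engine could turn learned counts for a rule token into a probability. It uses Gary Robinson's smoothed estimate, which is an assumption about how such an engine might work rather than SpamAssassin's own code; the `token_prob` name and its default strength and prior are illustrative.

```perl
use strict;
use warnings;

# Smoothed spam probability for one token such as "HIT:ADVANCE_FEE_1",
# following Gary Robinson's f(w) = (s*x + n*p(w)) / (s + n).
#   $ns, $nh       times the token was seen in learned spam / ham
#   $nspam, $nham  total learned spam / ham messages
#   $s, $x         strength and prior for rarely seen tokens (1 and 0.5 are
#                  illustrative defaults, not tuned values)
sub token_prob {
    my ($ns, $nh, $nspam, $nham, $s, $x) = @_;
    $s = 1.0 unless defined $s;
    $x = 0.5 unless defined $x;
    my $n = $ns + $nh;
    # Unseen token, or one side not trained yet: fall back to the prior.
    return $x if $n == 0 || $nspam == 0 || $nham == 0;
    my $spam_ratio = $ns / $nspam;
    my $ham_ratio  = $nh / $nham;
    my $p = $spam_ratio / ($spam_ratio + $ham_ratio);
    return ($s * $x + $n * $p) / ($s + $n);
}

# A rule hit in 10 of 100 learned spams and in none of 100 learned hams:
printf "%.3f\n", token_prob(10, 0, 100, 100);   # prints 0.955
```

The smoothing keeps a rule that has only been seen a handful of times from jumping straight to 0 or 1.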

config clarification

2007-06-30 Thread Tom Allison
For configuration options listed in perldoc Mail::SpamAssassin can I put the settings into local.cf? Mail::SpamAssassin::Conf says yes, but it doesn't say it applies to args for Mail::SpamAssassin->new(). And what does 'save_pattern_hits' get me that I otherwise wouldn't have?

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 8:07 AM, Loren Wilton wrote: You have a bit of a chicken and egg problem at the start, until some learning takes place in the system. Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason. 1. Until Bayes has enough
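Loren's first possibility can be sketched as a simple switch: keep using the maintained static rule scores until enough spam and ham have been learned, then hand over to the learned probabilities. SpamAssassin's own Bayes has a similar gate (the `bayes_min_spam_num` and `bayes_min_ham_num` settings); the thresholds, the `score_message` helper and its arguments below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical training thresholds, in the spirit of SpamAssassin's
# bayes_min_spam_num / bayes_min_ham_num settings.
use constant MIN_SPAM => 200;
use constant MIN_HAM  => 200;

#   static_scores  hash ref: rule name => static score
#   hits           array ref: rules that fired on this message
#   bayes_prob     code ref: learned 0..1 spam probability for those hits
sub score_message {
    my (%arg) = @_;
    if ($arg{nspam} < MIN_SPAM || $arg{nham} < MIN_HAM) {
        # Not enough training yet: add up the static scores of the rules hit.
        my $sum = 0;
        $sum += $arg{static_scores}{$_} // 0 for @{ $arg{hits} };
        return ('static', $sum);
    }
    # Enough training: defer to the learned probability for these hits.
    return ('bayes', $arg{bayes_prob}->($arg{hits}));
}

# Made-up scores, early in training (only 12 spams learned so far):
my ($mode, $value) = score_message(
    nspam         => 12,
    nham          => 340,
    static_scores => { ADVANCE_FEE_1 => 1.5, ADVANCE_FEE_2 => 2.0 },
    hits          => [qw(ADVANCE_FEE_1 ADVANCE_FEE_2)],
    bayes_prob    => sub { 0.5 },
);
print "$mode $value\n";   # prints: static 3.5
```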

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 2:55 PM, Bart Schaefer wrote: On 6/29/07, Tom Allison [EMAIL PROTECTED] wrote: The thought I had, and have been working on for a while, is changing how the scoring is done. Rather than making Bayes a part of the scoring process, make the scoring process a part

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 6:29 PM, Loren Wilton wrote: And after typing all this I'm thinking you might be right. But part of this approach is to run all these rules in YES/NO fashion and see if the probability is significant. For example: If I tested for SOME_TEST=NO and found it was
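The YES/NO idea can be sketched as a filter that keeps a rule token only when its learned probability is far enough from the neutral 0.5 to matter. The `significant_tokens` helper and the 0.1 cutoff in the example are illustrative assumptions, not values taken from SpamAssassin.

```perl
use strict;
use warnings;

# Keep only tokens whose learned spam probability is at least $min_strength
# away from the neutral 0.5. A SOME_TEST=NO token that proves strongly hammy
# survives; a token that behaves like a coin flip is dropped.
#   $probs  hash ref: token => learned probability
sub significant_tokens {
    my ($probs, $min_strength) = @_;
    return grep { abs($probs->{$_} - 0.5) >= $min_strength } sort keys %$probs;
}

my %learned = (
    'SOME_TEST=NO'   => 0.08,
    'SOME_TEST=YES'  => 0.55,
    'OTHER_TEST=YES' => 0.97,
);
print join(' ', significant_tokens(\%learned, 0.1)), "\n";
# prints: OTHER_TEST=YES SOME_TEST=NO
```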

Re: user_prefs

2007-06-29 Thread Tom Allison
OK, thanks. I'm not using spamassassin or spamd. I'm using Mail::SpamAssassin in a perl script. What does '-x' do for Mail::SpamAssassin? On Jun 28, 2007, at 9:23 PM, Duane Hill wrote: On Thu, 28 Jun 2007, Tom Allison wrote: cannot write to /var/www/.spamassassin/user_prefs: No such file

A different approach to scoring spamassassin hits

2007-06-29 Thread Tom Allison
For some years now there has been a lot of effective spam filtering using statistical approaches with variations on Bayesian theory; some of these are inverse Chi Square modifications to Naive Bayes or even CRM114 and other languages have been developed to improve the scoring of
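For reference, the inverse chi-square combining mentioned above (Fisher's method as adapted by Gary Robinson) can be sketched in plain Perl. Each input is a per-token spam probability, which under this proposal would include one token per rule hit or miss; the 0.01/0.99 clamp and the function names are illustrative choices.

```perl
use strict;
use warnings;

# Inverse chi-square: probability that a chi-square variable with $v degrees
# of freedom ($v even) is at least $x2.
sub chi2q {
    my ($x2, $v) = @_;
    my $m    = $x2 / 2.0;
    my $term = exp(-$m);
    my $sum  = $term;
    for my $i (1 .. $v / 2 - 1) {
        $term *= $m / $i;
        $sum  += $term;
    }
    return $sum < 1.0 ? $sum : 1.0;
}

# Fisher/Robinson chi-square combining of per-token spam probabilities.
# Returns a value near 1 for spam, near 0 for ham, and near 0.5 when the
# evidence is weak or conflicting.
sub chi_combine {
    # Clamp so log() never sees 0 and no single token dominates completely.
    my @p = map { $_ < 0.01 ? 0.01 : $_ > 0.99 ? 0.99 : $_ } @_;
    return 0.5 unless @p;
    my ($ln_p, $ln_q) = (0, 0);
    for my $prob (@p) {
        $ln_p += log($prob);
        $ln_q += log(1 - $prob);
    }
    my $df      = 2 * scalar(@p);
    my $spam_ev = chi2q(-2 * $ln_p, $df);   # close to 1 when probabilities are high
    my $ham_ev  = chi2q(-2 * $ln_q, $df);   # close to 1 when probabilities are low
    return (1 + $spam_ev - $ham_ev) / 2;
}

printf "%.3f\n", chi_combine(0.99, 0.95, 0.97);   # strongly spammy, close to 1
printf "%.3f\n", chi_combine(0.99, 0.02);         # conflicting, lands near 0.5
```

Because the result sits near 0.5 when spammy and hammy evidence conflict, a middle band can be treated as "unsure" rather than forced into spam or ham.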

Re: exposing rules

2007-06-28 Thread Tom Allison
OliverScott wrote: Assuming that you have managed to get SA to add headers to messages which it thinks are spam, and are looking to add a header to ALL messages so you can see what rules are firing on your HAM, then you can do the following. This may not be what you are after, but may be of

user_prefs

2007-06-28 Thread Tom Allison
cannot write to /var/www/.spamassassin/user_prefs: No such file or directory failed to create default user preference file /var/www/.spamassassin/user_prefs I never ever ever ever want to try to create a user_prefs file. How do I make sure I never do this?

Re: Why doesn't Spamassassin bounce spam?

2007-06-27 Thread Tom Allison
On a related note-- I have a mailing list with a single user who randomly bounces spam to the mailing list because his filters tag it as spam. His poorly performing spam filter (no clue if it's SA or not) is affecting 100's or 1000's of users who generally want him hurt. Additionally, he

exposing rules

2007-06-25 Thread Tom Allison
Is there a way to put into a header (or something) all the rules that were HIT in a message?

Re: exposing rules

2007-06-25 Thread Tom Allison
On Jun 25, 2007, at 7:42 PM, Matt Kettler wrote: Tom Allison wrote: Is there a way to put into a header (or something) all the rules that were HIT in a message? By default this will be in X-Spam-Status. If they're not, can you let us know how you're calling spamassassin? Some tools
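As Matt notes, the default X-Spam-Status header already carries the hit list in its tests= field. If a dedicated header is easier to work with, a local.cf fragment along these lines should do it. `_TESTS(,)_` and `_TESTSSCORES(,)_` are documented template tags; the header names Tests and TestsScores are arbitrary choices, and SpamAssassin adds the X-Spam- prefix itself. From a Perl script, headers are only added when the message is rewritten, for example via the PerMsgStatus method `rewrite_mail`.

```
# local.cf: list the rules hit on every message, spam or ham
add_header all Tests _TESTS(,)_
add_header all TestsScores _TESTSSCORES(,)_
```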

utf8

2007-06-25 Thread Tom Allison
I'm not sure how or if this is done, but I was wondering if anyone has looked into decoding all the charsets into utf8 for Bayesian analysis. The raw octets are not readily readable by the user the way it's done today.
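Later SpamAssassin releases added a `normalize_charset` option that decodes message text to Unicode before rules and Bayes see it. It appeared around the 3.2.0 release and depends on the Encode::Detect module, so check `perldoc Mail::SpamAssassin::Conf` for the installed version before relying on it:

```
# local.cf: detect each message's charset and normalize its text to Unicode
# before rule matching and Bayes tokenization (needs Encode::Detect)
normalize_charset 1
```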

Re: Why doesn't Spamassassin bounce spam?

2007-06-16 Thread Tom Allison
John Rudd wrote: Matt wrote: ExiScan has been part of exim for quite a while now. We reject spam at SMTP with exim and SA when it scores above 15. We have not, as of yet, had a FP near that high. The spams are logged in such a way it makes it easy to create a report including the SA report,
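A minimal exim sketch of the reject-at-SMTP setup described above, assuming exim was built with content scanning (exiscan) and spamd is running. `$spam_score_int` is the score multiplied by ten, so 150 corresponds to the threshold of 15; the `nobody` user and the rejection text are placeholders, and `:true` makes the spam condition succeed regardless of verdict so that only the score comparison decides.

```
# exim configure, inside acl_check_data
  deny  message   = Rejected as spam (score $spam_score)
        spam      = nobody:true
        condition = ${if >{$spam_score_int}{150}{1}{0}}
```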

Re: A New Approach: Find the Ham

2007-02-10 Thread Tom Allison
CHALLENGE: All filtering software is written to score for results that equal spam - catch the bad. SOLUTION: Make filtering software score for results that equal ham - uncatch the good. Your thoughts? How can this method spend less time and energy? Aren't you going to build a mirrored

headers

2007-01-29 Thread Tom Allison
What's that setting I need to get insanely long headers about what scored what and with whom and why?
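One setting that produces such a header is `add_header` with the `_REPORT_` template tag, which lists each rule hit with its score and description. A local.cf sketch that adds it to every message, not only to spam (the header name Report is an arbitrary choice; SpamAssassin prefixes it with X-Spam-):

```
# local.cf: full per-rule report (points, rule name, description) on all mail
add_header all Report _REPORT_
```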

Re: SQL Bayes Store -- initialization of database

2007-01-27 Thread Tom Allison
Tom Allison wrote: I'm trying to initialize a database for Bayes from perl (DIY). I took the advice of others and removed much of this approach and just decided to try running Mail::SpamAssassin as is and let it create the database entry for the specified user. It simply will not create

mass testing

2007-01-27 Thread Tom Allison
I have what I'll refer to as a pre-alpha version of my project which I originally dubbed Plan9 because I thought it might be a bad idea (Plan 9 from Outer Space). But it seems to work. I'm looking for some hints/tips on how to test this stuff for the variety of errors I will likely

Re: Web user interface

2007-01-26 Thread Tom Allison
Johnson, S wrote: Has anyone written a web interface for end users in which they could go through quarantined spam and release/whitelist on their own? Not yet. But that is something I'm actually trying to do. What I'm working on would fall far short of the available features in

Re: bayes sql initialization

2007-01-25 Thread Tom Allison
Bob McClure Jr wrote: On Wed, Jan 24, 2007 at 09:01:58PM -0500, Tom Allison wrote: Am I correct in understanding that I have to run sa-learn for every user who is going to have a bayes token store? If you are running per-user Bayes (nothing else makes much sense, IMHO), yes, but only

SQL Bayes Store -- initialization of database

2007-01-25 Thread Tom Allison
I'm trying to initialize a database for Bayes from perl (DIY). Using Test::More as a start I tried: can_ok('Mail::SpamAssassin::BayesStore', ('tie_db_readonly')); my $to = 'tom'; my $spamtest = Mail::SpamAssassin->new( {username => $to, debug => 'all'} ); isa_ok($spamtest, 'Mail::SpamAssassin');

user_prefs

2007-01-24 Thread Tom Allison
I would like to suppress user_prefs and stick with a single site-wide user_prefs. I don't think I actually need to do much for this. But I get a lot of warnings. I also want to be able to switch users for the bayes list (SQL) but continue using the same prefs for all users. If I call

bayes store PgSQL error

2007-01-24 Thread Tom Allison
[1174] dbg: bayes: using username: tallison [1174] dbg: bayes: unable to connect to database: missing = after bayes:192.168.0.100:5432 in connection info string bayes_store_module Mail::SpamAssassin::BayesStore::PgSQL bayes_sql_dsn DBI:Pg:bayes:192.168.0.100:5432 bayes_sql_username

Re: bayes store PgSQL error

2007-01-24 Thread Tom Allison
NEVER MIND!! I'm not paying attention to the ruddy docs! Tom Allison wrote: [1174] dbg: bayes: using username: tallison [1174] dbg: bayes: unable to connect to database: missing = after bayes:192.168.0.100:5432 in connection info string bayes_store_module Mail::SpamAssassin::BayesStore
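The error comes from the DSN: DBD::Pg expects key=value pairs after the `DBI:Pg:` prefix, so the database, host and port have to be spelled out. A corrected local.cf fragment, assuming `bayes` is the database name and reusing the host and port from the log above (the username and password settings stay as they were and are not shown):

```
# local.cf: SQL Bayes store on PostgreSQL
bayes_store_module  Mail::SpamAssassin::BayesStore::PgSQL
bayes_sql_dsn       DBI:Pg:dbname=bayes;host=192.168.0.100;port=5432
```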

perldocs Mail::SpamAssassin

2007-01-22 Thread Tom Allison
I'm actually trying to write a perl script to use Mail::SpamAssassin rather than the spamassassin or spamc scripts that are already available. So far, much of the website seems geared towards the end-use of spamassassin. Besides cpan is there someplace that can help me navigate through

bayes 101

2007-01-21 Thread Tom Allison
I just did an install of spampd on my debian box and am working my way through the different configurations... First, I found that /var/cache/spampd/awl had the wrong permissions so I changed that and I stopped getting errors. Interestingly, I have AWL disabled. But I guess it likes to

Re: bayes 101

2007-01-21 Thread Tom Allison
Tom Allison wrote: I just did an install of spampd on my debian box and am working my way through the different configurations... First, I found that /var/cache/spampd/awl had the wrong permissions so I changed that and I stopped getting errors. Interestingly, I have AWL disabled. But I

Re: bayes 101

2007-01-21 Thread Tom Allison
Matt Kettler wrote: Tom Allison wrote: Tom Allison wrote: I just did an install of spampd on my debian box and am working my way through the different configurations... First, I found that /var/cache/spampd/awl had the wrong permissions so I changed that and I stopped getting errors

debug output

2007-01-21 Thread Tom Allison
When using Mail::SpamAssassin with new( {debug => 'all'} ) or similar, how do you capture the output from the debug to syslog or other logging file? I can run it via a command line but if I run Mail::SpamAssassin under a daemon/fork process similar to Net::Server::PreForkSimple I can't seem

Re: debug output

2007-01-21 Thread Tom Allison
Theo Van Dinter wrote: On Sun, Jan 21, 2007 at 07:17:14PM -0500, Tom Allison wrote: When using Mail::SpamAssassin with new( {debug => 'all'} ) or similar, how do you capture the output from the debug to syslog or other logging file? Take a look at Mail::SpamAssassin::Logger. ie: Mail

Re: spampd

2007-01-19 Thread Tom Allison
Theo Van Dinter wrote: (I assume this message was also supposed to go to the users list since there was nothing private in it, so cc'ing there.) On Fri, Jan 19, 2007 at 03:33:27PM -0500, [EMAIL PROTECTED] wrote: The thought I was struggling with is that in the MTA the content_filter is told who

Re: would SA benefit from port to Java

2006-11-25 Thread Tom Allison
Nix wrote: On 20 Nov 2006, Giampaolo Tomassoni spake thusly: That's not even mentioning the metaprogramming and higher-order programming techniques that we use extensively in SpamAssassin -- those are basically *just not possible* in C/C++. ;) Ops. What's this stuff? Let me know. eval and

postgres database

2006-11-24 Thread Tom Allison
I was reading through the man pages about the use of a database for the storage of bayesian tokens. Is this a list that is global to the mail server, or something that is distinct for each user of that mail server? In other words -- will I have the exact same bayesian history in my token

Re: postgres database

2006-11-24 Thread Tom Allison
Rick Macdougall wrote: Tom Allison wrote: I was reading through the man pages about the use of a database for the storage of bayesian tokens. Is this a list that is global to the mail server, or something that is distinct for each user of that mail server? In other words -- will I have
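With the SQL store, tokens are kept per username (the username in effect when a message is scanned or learned), so by default each user builds a separate history. If one shared history is wanted instead, the `bayes_sql_override_username` setting forces a single name for every lookup; the value `sitewide` below is a hypothetical choice.

```
# local.cf: make every user share one Bayes token history in the SQL store
bayes_sql_override_username  sitewide
```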