Re: Certain spam not parsed by spamd!
I've recently implemented RelayCountry and seen a 90%+ improvement in our ability to trap spam, but there is one email which seems capable of avoiding getting parsed by spamd. All other messages get the X-Spam headers added successfully, but this one for some reason slips through completely without any such headers. It carries a trojan too, which is odd because ClamAV should pick that up. clamd is updated daily. The headers of the strange spam are:

Return-path: banach...@royalkoas.com
Envelope-to: u...@host.co.uk
Delivery-date: Fri, 24 Jul 2009 11:12:38 +0800
Received: from [190.144.0.42] (helo=CWXNQKBTZ) by s1.host.info with esmtp (Exim 4.67) (envelope-from banach...@royalkoas.com) id 1MUBD2-0002wE-2i for u...@host.co.uk; Fri, 24 Jul 2009 11:12:38 +0800
Received: from 190.144.0.42 by red3.redtong.com; Thu, 23 Jul 2009 22:24:55 -0500
Message-ID: 000d01ca0c0e$50804720$6400a...@banacha55
From: u...@host.co.uk
To: u...@host.co.uk
Subject: You have received an eCard
Date: Thu, 23 Jul 2009 22:24:55 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary==_NextPart_000_0006_01CA0C0E.50804720
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

The above email contained a .zip file. This was not random, as I've received three similar emails this morning and none of them have X-Spam headers; all other emails are fine.

It apparently was never seen by SpamAssassin, if there were no X-Spam-* headers. How do you call SpamAssassin? Is there any whitelisting there, and do you call SpamAssassin for your own mail? It seems the sender address is the same as the receiver address. Whitelisted somehow, and maybe not inspected by SpamAssassin?
Re: whitelist_from questions
On 24/07/2009 04:09, MySQL Student wrote:

I don't doubt that if we removed a substantial amount of them that SA would do what's right, but there doesn't seem to be any scientific way to do that successfully.

Can't you just look at the scores that the whitelisted messages are getting and see whether any would be close to being considered as spam without the -100 of the whitelist? [How best to do that depends on how you've integrated spamassassin into your mail setup, but grepping through logs ought to do it in most cases]. And perhaps a few carefully-chosen negative-scoring rules (for words or phrases common to your customer's business) might be a far more effective way of handling the rest.

Is there a way to script that for the 1000 or so entries, to see which have SPF records?

There are no doubt lots of ways, but how about:

    egrep 'whitelist_from[^_]' local.cf | awk -F@ '{print $2 " TXT"}' | xargs dig | grep v=spf1

John.
Re: Certain spam not parsed by spamd!
On Jul 23, 2009, at 22:45, snowweb pe...@snowweb.co.uk wrote: there is one email which seems capable of avoiding getting parsed by spamd. Is the email with attachment over 250KB?
Re: Certain spam not parsed by spamd!
LuKreme wrote: On Jul 23, 2009, at 22:45, snowweb pe...@snowweb.co.uk wrote: Is the email with attachment over 250KB? No, it's just 74Kb -- View this message in context: http://www.nabble.com/Certain-spam-not-parsed-by-spamd%21-tp24638560p24640402.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Certain spam not parsed by spamd!
Jari Fredriksson wrote:

It apparently was never seen by SpamAssassin, if there were no X-Spam-* headers. How do you call SpamAssassin? Is there any whitelisting there, and do you call SpamAssassin for your own mail? It seems the sender address is the same as the receiver address. Whitelisted somehow, and maybe not inspected by SpamAssassin?

It's called by this part of my exim.conf:

    # Spam Assassin
    spamcheck_director:
      driver = accept
      condition = ${if and { \
          {!def:h_X-Spam-Flag:} \
          {!eq {$received_protocol}{spam-scanned}} \
          {!eq {$received_protocol}{local}} \
          {exists{/home/${lookup{$domain}lsearch{/etc/virtual/domainowners}{$value}}/.spamassassin/user_prefs}} \
          {<{$message_size}{100k}} \
        } {1}{0}}
      retry_use_local_part
      transport = spamcheck
      no_verify

I guess if there was whitelisting, that would have to be in the exim.conf too, but I can't see any explicit whitelisting there.
Re: Certain spam not parsed by spamd!
Jari Fredriksson wrote:

The headers of the strange spam are:

Return-path: banach...@royalkoas.com
Envelope-to: u...@host.co.uk
Delivery-date: Fri, 24 Jul 2009 11:12:38 +0800
Received: from [190.144.0.42] (helo=CWXNQKBTZ) by s1.host.info with esmtp (Exim 4.67) (envelope-from banach...@royalkoas.com) id 1MUBD2-0002wE-2i for u...@host.co.uk; Fri, 24 Jul 2009 11:12:38 +0800
Received: from 190.144.0.42 by red3.redtong.com; Thu, 23 Jul 2009 22:24:55 -0500
Message-ID: 000d01ca0c0e$50804720$6400a...@banacha55
From: u...@host.co.uk
To: u...@host.co.uk
Subject: You have received an eCard
Date: Thu, 23 Jul 2009 22:24:55 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary==_NextPart_000_0006_01CA0C0E.50804720
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

The above email contained a .zip file.

It apparently was never seen by SpamAssassin, if there were no X-Spam-* headers. How do you call SpamAssassin? Is there any whitelisting there, and do you call SpamAssassin for your own mail? It seems the sender address is the same as the receiver address. Whitelisted somehow, and maybe not inspected by SpamAssassin?

This is the SPF record on the recipient domain:

    v=spf1 a mx ip4:216.108.227.20 ?all

I'm thinking of changing it to -all, as I'm fairly sure that everyone is using our mailserver to send mail on the domain. Do you think that might solve it?

Also, you're correct that the From: header is the same as the recipient (obviously spoofed), but the envelope is from an external sender, and the first Received: line also acknowledges that it was received from an external server and email address. Which line does it check the SPF record of: just the spoofable From:, or one of the others?
Malware list Q
Hiya,

Do any of you guys use the following list?

http://malware.hiperlinks.com.br/cgi/submit?action=list_sa

If so, may I ask how you find the results, and whether it is worth adding to SpamAssassin?

Kind Regards
Brent Clark
Re: Malware list Q
Looks interesting! I've asked the developer if he's interested in us testing it out.

On Fri, Jul 24, 2009 at 10:34, Brent Clark brentgclarkl...@gmail.com wrote:

Hiya, Do any of you guys use the following list? http://malware.hiperlinks.com.br/cgi/submit?action=list_sa If so, may I ask how you find the results, and whether it is worth adding to SpamAssassin? Kind Regards Brent Clark

-- --j.
Re: Malware list Q
On Fri, Jul 24, 2009 at 10:34, Brent Clark brentgclarkl...@gmail.com wrote:

Do any of you guys use the following list? http://malware.hiperlinks.com.br/cgi/submit?action=list_sa If so, may I ask how you find the results, and whether it is worth adding to SpamAssassin?

Hi,

We use malwarepatrol with our central squid web caches. Not sure about the effectiveness of it though; we really should dig out some stats for it, perhaps!

John.

-- John Horne, University of Plymouth, UK
Tel: +44 (0)1752 587287 Fax: +44 (0)1752 587001
E-mail: john.ho...@plymouth.ac.uk
Re: Malware list Q
On 7/24/2009 11:34 AM, Brent Clark wrote:

Hiya, Do any of you guys use the following list? http://malware.hiperlinks.com.br/cgi/submit?action=list_sa If so, may I ask how you find the results, and whether it is worth adding to SpamAssassin?

I've been using the ClamAV sigs for quite a while. They don't hit a lot in mail payload, but when they do, they're well worth it.
Re: whitelist_from questions
On 24/07/2009 04:09, MySQL Student wrote:

I don't doubt that if we removed a substantial amount of them that SA would do what's right, but there doesn't seem to be any scientific way to do that successfully.

Can't you just look at the scores that the whitelisted messages are getting and see whether any would be close to being considered as spam without the -100 of the whitelist? [How best to do that depends on how you've integrated spamassassin into your mail setup, but grepping through logs ought to do it in most cases]. And perhaps a few carefully-chosen negative-scoring rules (for words or phrases common to your customer's business) might be a far more effective way of handling the rest.

Is there a way to script that for the 1000 or so entries, to see which have SPF records?

There are no doubt lots of ways, but how about:

On 24.07.09 08:58, John Wilcock wrote:

    egrep 'whitelist_from[^_]' local.cf | awk -F@ '{print $2 " TXT"}' | xargs dig | grep v=spf1

Well:

- addresses can contain wildcards
- there can be more than one address on a line
- the SPF record type should be checked before TXT

The first issue is hard to avoid by scripting; the others can be solved.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
42.7 percent of all statistics are made up on the spot.
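The two scriptable points (several addresses on one line, and picking out just the domain part) could be handled along these lines in Python. This is only a rough sketch: the helper name `whitelist_domains` and the sample config lines are made up for illustration, wildcard domains are simply skipped rather than solved, and no DNS lookup is done here, so each resulting domain would still have to be passed to `dig` or similar.

```python
import re


def whitelist_domains(config_text):
    """Return the sorted, unique domains named on plain whitelist_from lines.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    domains = set()
    for line in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # "\s+" right after the keyword excludes whitelist_from_rcvd,
        # whitelist_from_spf and friends.
        match = re.match(r"whitelist_from\s+(.*)", line)
        if not match:
            continue
        # One line may carry several addresses.
        for addr in match.group(1).split():
            if "@" not in addr:
                continue
            domain = addr.rsplit("@", 1)[1].lower()
            # A wildcard in the domain part cannot be looked up in DNS.
            if "*" in domain or "?" in domain:
                continue
            domains.add(domain)
    return sorted(domains)


sample = """
whitelist_from alice@example.com bob@example.org
whitelist_from *@example.net
whitelist_from carol@*.example.com
whitelist_from_rcvd dave@example.edu example.edu
# whitelist_from eve@example.info
"""

print(whitelist_domains(sample))
```

A wildcard in the local part (`*@example.net`) still yields a usable domain; only wildcards in the domain part are dropped.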
Re: Certain spam not parsed by spamd!
LuKreme wrote:

On Jul 23, 2009, at 22:45, snowweb pe...@snowweb.co.uk wrote:

Is the email with attachment over 250KB?

On 24.07.09 01:13, snowweb wrote:

No, it's just 74Kb

The email or the attachment? In the config you posted, e-mails over 100K aren't checked. A 74K attachment results in a roughly 100K mail once it is encoded for transport.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
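To put rough numbers on that: a binary attachment is normally base64-encoded in the message, which turns every 3 bytes into 4 characters and adds a line break after every 76 characters. The sketch below works that out in Python, assuming "74Kb" means 74 * 1024 bytes, that the .zip was base64-encoded, and that the router's "100k" means 100 * 1024 bytes; the function name is made up for illustration, and headers and any text part come on top of this figure.

```python
def mime_base64_size(raw_bytes, line_len=76):
    """Approximate size in bytes of raw_bytes after base64 encoding
    with a CRLF after each line of line_len characters."""
    encoded = 4 * ((raw_bytes + 2) // 3)          # 3 raw bytes -> 4 characters
    lines = (encoded + line_len - 1) // line_len  # number of output lines
    return encoded + 2 * lines                    # 2 bytes of CRLF per line


attachment = 74 * 1024   # assumption: "74Kb" is 74 KiB of raw .zip data
limit = 100 * 1024       # assumption: "100k" in the router is 100 * 1024 bytes

encoded = mime_base64_size(attachment)
print(encoded, limit, encoded > limit)   # 103696 102400 True
```

So under those assumptions the encoded attachment alone is already past the limit, before any headers are counted.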
Re: Certain spam not parsed by spamd!
Jari Fredriksson wrote:

It apparently was never seen by SpamAssassin, if there were no X-Spam-* headers. How do you call SpamAssassin? Is there any whitelisting there, and do you call SpamAssassin for your own mail? It seems the sender address is the same as the receiver address. Whitelisted somehow, and maybe not inspected by SpamAssassin?

On 24.07.09 01:31, snowweb wrote:

This is the SPF record on the recipient domain: v=spf1 a mx ip4:216.108.227.20 ?all I'm thinking of changing it to -all, as I'm fairly sure that everyone is using our mailserver to send mail on the domain. Do you think that might solve it?

No, SPF doesn't affect whether mail gets scanned. At least not in your example, unless I've missed something.

Also, you're correct that the From: header is the same as the recipient (obviously spoofed), but the envelope is from an external sender, and the first Received: line also acknowledges that it was received from an external server and email address. Which line does it check the SPF record of: just the spoofable From:, or one of the others?

SPF is checked at your internal network boundary, but only if SpamAssassin gets the message to scan.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Certain spam not parsed by spamd!
Matus UHLAR - fantomas wrote:

LuKreme wrote:

On Jul 23, 2009, at 22:45, snowweb pe...@snowweb.co.uk wrote:

Is the email with attachment over 250KB?

On 24.07.09 01:13, snowweb wrote:

No, it's just 74Kb

the email or the attachment? In config you posted e-mails over 100K aren't checked. 74K attachment results in 100K mail.

Well done... Solved by Matus! Thanks buddy.
spamassassin not scanning messages
I just installed qmailtoaster with SpamAssassin and ClamAV. Looking at the full email headers, I realize that the messages are not being scanned. Something is not working as it should. Can someone help me? Thanks in advance.
Re: Certain spam not parsed by spamd!
Jari Fredriksson wrote:

The headers of the strange spam are:

Return-path: banach...@royalkoas.com
Envelope-to: u...@host.co.uk
Delivery-date: Fri, 24 Jul 2009 11:12:38 +0800
Received: from [190.144.0.42] (helo=CWXNQKBTZ) by s1.host.info with esmtp (Exim 4.67) (envelope-from banach...@royalkoas.com) id 1MUBD2-0002wE-2i for u...@host.co.uk; Fri, 24 Jul 2009 11:12:38 +0800
Received: from 190.144.0.42 by red3.redtong.com; Thu, 23 Jul 2009 22:24:55 -0500
Message-ID: 000d01ca0c0e$50804720$6400a...@banacha55
From: u...@host.co.uk
To: u...@host.co.uk
Subject: You have received an eCard
Date: Thu, 23 Jul 2009 22:24:55 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary==_NextPart_000_0006_01CA0C0E.50804720
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

The above email contained a .zip file.

It apparently was never seen by SpamAssassin, if there were no X-Spam-* headers. How do you call SpamAssassin? Is there any whitelisting there, and do you call SpamAssassin for your own mail? It seems the sender address is the same as the receiver address. Whitelisted somehow, and maybe not inspected by SpamAssassin?

This is the SPF record on the recipient domain: v=spf1 a mx ip4:216.108.227.20 ?all I'm thinking of changing it to -all, as I'm fairly sure that everyone is using our mailserver to send mail on the domain. Do you think that might solve it?

Also, you're correct that the From: header is the same as the recipient (obviously spoofed), but the envelope is from an external sender, and the first Received: line also acknowledges that it was received from an external server and email address. Which line does it check the SPF record of: just the spoofable From:, or one of the others?

'It', SpamAssassin, does not check anything here: it is not called by your system. I do not know why that is so. There are no SpamAssassin marks in the headers, so it was never called.
United-MAP spam flood
Hello Folks,

Did you also get many spams from United-MAP, "a dynamic company with rapid development, with a united team of professionals in its core"? :)

Or maybe this new spam flood is targeted only at Poland?

Here are a few spam samples:

http://pastebin.com/m178f4a58
http://pastebin.com/m6d07f79d
http://pastebin.com/m477546b9

My best regards,
Pawel
DNSWL
I get:

    * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low
    *      trust

and I read the dnswl.org home page, but I don't understand why this rule would get a -1.0 for a LOW trust rating.

It just seems awkward to me; I think LOW trust would dictate a positive rating, say a 1.0 or higher.

Any insights?

Wes
Re: DNSWL
They only whitelist. So if the trust is higher, then the score should be lower...

Cheers,

twofers wrote:

I get: * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low * trust and I read the dnswl.org home page, but I don't understand why this rule would get a -1.0 for a LOW trust rating. It just seems awkward to me, I think LOW trust would dictate a positive rating, say a 1.0 or higher. Any insights? Wes
Re: DNSWL
On Fri, July 24, 2009 14:07, twofers wrote:

I get: * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low * trust and I read the dnswl.org home page, but I don't understand why this rule would get a -1.0 for a LOW trust rating.

You can override this score locally if you want, but IMHO it's better to report the sender IP if it's spam.

It just seems awkward to me, I think LOW trust would dictate a positive rating, say a 1.0 or higher. Any insights?

Or stats against a corpus? :=)

-- xpoint
Re: Certain spam not parsed by spamd!
LuKreme wrote:

On Jul 23, 2009, at 22:45, snowweb pe...@snowweb.co.uk wrote:

Is the email with attachment over 250KB?

On 24.07.09 01:13, snowweb wrote:

No, it's just 74Kb

Matus UHLAR - fantomas wrote:

the email or the attachment? In config you posted e-mails over 100K aren't checked. 74K attachment results in 100K mail.

On 24.07.09 04:20, snowweb wrote:

Well done... Solved by Matus! Thanks buddy.

If your system has enough free resources, you can scan even bigger e-mails. The spamc limit is currently 512K.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: whitelist_from questions
Actually there should be one or two more whitelists, so one can e.g., score

-100 one's friends
-10 one's schools
-1 one's country
Re: whitelist_from questions
jida...@jidanni.org writes:

Actually there should be one or two more whitelists, so one can e.g., score -100 one's friends -10 one's schools -1 one's country

I have long wanted to be able to

    whitelist_from f...@bar -3.0

to have per-entry scores. Obviously though I haven't wanted it enough to write the code.
Re: whitelist_from questions
On Fri, 24 Jul 2009, Greg Troxel wrote:

I have long wanted to be able to whitelist_from f...@bar -3.0 to have per-entry scores. Obviously though I haven't wanted it enough to write the code.

How does this not work?

    header WL_FROM_FOO From =~ /\bf...@bar/i
    score  WL_FROM_FOO -3.00

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
If healthcare is a Right means that the government is obligated to provide the people with hospitals, physicians, treatments and medications at low or no cost, then the right to free speech means the government is obligated to provide the people with printing presses and public address systems, the right to freedom of religion means the government is obligated to build churches for the people, and the right to keep and bear arms means the government is obligated to provide the people with guns, all at low or no cost.
---
13 days since a sunspot last seen - EPA blames CO2 emissions
Re: whitelist_from questions
John Hardin jhar...@impsec.org writes:

On Fri, 24 Jul 2009, Greg Troxel wrote:

I have long wanted to be able to whitelist_from f...@bar -3.0 to have per-entry scores. Obviously though I haven't wanted it enough to write the code.

How does this not work? header WL_FROM_FOO From =~ /\bf...@bar/i score WL_FROM_FOO -3.00

It does, but doesn't it require allowing user rules? Plus, it's two lines for each whitelist_from_score entry, with a magic regexp.
Re: whitelist_from questions
On Fri, 24 Jul 2009, Greg Troxel wrote:

John Hardin jhar...@impsec.org writes:

On Fri, 24 Jul 2009, Greg Troxel wrote:

I have long wanted to be able to whitelist_from f...@bar -3.0 to have per-entry scores. Obviously though I haven't wanted it enough to write the code.

How does this not work? header WL_FROM_FOO From =~ /\bf...@bar/i score WL_FROM_FOO -3.00

It does, but doesn't it require allowing user rules?

Yeah, but that requirement wasn't specified. Sorry.

Plus, it's two lines for each whitelist_from_score entry, with a magic regexp.

Yeah, the whitelist_* do a lot of magic in the background. This would get hard to manage for more than a few entries. I was assuming you only wanted to do a few.

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
Re: whitelist_from questions
On Fri, 2009-07-24 at 11:57 -0700, John Hardin wrote:

On Fri, 24 Jul 2009, Greg Troxel wrote:

I have long wanted to be able to whitelist_from f...@bar -3.0 to have per-entry scores. Obviously though I haven't wanted it enough to write the code.

First of all -- I don't like the term whitelist in this context. What's being discussed is a small, almost marginal adjustment to the score. Using whitelist for anything that low (even -1 has been mentioned previously) is just watering down the definition.

That said, something like the above might be useful in some cases. Not that I ever felt the need for it, but still.

Also, there are custom plugins [1] out there, which provide similar or related functionality -- and even are *much* easier to maintain for *users*, than the user_prefs. See the Addressbook and LDAPfilter plugins. The latter even mentions support for per-domain listings.

However, I strongly agree with a note in the Addressbook plugin's description. This doesn't really work for all addresses (unless rcvd or auth constrained, sic!). It is a common spammer pattern to send From forged address A, to Recipient A, B and C at the same domain. Thus, giving negative scores to your family, friends or co-workers is in some cases likely to result in FNs.

Anyway, I hope everyone who really needs and uses whitelisting, also has the ShortCircuit plugin enabled. If you deliberately WHITE-list, why waste more cycles on the mail?

[1] http://wiki.apache.org/spamassassin/CustomPlugins
Re: whitelist_from questions
On Fri, July 24, 2009 20:10, John Hardin wrote:

On Fri, 24 Jul 2009, Greg Troxel wrote:

I have long wanted to be able to whitelist_from f...@bar -3.0 to have per-entry scores. Obviously though I haven't wanted it enough to write the code.

How does this not work? header WL_FROM_FOO From =~ /\bf...@bar/i score WL_FROM_FOO -3.00

Another example:

    whitelist_from_spf f...@bar -3.0

gives -3.0 only if SPF passes, or

    whitelist_from_dkim f...@bar -3.0

the same for DKIM, or both:

    whitelist_from_auth f...@bar -3.0

I still wonder why so many don't care more about forged senders :(

Good that such a bad plugin does not exist; it's bad enough that whitelist_from does.

-- xpoint
How can I view bayes score for individual words?
I tried to view the files bayes.toks, bayes.journal, bayes.seen and autowhitelist, but they just look like gibberish when opened in a unix editor. What's the solution to this? I was hoping to be able to tweak some of the scores and add certain words etc.
Re: DNSWL
twofers wrote:

I get: * -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low * trust and I read the dnswl.org home page, but I don't understand why this rule would get a -1.0 for a LOW trust rating. It just seems awkward to me, I think LOW trust would dictate a positive rating, say a 1.0 or higher. Any insights?

Low doesn't mean it's a likely spam source, it means it's a nonspam source, but with less confidence than the higher tiers.

Regardless, this test performed reasonably well in the 3.2 mass-checks:

    OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
       0.092   0.0058   0.2442   0.023   0.66   -1.00  RCVD_IN_DNSWL_LOW

(from http://svn.apache.org/repos/asf/spamassassin/branches/3.2/rules/STATISTICS-set3.txt)

With a S/O of 0.023, that means that 97.7% of the email this rule hit was nonspam, and 2.3% was spam.

With that S/O, I don't think -1 is an out-of-order score, particularly since the test set was spam biased (63.8% of the test email was spam).
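For anyone wanting to check how those columns relate, here is a small Python sketch. It assumes that S/O is the spam hit rate divided by the sum of the spam and ham hit rates, and that OVERALL% is the two hit rates weighted by the corpus mix; that is how the figures in this row fit together, but it is an assumption here rather than something taken from the mass-check scripts. The function name is made up.

```python
def s_o_ratio(spam_pct, ham_pct):
    """Spam hit rate as a share of the combined spam and ham hit rates."""
    return spam_pct / (spam_pct + ham_pct)


spam_pct = 0.0058   # SPAM% column: share of spam messages the rule hit
ham_pct = 0.2442    # HAM% column: share of ham messages the rule hit
spam_share = 0.638  # 63.8% of the test corpus was spam

print(round(s_o_ratio(spam_pct, ham_pct), 3))   # 0.023

# OVERALL% as a corpus-weighted mix of the two hit rates.
overall = spam_share * spam_pct + (1 - spam_share) * ham_pct
print(round(overall, 3))                        # 0.092
```

Both results match the S/O and OVERALL% values in the row quoted above.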
Re: How can I view bayes score for individual words?
snowweb wrote:

I tried to view the files bayes.toks, bayes.journal, bayes.seen and autowhitelist, but they just look like gibberish when opened in a unix editor. What's the solution to this?

The bayes database stores truncated SHA1 hashes of the words; it is not reversible back to human-readable text using the database alone. This is done for performance reasons (fixed size tokens = faster random access), but has a side benefit of preventing your bayes DB from containing words that may imply things about your confidential emails.

However, if you run a message through spamassassin with -D bayes=9 it should dump all the tokens in the message with their score from the bayes DB.

I was hoping to be able to tweak some of the scores and add certain words etc.

That would be a very misguided thing to do. Bayes is a statistical system, and statistics work better with real measurements, not biased numbers based on your own guesswork. The reality of things is that a learning statistics system based on email is really gathering statistics based on human behavior. Human behavior is *way* more complex than you think it is. :-)

If you really want to tweak the score of some words, create static rules for them. Leave bayes to doing its own exacting measurements.
Re: How can I view bayes score for individual words?
On Fri, 24 Jul 2009 17:38:39 -0700 (PDT) snowweb pe...@snowweb.co.uk wrote:

I tried to view the files bayes.toks, bayes.journal, bayes.seen and autowhitelist, but they just look like gibberish when opened in a unix editor. What's the solution to this? I was hoping to be able to tweak some of the scores and add certain words etc.

It uses a Berkeley Database. If you want to tweak anything, the easiest way is to use sa-learn --backup to convert to a text file and sa-learn --restore to read the file back in. However, the database works with token hashes, not the actual tokens: each key is the last 5 bytes of the SHA1 hash of the token.
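As an illustration of that last point, the truncation itself is easy to reproduce in Python. This sketch only shows "last 5 bytes of the SHA1 hash"; the function name and the example word are made up, and it assumes the token is hashed exactly as given, whereas SpamAssassin does its own tokenizing of the message first, so the output is not guaranteed to match an entry in a real database.

```python
import hashlib


def bayes_token_key(token):
    """Last 5 bytes of the SHA1 digest of a token (illustrative only)."""
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


key = bayes_token_key("example")
print(len(key), key.hex())   # 5 bytes, shown as 10 hex characters
```

Since only 5 of the 20 digest bytes are kept and SHA1 is one-way, there is no way to get from a key back to the word, which is why the files look like gibberish.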