Re: more efficient big scoring

2008-01-22 Thread Jim Maul

Justin Mason wrote:

John D. Hardin writes:

On Tue, 22 Jan 2008, George Georgalis wrote:


On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote:


Neither am I. Another thing to consider is the fraction of defined
rules that actually hit and affect the score is rather small. The
greatest optimization would be to not test REs you know will fail;  
but how do you do *that*?

thanks for all the followups on my inquiry. I'm glad the topic is/was
considered and it looks like there is some room for development, but
I now realize it is not as simple as I thought it might have been.
In answer to above question, maybe the tests need their own scoring?
eg fast tests and with big spam scores get a higher test score than
slow tests with low spam scores.
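
A minimal sketch of that "score the tests" idea, in Python rather than SpamAssassin's Perl. Everything here is hypothetical: the Rule class, the cost figures and the three toy rules are made up for illustration, and this is not how SpamAssassin orders its rules. It also assumes every score is non-negative; with negative-scoring rules in the set, stopping early could give the wrong verdict.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float                 # spam points added when the rule hits
    cost: float                  # relative cost of running the test (made-up units)
    test: Callable[[str], bool]  # the actual check


def priority(rule: Rule) -> float:
    """Higher is better: a big score for little work."""
    return rule.score / rule.cost


def scan(message: str, rules: List[Rule], required: float) -> Tuple[float, List[str]]:
    """Run rules in priority order and stop once the threshold is reached.

    Early exit is only safe because all scores here are non-negative.
    """
    total = 0.0
    ran: List[str] = []
    for rule in sorted(rules, key=priority, reverse=True):
        ran.append(rule.name)
        if rule.test(message):
            total += rule.score
        if total >= required:
            break
    return total, ran


# Hypothetical rules: one slow/low, one fast/big, one fast/low.
RULES = [
    Rule("SLOW_LOW", 0.5, 10.0, lambda m: "unsubscribe" in m),
    Rule("FAST_BIG", 5.0, 1.0, lambda m: "viagra" in m),
    Rule("FAST_LOW", 0.5, 1.0, lambda m: "click here" in m),
]
```

With these toy numbers a message that trips FAST_BIG is classified after a single test, while a clean message still has to run every rule, which is the common case the thread is worried about.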

maybe if there was some way to establish a hierarchy at startup
which groups rule processing into nodes. some nodes finish
quickly, some have dependencies, some are negative, etc.

Loren mentioned to me in a private email: common subexpressions.

It would be theoretically possible to analyze all the rules in a given
set (e.g. body rules) to extract common subexpressions and develop a
processing/pruning tree based on that. You'd probably gain some
performance scanning messages, but at the cost of how much
startup/compiling time?
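
One way to picture the pruning tree, as a rough Python sketch. It is a much simpler stand-in for real common-subexpression extraction: each rule is tagged by hand with a literal that must appear for its regex to match, rules are grouped by that literal, and a cheap substring check prunes the whole group. The rule names, literals and regexes are hypothetical, not taken from any SpamAssassin ruleset.

```python
import re
from collections import defaultdict
from typing import Dict, List, Pattern, Tuple

# (rule name, lowercase literal the regex cannot match without, regex)
RULES: List[Tuple[str, str, str]] = [
    ("CLICK_HERE", "click", r"click\s.{0,30}here"),
    ("CLICK_BELOW", "click", r"click\s.{0,30}below"),
    ("FREE_OFFER", "free", r"free\s+(?:offer|trial)"),
]

Tree = Dict[str, List[Tuple[str, Pattern[str]]]]


def build_tree(rules: List[Tuple[str, str, str]]) -> Tree:
    """Startup cost: compile every regex once and group rules by shared literal."""
    tree = defaultdict(list)
    for name, literal, regex in rules:
        tree[literal].append((name, re.compile(regex, re.I | re.S)))
    return dict(tree)


def scan(body: str, tree: Tree) -> Tuple[List[str], int]:
    """Return the rules that hit and how many regexes were actually run."""
    lowered = body.lower()
    hits: List[str] = []
    regexes_run = 0
    for literal, group in tree.items():
        if literal not in lowered:
            continue  # the whole group is known to fail, skip its regexes
        for name, pattern in group:
            regexes_run += 1
            if pattern.search(body):
                hits.append(name)
    return hits, regexes_run
```

The trade-off John raises shows up directly: build_tree is the extra startup work, and the saving per message is the number of regexes skipped.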


I experimented with this concept in my sa-compile work, but I could
achieve any speedup on real-world mixed spam/ham datasets.

Feel free to give it a try though ;)

--j.




You do mean *couldn't* achieve any speedup, correct?

-Jim



Re: [guinevere-discuss] Lint errors in 3.4

2007-12-18 Thread Jim Maul

Clay Davis wrote:

I've seen several people write this.  Can someone point me to some debate
I can review?  It seems to me that if you set the autolearn threshold
fairly high and keep an eye on your bayes scoring, it would be a good
thing.

Thanks,
Clay


Joe Zitnik [EMAIL PROTECTED] 12/18/2007 6:07 AM 



  NEVER use autolearn for Bayes.  Autolearn = most evil.





I never had a problem with autolearn.  I've been using it for years.  Of 
course, I altered the autolearn thresholds.


-Jim



Re: spamassassin not starting - new install

2007-09-18 Thread Jim Maul

Michael Martinell wrote:
My SpamAssassin is able to start (no idea why); however, it does not 
appear to know what thresholds to use:


 


*Received:* (qmail 25671 invoked by uid 1010); 18 Sep 2007 11:59:35 -0500
*Received:* from 64.233.182.188 by mail (envelope-from 
[EMAIL PROTECTED], uid 1008) with qmail-scanner-1.25-st-qms

 (clamdscan: 0.91.2/4015. spamassassin: 3.2.3. perlscan: 1.25-st-qms.
 Clear:RC:0(64.233.182.188):SA:0(?/?):.
 Processed in 3.097918 secs); 18 Sep 2007 16:59:35 -
*X-Spam-Status:* No, hits=? required=?



Read:

http://qmail-scanner.sourceforge.net/FAQ.php

Particularly 19.


 


The hits and required fields both have ? in them.

 


My local.cf file specifies 5.0

 


Are you sure you are using the right local.cf and there are no errors in 
it?  Have you run spamassassin -D --lint and/or spamassassin --lint to 
verify?





Do I need to put this somewhere else as well?

 


No, but when using qmail-scanner, some settings are set in 
qmail-scanner-queue.pl instead of in local.cf (like subject rewriting).





I also noticed that when I start spamassassin I see a couple of errors 
that it apparently passes by:


[26559] info: rules: meta test FM__TIMES_2 has dependency 
'FH_HOST_EQ_D_D_D_D' with a zero score


[26559] info: rules: meta test FM_SEX_HOST has dependency 
'FH_HOST_EQ_D_D_D_D' with a zero score





run with --lint and fix the problems.

-Jim



Re: R: And interesting way to detect spambots

2007-08-28 Thread Jim Maul

John D. Hardin wrote:

On Tue, 28 Aug 2007, Giampaolo Tomassoni wrote:


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]

Marc Perkel writes:

Who finds this concept interesting?

SPAM-L.  This is OT for this list.

Right, Justin, but I see that threads about general anti-spam
techniques are tolerated in this list.


For a lot of people that toleration is wearing thin.



Especially after Marc's website...wow.



Re: Detecting short-TTL domains?

2007-08-10 Thread Jim Maul

Stream Service || Mark Scholten wrote:
As far as I know, it isn't possible to have a TTL that is too low (if I 
may believe the RFCs). It is also impossible to have too many 
A-records. With both facts in mind, I would suggest that you find 
another method of detecting spam.




Most SA rules look for spam signs, not RFC violations.  Now whether or 
not these are good spam signs I do not know...


-Jim


Re: Reject spam from my own domain

2007-07-31 Thread Jim Maul

NetComrade wrote:

We have whitelisted our domain, but now we have spam coming from users that
claim they're in our domain.

What's the best way to fight it?



You REALLY don't want to whitelist your own domain.  You're seeing why 
right now.  Use SPF?  Or perhaps a whitelist rule that's less prone to 
forgery?  whitelist_from_spf or something similar?


-Jim



Re: How would you provide a 554 rejection notice for spam?

2007-07-30 Thread Jim Maul

Matus UHLAR - fantomas wrote:

On 30.07.07 13:25, Spamassassin List wrote:
Any idea for qmail? 


If you'll excuse a bit of irony, I'd say: drop it. There are many better
MTAs than qmail. There are, IMHO, far better solutions...


According to who, you?

He asked for a solution for qmail.  If you do not know, it would be 
better to just not respond than to suggest he swap out his whole setup.


Thanks anyway.


Re: Now its zip attachments ^^

2007-07-23 Thread Jim Maul

John Rudd wrote:

Matus UHLAR - fantomas wrote:


On 22.07.07 18:47, John Rudd wrote:
As I've said for years: we should just ban attachments.  They're not 
really useful for anything that can't be done a better way.  Which 
only leaves them being useful for attacks of one form or another.


some people just want, some just need attachments.


some people just want -- yup, no disagreement there.  No matter how 
many alternatives you give them, some people just want the ease and 
convenience of attachments.



some just need -- no, I can't agree there.  I have yet to come across 
ANY situation where a person _NEEDED_ attachments.  As I said above, 
there's nothing that can be done with attachments that you can't do 
another way.





Of course these things COULD be done another way.  But not always as 
easily or as quickly as with attachments.  Can you recommend a quick and 
easy replacement for attachments when my boss wants me to send him an 
Excel file he needs for a meeting with an auditor?


1. FTP?  Easy for me to set up and upload the file to the server.  But 
now my boss has to open an FTP client (yes, you can use a browser, but 
does he know this?).  He doesn't even know what FTP is, and now he needs to 
use a username and password just to get this file I could have easily 
emailed him?  Too much work on his part.


2. Put it up on our company intranet?  This is somewhat less work than 
FTP, but since it is publicly accessible (inside our organization), there 
would need to be some authentication.  This ALMOST worked for us here, 
except for that time when the CEO needed a report sent to him but he was 
not in the building.  He wanted it on his BlackBerry..hmm..how to get a 
report to a BlackBerry remotely without email and attachments?


3. ??


Re: not everyone is happy with SA

2007-07-19 Thread Jim Maul

Per Jessen wrote:

http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104STORY=/www/story/07-17-2007/0004626829EDATE=



/Per Jessen, Zürich





That's retarded.  Might as well say, "Unplugging my mail server from the 
internet is the best method because I received 0 spam since I did it!"


Challenge response is fundamentally broken.  It can not and should not 
be considered an anti-spam solution.


-Jim



Re: Rulesemporium

2007-07-13 Thread Jim Maul

Theo Van Dinter wrote:

On Fri, Jul 13, 2007 at 10:03:07AM -0700, John D. Hardin wrote:

I'll bring this up again: coral.

Is there some reason pointing everyone at the coral cache of the 
website won't work? Granted, coral is also intended for large files, 
but it is distributed and is almost transparent...


Because coral sucks?

We tried it for sa-update, and kept finding that we'd get timeouts,
or corrupted files, or ...  We ended up dropping it in favor of more
traditional mirrors.



There is also the (small) issue of some sites not having web access to 
ports other than 80 or 443.  For some, :8080 is a no-go.


-Jim


Re: Overriding Scores

2007-06-11 Thread Jim Maul

[EMAIL PROTECTED] wrote:

Server .116

The email attached has been identified by one of our team as legitimate but 
unfortunately was incorrectly tagged as SPAM.

The email address has been whitelisted to ensure this will not happen again and 
we are currently looking into the reasons why this happened.

No mail has been lost as the quarantined mail folder is continuously checked by 
members of Team Genesis, but please accept our apologies for any inconvenience 
caused.

Your SPAM scanning system; Ullyses is continually being upgraded and refined so 
we anticipate a steadily decreasing number of incidents like this as the system 
learns your personal profile.

If you feel that you are receiving an inappropriate amount of SPAM then can we 
ask you to contact us either by email to: [EMAIL PROTECTED] or call your 
Genesis representative who will be happy to assist.

Please do not reply to this email address as it has been automatically 
generated, but email any queries to: [EMAIL PROTECTED]

Thank you and take care



Can this stop now please?

-Jim


Re: 404 while getting RDJ updates?

2007-06-07 Thread Jim Maul

guenther wrote:

On Thu, 2007-06-07 at 17:45 +0200, Anders Norrbring wrote:

Anyone else getting 404 errors from RDJ lately?


Yes, this topic came up just a few hours ago. Probably a DDoS attack.

Please disable all RDJ till further notice.

  guenther




I would imagine this is related to www.uribl.com and surbl.org  having 
issues as well.  Both are now pointing to 127.0.0.1 in what I would 
assume was an attempt to stop the attack.  Some spammer is pissed off it 
seems...


-Jim



Re: 404 while getting RDJ updates?

2007-06-07 Thread Jim Maul

Chris Santerre wrote:



  -Original Message-
  From: Jim Maul [mailto:[EMAIL PROTECTED]
  Sent: Thursday, June 07, 2007 12:02 PM
  To: users@spamassassin.apache.org
  Subject: Re: 404 while getting RDJ updates?
 
 
  guenther wrote:
   On Thu, 2007-06-07 at 17:45 +0200, Anders Norrbring wrote:
   Anyone else getting 404 errors from RDJ lately?
  
   Yes, this topic came up just a few hours ago. Probably a
  dDOS attack.
  
   Please disable all RDJ till further notice.
  
 guenther
  
  
 
  I would imagine this is related to www.uribl.com and
  surbl.org  having
  issues as well.  Both are now pointing to 127.0.0.1 in what I would
  assume was an attempt to stop the attack.  Some spammer is
  pissed off it
  seems...

It's true, scanners indicate Klingon war vessels approaching our sector. 
We've dropped out of warp due to overuse of the dilithium crystals. 
Federation starships have been called in for assistance. Scotty has 
given us more power, but is not sure she will hold together much 
longer.  All the while Ensign Alex won't stop dancing with a half-naked 
green lady!




I'd really like to meet your parents

-Jim


Re: bayes autolearn - nonspam threshold

2007-05-23 Thread Jim Maul

Duane Hill wrote:

On Wed, 23 May 2007, Abba Communications wrote:



Since the introduction of SA v3.2.0, bayes_auto_learn_threshold_nonspam
appears to be -1.0.
from Duane


Duane and others

With all sincere and due respect to the DEV's and their excellent hard
work...


I understand it takes a lot of work to maintain and appreciate the 
efforts put forth.



Just how many places should this setting be???


Well, in version 3.1.8 it wasn't anywhere to be found in a configuration 
file. You had to have the setting in place to deviate from 0.1.


So why not remove the setting from 10_default_prefs.cf (a core setting 
file installed from the distribution) and have the code default do what 
it should. Defaults are to handle issues where a setting isn't specified.




The code never changed.  It still defaults to 0.1

Somewhere along the line, someone decided it should be -1.  The only way 
to change this is to:


1. Change the code and release a new version (stupid for such a small 
change)
2. Make the change in a conf file by specifying a new value and 
distribute that change to all using sa-update.


Obviously, the devs chose option 2.  What this in effect does is change 
the default autolearn threshold for the SA installation.  It however 
does not change the default in the code as sa-update cannot update core 
SA code.  It also cannot update the documentation that comes with SA.
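
To make the layering concrete, here is a small Python sketch of the precedence described above: a value compiled into the code, overridden by a distributed rules file, overridden in turn by a local file. The parser and the dictionary of defaults are hypothetical simplifications for illustration, not SpamAssassin's real config loader.

```python
from typing import Dict

# What the code (and therefore the shipped documentation) says.
CODE_DEFAULTS: Dict[str, float] = {"bayes_auto_learn_threshold_nonspam": 0.1}


def parse_conf(text: str) -> Dict[str, float]:
    """Parse 'name value' lines, ignoring blank lines and # comments."""
    settings: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split(None, 1)
        settings[key] = float(value)
    return settings


def effective(*conf_texts: str) -> Dict[str, float]:
    """Start from the code defaults; each later file overrides earlier ones."""
    result = dict(CODE_DEFAULTS)
    for text in conf_texts:
        result.update(parse_conf(text))
    return result


# Stand-ins for the distributed prefs file and a site's local.cf.
DISTRIBUTED = "bayes_auto_learn_threshold_nonspam -1.0  # shipped via rule updates"
LOCAL = "bayes_auto_learn_threshold_nonspam -0.1"
```

With no files the documented 0.1 applies, with only the distributed file the effective value is -1.0, and a local setting still wins over both.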


I can understand it being in some type of conditional statement in 
code if the setting does not exist in the local.cf file, yet this is 
getting to be a tad confusing now...



It is somewhat confusing as if you were to read the documentation, it 
says the default is 0.1.  However, if you were to download SA and 
install it without any modifications, the value that would be used for 
this threshold would be -1.  Being that devs can release conf changes 
which can alter defaults, but they cannot update the documentation in 
this manner, what do you suggest as an alternative?  There really is no 
other way to do it.


So, yes, it's confusing, but there really isn't a better way to do it.

-Jim


Re: Spam bounceback attack

2007-04-10 Thread Jim Maul

John D. Hardin wrote:

On Tue, 10 Apr 2007, J. wrote:


I didn't realize that most people are denying smtp connections for
bad addresses. That's great that this is possible. So most of the
people on this list reject connections that are for bad addresses?
That's great. I think that would cut down the spam we get by 90%.
I had no idea this was possible.


That's not *quite* what we're talking about. Sorry if this is a rehash
of what you already know:

Proper behavior is to check addresses *during* the SMTP conversation
with the submitting MTA/MUA, and reject invalid/nonexistent addresses as
the other guy submits them. If any valid addresses are submitted, the
mail goes through. If no valid addresses are submitted, it is up to
the *other guy* to take some action, such as notifying the sender the
mail couldn't be delivered. The connection itself is not blocked or
rejected, though you could set up a log watcher to detect IPs that
continually submit bad addresses and firewall/tarpit them.

A bulk spam mail tool will likely just ignore the "no such address"
rejections, leading to no additional impact on innocent third parties.


Contrast this with having your MTA accept the message for delivery, 
pass the message on down the chain, and then have some later step 
realize the address is invalid and generate a notice to the sender 
address that the message was undeliverable.


You're now generating outbound mail based on a spam you received. This 
is bad.


If the address was forged and nonexistent, your bounce will be 
rejected by the supposed sender's MTA; that's not as bad as actually 
delivering a bounce to a real user, but you're still generating 
pointless traffic to some innocent third party.


Multiply that by the millions of messages in a typical spam run and 
you can get a DDoS against whatever address or domain was forged on 
the spams as the sender address.


Rejecting the addresses during the SMTP conversation doesn't generate 
this extra traffic.


Configuring your MTA to refuse to accept nonexistent addresses is
typically a boolean option in its basic configuration settings, not
something esoteric requiring complex addons. Any MTA that doesn't
support this basic capability is badly broken by current standards.

Some MTAs will also allow you to slow down the SMTP conversation (e.g.
pause a few seconds before sending responses) if more than a few bad
addresses are submitted, to mitigate against dictionary attacks.




qmail, which I believe the OP was using, is one of these "badly broken by 
current standards" MTAs, as you put it.  By default, it accepts ALL mail 
regardless of the validity of the recipient.  It will then generate a 
bounce to the (most likely) forged address when it figures out the 
recipient does not exist.  There are many addons/patches to correct this 
behavior.  I would check (using something other than IE) 
http://qmail.jms1.net for general information and useful patches.  And 
more specifically, http://qmail.jms1.net/patches/validrcptto.cdb.shtml 
which gives you the ability to reject invalid recipients at SMTP time.
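
For anyone unsure what "reject at SMTP time" means in practice, a tiny Python sketch of the decision follows. It is not qmail or validrcptto code; the recipient list and function names are hypothetical. The reply codes are the usual ones: 250 for an accepted RCPT, 550 for an unknown user, 354 to go ahead with DATA, and 554 when no recipient was valid.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical list of deliverable addresses, stored in lowercase.
VALID_RECIPIENTS: Set[str] = {"jim@example.com", "postmaster@example.com"}


def rcpt_reply(address: str, valid: Set[str]) -> Tuple[int, str]:
    """Answer one RCPT TO while the sending MTA is still connected."""
    if address.lower() in valid:
        return 250, "OK"
    return 550, "5.1.1 No such user"


def data_reply(rcpt_addresses: Iterable[str], valid: Set[str]) -> Tuple[int, List[str]]:
    """Accept the message body only if at least one recipient was valid."""
    accepted = [a for a in rcpt_addresses if rcpt_reply(a, valid)[0] == 250]
    if accepted:
        return 354, accepted
    return 554, accepted
```

Because the refusal happens inside the conversation, the receiving side never creates a bounce of its own; any notification is left to whoever submitted the mail.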


-Jim


Re: Is Bayes Dead? Have the spammers won?

2007-03-27 Thread Jim Maul

R Lists06 wrote:



Are you sure of this?  Have you also trained these ham messages to
counter this effect?  Not too long ago we were in the same situation.
I have autolearn enabled but I have adjusted the thresholds to avoid

This is quite possible.  I have heard other stories of people using
things like greylisting and rbls to reject at smtp time that the only
things that eventually made it to SA were so limited that it would
produce odd results for bayes.  From my experience, the more you throw
at bayes, the better it gets.  The more selective you are, the less it
has to work with.

Jim


So are you saying for these purposes that you do not use RBLs or greylisting
or other similar tools that cut down on the obvious cycle consuming garbage?




Correct, I do not use RBLs or greylisting.  However, I have 1 domain, 
approx 100 users and receive only 2k messages/day.  We have one machine 
running qmail/SA/clamav which more than handles this load.  I can afford 
not to use RBLs or greylisting - other larger setups may not be able to.


-Jim





Re: Is Bayes Dead? Have the spammers won?

2007-03-23 Thread Jim Maul

Marc Perkel wrote:
Perhaps what I need to do is to get rid of autolearn and write my own 
learning system that strips out the body of messages with images and 
just learns the headers. My problem is that when users get image spam 
they put it in the spam folders and they get learned. But the text in 
the image spam causes ham type text to be learned as spam. That causes 
ham to get higher scores.





Are you sure of this?  Have you also trained these ham messages to 
counter this effect?  Not too long ago we were in the same situation.  I 
have autolearn enabled but I have adjusted the thresholds to avoid 
learning false positives/negatives.  We were getting ham (although 
arguably - they were newsletter type ham) that was hitting BAYES_99.  As 
soon as I started training them as ham the problem went away.  Spam is 
still detected correctly by bayes and these newsletters no longer hit 
BAYES_99.


-Jim


Re: Is Bayes Dead? Have the spammers won?

2007-03-23 Thread Jim Maul

Marc Perkel wrote:



Jim Maul wrote:

Marc Perkel wrote:
Perhaps what I need to do is to get rid of autolearn and write my own 
learning system that strips out the body of messages with images and 
just learns the headers. My problem is that when users get image spam 
they put it in the spam folders and they get learned. But the text in 
the image spam causes ham type text to be learned as spam. That 
causes ham to get higher scores.





Are you sure of this?  Have you also trained these ham messages to 
counter this effect?  Not too long ago we were in the same situation.  
I have autolearn enabled but I have adjusted the thresholds to avoid 
learning false positives/negatives.  We were getting ham (although 
arguably - they were newsletter type ham) that was hitting BAYES_99.  
As soon as i started training them as ham the problem went away.  Spam 
is still detected correctly by bayes and these newsletters no longer 
hit bayes_99.


-Jim



What I think my problem might be is that I have done so much work 
prescreening messages with Exim that what's left isn't good stock for 
autolearn. I think what I need is a separate dedicated learner server 
that is selective and smart about what it learns.





This is quite possible.  I have heard other stories of people rejecting 
so much at SMTP time with things like greylisting and RBLs that the only 
things that eventually made it to SA were so limited that they 
produced odd results for bayes.  From my experience, the more you throw 
at bayes, the better it gets.  The more selective you are, the less it 
has to work with.


Jim


Re: Bit OT - SA not running on same time as rest of system

2007-03-16 Thread Jim Maul

Matt Kettler wrote:

Chris wrote:
I'm running Mandrake 10.1, in order to make sure my system switched to DST on 
March 11th I downloaded and installed an upgrade to the timezone file. After 
running it I ran 


[EMAIL PROTECTED] ~]# zdump -v /etc/localtime | grep 2007
/etc/localtime  Sun Mar 11 07:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 CST 
isdst=0 gmtoff=-21600
/etc/localtime  Sun Mar 11 08:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 CDT 
isdst=1 gmtoff=-18000
/etc/localtime  Sun Nov  4 06:59:59 2007 UTC = Sun Nov  4 01:59:59 2007 CDT 
isdst=1 gmtoff=-18000
/etc/localtime  Sun Nov  4 07:00:00 2007 UTC = Sun Nov  4 01:00:00 2007 CST 
isdst=0 gmtoff=-21600


Which showed everything was well. On March 11th the system did switch to DST 
as it was supposed to, or some of it did. I had a few issues such as with 
postfix and cronjobs which I just fixed, however, spamassassin still insists 
that it's running on CST:


Mar 15 19:24:46 localhost fetchmail[13766]: 1 message for cpollock at 
pop.earthlink.net (9944 octets). 
Mar 15 18:24:47 localhost spamd[15738]: spamd: connection from 
localhost.localdomain [127.0.0.1] at port 43735 
Mar 15 18:24:47 localhost spamd[15738]: spamd: setuid to chris succeeded 
Mar 15 18:24:47 localhost spamd[15738]: spamd: processing message 
[EMAIL PROTECTED] for chris:501 
Mar 15 19:24:51 localhost clamd[21255]: Accepted connection on port 1725, fd 
11 

Is there something such as /var/spool/postfix/etc/localtime that has to be 
changed somewhere in spamassassin? Never had an issue like this in previous 
years with DST. I've stopped and started spamassassin several times with no 
changes.


Perhaps perl's DateTime::TimeZone needs to be rebuilt?

From the looks of it, that package compiles in the tz database, so it
might need a rebuild if tz data changes..




I found that while the OS itself did change over, most of the programs 
running at the time did not.  I had to restart sendmail, syslog, apache, 
crond, mysql, etc.  From what I understand, some programs read the 
timezone info when they start up and do not recognize the underlying 
changes to the OS itself until restarted.  I don't know if this is true 
or not, but it worked for me.
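
A quick way to spot which daemons are still on the old offset is to compare their syslog timestamps against a process known to be right. The Python sketch below does that for lines shaped like the ones Chris posted; the function name, the five-minute tolerance and the default year are assumptions for illustration, and it only looks at the first line seen per process.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# "Mar 15 19:24:46 localhost fetchmail[13766]: ..."
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ ([\w.-]+)\[\d+\]:")


def stale_processes(lines: Iterable[str], reference: str, year: int = 2007,
                    tolerance: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return processes whose clock trails the reference process by about one hour."""
    stamps: Dict[str, datetime] = {}
    for line in lines:
        m = LINE.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            stamps.setdefault(m.group(2), when)  # keep the first line per process
    base = stamps[reference]
    hour = timedelta(hours=1)
    return sorted(name for name, when in stamps.items()
                  if abs((base - when) - hour) <= tolerance)


LOG = [
    "Mar 15 19:24:46 localhost fetchmail[13766]: 1 message for cpollock at pop.earthlink.net (9944 octets).",
    "Mar 15 18:24:47 localhost spamd[15738]: spamd: connection from localhost.localdomain [127.0.0.1] at port 43735",
    "Mar 15 19:24:51 localhost clamd[21255]: Accepted connection on port 1725, fd 11",
]
```

On those three lines, with fetchmail as the reference, only spamd is flagged, which matches the symptom in the original report. Anything flagged this way is a candidate for a restart.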


-Jim



Re: AW: AW: how to archive/save mails that are scanned by spamd ???

2007-03-15 Thread Jim Maul

Starckjohann, Ove wrote:

Hi!

What line may i add in 
/etc/mail/spamassassin/local.cf

to archive all mails that are checked by spamd ???

Ove




What makes you think that you could even put something in local.cf that 
would do that?  SA does not archive anything.


-Jim


Re: AW: AW: AW: how to archive/save mails that are scanned by spamd ???

2007-03-15 Thread Jim Maul

Starckjohann, Ove wrote:

I know !

But back to the technical question: do you know a suitable way to archive mails that are 
scanned by spamd?




Perhaps with whatever is calling spamassassin?  I use qmail-scanner. 
There are so many options I couldn't even begin to mention them.


-Jim


Re: AW: AW: AW: AW: how to archive/save mails that are scanned by spamd ???

2007-03-15 Thread Jim Maul

Starckjohann, Ove wrote:

Hi!

I'm not calling spamassassin from the local machine but spamd from a remote machine via 
a closed source-software which acts as spamc.

Ove Starckjohann




This does not matter.  This question has nothing to do with SA.  SA 
cannot/should not archive anything.  It is simply a spam detection program. 
If you want to archive messages, this has to be done some other way. 
The most common way is with whatever program is calling SA (and other 
things such as virus scanners, etc).  What calls your "closed 
source-software which acts as spamc"?


-Jim


Re: AW: how to archive/save mails that are scanned by spamd ???

2007-03-15 Thread Jim Maul

Starckjohann, Ove wrote:

Hi !

The program acting as spamc is called NoSpamProxy.
http://www.nospamproxy.com/

And my thought was that it may be possible to archive the mail that is 
supplied to spamd, because spamd gets the whole mail, analyzes it and 
reports the spam score back to the spamc client.
During analysis by SA the mail is possibly stored at /tmp...*.tmp and also 
*could* (maybe) be archived...




Sure, SA *could* archive the mail - if you modify the code to do so. 
But as it is written, there is no way to achieve this.  Does nospamproxy 
have the ability to archive the mail?  I was not able to tell this from 
looking at the website real quick.  Regardless, you are barking up the 
wrong tree, SA is not the place to archive anything.


-Jim


Re: Not Enough Points

2007-03-06 Thread Jim Maul

David Goldsmith wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here is how this message scored:

X-Spam-DCC: PacNet-SG: iceman11.giac.net 1358; Body=65 Fuz1=65 Fuz2=51
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on iceman11.giac.net
X-Spam-Level: ***
X-Spam-Status: No, score=4.0 required=5.0 tests=BAYES_99,HTML_90_100,
HTML_MESSAGE,MIME_HEADER_CTYPE_ONLY,MIME_HTML_ONLY,PLING_PLING 
autolearn=no
version=3.1.8
X-Spam-Pyzor: Reported 0 times.
X-Spam-Report:
*  0.1 HTML_90_100 BODY: Message is 90% to 100% HTML
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*  [score: 1.]
*  0.0 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
*  0.0 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without required MIME
*  headers
*  0.3 PLING_PLING Subject has lots of exclamation marks


Here is a URL for the message:

http://members.cox.net/dgoldsmi/spam/lowscore02a.eml

Maybe I just got lucky and was an early recipient of it.  None of the
message hash sites have seen it enough yet to assign points.

Does this message break 5.0 points for anyone?



Yep -

Content analysis details:   (9.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 HTML_IMAGE_RATIO_04    BODY: HTML has a low ratio of text to image area
 0.1 HTML_MESSAGE           BODY: HTML included in message
 5.4 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 0.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 2.2 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without required MIME headers
 0.7 PLING_PLING            Subject has lots of exclamation marks



Still rocking SA 2.64 with incredible results ;)

-Jim


Re: Not Enough Points

2007-03-06 Thread Jim Maul

David Goldsmith wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jim Maul wrote:

David Goldsmith wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here is how this message scored:

X-Spam-DCC: PacNet-SG: iceman11.giac.net 1358; Body=65 Fuz1=65 Fuz2=51
X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on
iceman11.giac.net
X-Spam-Level: ***
X-Spam-Status: No, score=4.0 required=5.0 tests=BAYES_99,HTML_90_100,
HTML_MESSAGE,MIME_HEADER_CTYPE_ONLY,MIME_HTML_ONLY,PLING_PLING
autolearn=no
version=3.1.8
X-Spam-Pyzor: Reported 0 times.
X-Spam-Report:
*  0.1 HTML_90_100 BODY: Message is 90% to 100% HTML
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*  [score: 1.]
*  0.0 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
*  0.0 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without
required MIME
*  headers
*  0.3 PLING_PLING Subject has lots of exclamation marks


Here is a URL for the message:

http://members.cox.net/dgoldsmi/spam/lowscore02a.eml

Maybe I just got lucky and was an early recipient of it.  None of the
message hash sites have seen it enough yet to assign points.

Does this message break 5.0 points for anyone?


Yep -

Content analysis details:   (9.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 HTML_IMAGE_RATIO_04    BODY: HTML has a low ratio of text to image area
 0.1 HTML_MESSAGE           BODY: HTML included in message
 5.4 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 0.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 2.2 MIME_HEADER_CTYPE_ONLY 'Content-Type' found without required MIME headers
 0.7 PLING_PLING            Subject has lots of exclamation marks



Still rocking SA 2.64 with incredible results ;)

-Jim


Ok, so my SA 3.1.8 install with the latest rules via sa-update has:

# grep HTML_IMAGE_RATIO_04 *
20_html_tests.cf:body HTML_IMAGE_RATIO_04 eval:html_image_ratio('0.002','0.004')
20_html_tests.cf:describe HTML_IMAGE_RATIO_04   HTML has a low ratio of text to image area
50_scores.cf:score HTML_IMAGE_RATIO_04 0.877 0 1.057 0

but it apparently does not trip this.

We both have MIME_HTML_ONLY, MIME_HEADER_CTYPE_ONLY and PLING_PLING but
you have higher scores for all three.

I have HTML_MESSAGE and HTML_90_100 which correlate to your HTML_MESSAGE
rule and score.

You have a higher BAYES_99 score.

Your scores for MIME_HTML_ONLY, MIME_HEADER_CTYPE_ONLY, PLING_PLING and
BAYES_99 -- are they the default values from SA 2.64 or have you
increased them?



I have increased my bayes scores because of the high accuracy of my 
bayes database.  BAYES_99 alone is enough to push spam over my 5.0 
threshold.  All other scores are stock for 2.64.
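
To see where the gap between the two reports comes from, the per-rule figures quoted in this thread can be put side by side. A small Python sketch, using only the numbers shown in the two reports above; the dictionary and function names are just for illustration.

```python
from typing import Dict

# Per-rule scores exactly as printed in the two reports in this thread.
DAVID_3_1_8: Dict[str, float] = {
    "HTML_90_100": 0.1, "HTML_MESSAGE": 0.0, "BAYES_99": 3.5,
    "MIME_HTML_ONLY": 0.0, "MIME_HEADER_CTYPE_ONLY": 0.0, "PLING_PLING": 0.3,
}
JIM_2_64: Dict[str, float] = {
    "HTML_IMAGE_RATIO_04": 1.1, "HTML_MESSAGE": 0.1, "BAYES_99": 5.4,
    "MIME_HTML_ONLY": 0.3, "MIME_HEADER_CTYPE_ONLY": 2.2, "PLING_PLING": 0.7,
}


def total(report: Dict[str, float]) -> float:
    """Sum of the printed per-rule scores, rounded to one decimal."""
    return round(sum(report.values()), 1)


def differences(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Per-rule difference (b minus a); a rule missing from a report counts as 0."""
    return {rule: round(b.get(rule, 0.0) - a.get(rule, 0.0), 1)
            for rule in sorted(set(a) | set(b))}
```

The printed scores in the 2.64 report add up to the stated 9.8. The 3.1.8 ones add up to 3.9 against a header score of 4.0, presumably because the report rounds each rule's score for display. The largest single differences are MIME_HEADER_CTYPE_ONLY (2.2) and BAYES_99 (1.9).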


-Jim






Re: Low Scoring Message

2007-03-02 Thread Jim Maul

David Goldsmith wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Setup:  SA 3.1.8, Pyzor, Razor, DCC, iXhash
Botnet, FuzzyOCR 3.5.1, SARE rules, some misc rules

This message got 0 points.  Does it score over 5 for anyone?

http://members.cox.net/dgoldsmi/spam/lowscore01.txt



Content analysis details:   (8.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 HTML_LINK_CLICK_HERE   BODY: HTML link text says click here
 0.1 HTML_60_70             BODY: Message is 60% to 70% HTML
 0.1 HTML_MESSAGE           BODY: HTML included in message
 0.9 RAZOR2_CF_RANGE_11_50  BODY: Razor2 gives confidence between 11 and 50
                            [cf:  33]
 5.4 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 0.9992]
 0.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 1.6 LINK_TO_NO_SCHEME      BODY: Contains link without http:// prefix
 0.1 CLICK_BELOW            Asks you to click below

Sure does.

-Jim


Re: Low Scoring Message

2007-03-02 Thread Jim Maul

Giampaolo Tomassoni wrote:

From: Jim Maul [mailto:[EMAIL PROTECTED]

Jim Maul wrote:

David Goldsmith wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Setup:SA 3.1.8, Pyzor, Razor, DCC, iXhash
Botnet, FuzzyOCR 3.5.1, SARE rules, some misc rules

This message got 0 points.  Does it score over 5 for anyone?

http://members.cox.net/dgoldsmi/spam/lowscore01.txt


Content analysis details:   (8.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 HTML_LINK_CLICK_HERE   BODY: HTML link text says click here
 0.1 HTML_60_70             BODY: Message is 60% to 70% HTML
 0.1 HTML_MESSAGE           BODY: HTML included in message
 0.9 RAZOR2_CF_RANGE_11_50  BODY: Razor2 gives confidence between 11 and 50
                            [cf:  33]
 5.4 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 0.9992]
 0.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 1.6 LINK_TO_NO_SCHEME      BODY: Contains link without http:// prefix
 0.1 CLICK_BELOW            Asks you to click below


Jim, Where did you get the LINK_TO_NO_SCHEME and CLICK_BELOW rules?




They are both from the stock 2.64 rulesets.

20_phrases.cf:body __CLICK_BELOW /click\s.{0,30}(?:here|below)/is
20_phrases.cf:meta CLICK_BELOW  (__CLICK_BELOW && !CLICK_BELOW_CAPS)
20_phrases.cf:describe CLICK_BELOW  Asks you to click below


20_html_tests.cf:rawbody LINK_TO_NO_SCHEME  /\s+href=[']?www\./i
20_html_tests.cf:describe LINK_TO_NO_SCHEME Contains link without http:// prefix
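
For anyone who wants to see what those two patterns catch without digging
out a 2.64 install, here is a rough sketch in Python. It only translates the
two regular expressions as quoted above (Perl's /i and /s become
re.IGNORECASE and re.DOTALL). The helper name hits() and the sample strings
are made up for illustration, and the sketch ignores the CLICK_BELOW_CAPS
half of the meta rule as well as SA's own body/rawbody rendering.

```python
import re

# Python translations of the two patterns quoted from the 2.64 rule files.
# Perl's /i and /s flags map to re.IGNORECASE and re.DOTALL.
CLICK_BELOW = re.compile(r"click\s.{0,30}(?:here|below)", re.IGNORECASE | re.DOTALL)
LINK_TO_NO_SCHEME = re.compile(r"\s+href=[']?www\.", re.IGNORECASE)


def hits(text):
    """Return the names of the patterns that match the given text (hypothetical helper)."""
    found = []
    if CLICK_BELOW.search(text):
        found.append("__CLICK_BELOW")
    if LINK_TO_NO_SCHEME.search(text):
        found.append("LINK_TO_NO_SCHEME")
    return found


if __name__ == "__main__":
    samples = [
        "Please CLICK  right here to unsubscribe",
        "<a href=www.example.com>offer</a>",
        '<a href="http://www.example.com">offer</a>',
    ]
    for sample in samples:
        print(hits(sample), "<-", sample)
```

The third sample has a proper http:// scheme, so neither pattern should fire
on it.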



I just can't see upgrading to 3.anything with 2.64 working so well.

-Jim


Re: [ot-ish] fuzzyocr still being developed?

2007-02-21 Thread Jim Maul

snowcrash+spamassassin wrote:

following the numerous questions on list, i've gathered that fuzzyocr
is rather popular -- we use it, too.

i've not noticed recent bug-fixing, src dev (~ 1 month), or comments
here, from the dev.

just wondering -- is the proj still alive? dev vacation, maybe? or,
has the proj been subsumed _into_ SA when i wasn't looking?

thanks for any input/update.




I think he's just busy.  AFAIK it is still being worked on.

-Jim



Re: Bayes resolution gettin weaker

2007-02-12 Thread Jim Maul

Jack Gostl wrote:
Well... I'm convinced. I turned off autolearn a week ago, and things 
have never been smoother. It's a shame really, that's a nice feature, but 
for some reason it waters down the Bayes resolution until it's almost 
useless.




Most likely because the autolearn thresholds are too generous.  The 
possibility to autolearn spam as ham and/or ham as spam is too great.  I 
have been running with autolearn enabled, my thresholds set to:


bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0

without any problems for almost 3 years now.  My bayes database has 
never been better.  I think too many people have problems with it 
because of the defaults and instead of trying to figure out how to make 
it work better, they just turn it off and call it broken.
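
To make the effect of the two settings concrete, here is a minimal sketch of
the threshold check in Python. It is a simplification under stated
assumptions: only the two thresholds are modeled, the boundary comparisons
(below the nonspam value for ham, at or above the spam value for spam) are a
guess at the behaviour, and the function name autolearn_decision() is made
up. Real SpamAssassin applies further conditions before it learns anything.

```python
def autolearn_decision(score, nonspam_threshold=-0.1, spam_threshold=12.0):
    """Return "ham", "spam", or None for a message's autolearn score.

    Simplified on purpose: only the two thresholds are modeled, and the
    boundary comparisons (< for ham, >= for spam) are an assumption.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    # Compare the -0.1 setting from this thread with a looser 0.1 value.
    for score in (-2.3, 0.0, 4.9, 12.0):
        tight = autolearn_decision(score)
        loose = autolearn_decision(score, nonspam_threshold=0.1)
        print(f"score {score:5.1f}: threshold -0.1 -> {tight}, threshold 0.1 -> {loose}")
```

The interesting row is the 0.0 one: with a nonspam threshold of 0.1 (which I
believe is the stock default) a message that hit no rules at all gets
learned as ham, while with -0.1 it is left alone.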


-Jim



- Original Message - From: Jack Gostl [EMAIL PROTECTED]
To: Anthony Peacock [EMAIL PROTECTED]; SpamAssassin 
users@spamassassin.apache.org

Sent: Monday, February 05, 2007 7:06 AM
Subject: Re: Bayes resolution gettin weaker




- Original Message - From: Anthony Peacock 
[EMAIL PROTECTED]

To: SpamAssassin users@spamassassin.apache.org
Sent: Monday, February 05, 2007 3:56 AM
Subject: Re: Bayes resolution gettin weaker



Hi,

Jack Gostl wrote:
I've been watching this for awhile, and there is now a pattern to 
what I'm seeing.


I'm running a configuration with multiple users sharing the bayes 
files. This is an interim move to facilitate the spamassassin 
upgrades, and like many interim moves it's been going on for a long 
time.


When I first built the bayes files from my personal folders and my 
spam archives, things were great. 99.8% of the spam caught or 
better. Then, usually after a week or so, the number starts to 
drop. Right now, it's down to 97%, in another day or two it will be 
down below 95%. With the amount of spam we receive, that is a lot 
of missed junk mail.


So I blow away my bayes* files, rebuild, and I'm back up to darn 
near 100% caught. For about a week. Then the deterioration begins 
again.


Has anyone else encountered this? Is this an artifact of too many 
users sharing a spam file?


Also I retrain each night, feeding any missed spams plus any 
new hams received back through sa-learn. I can't see how that 
makes it worse, but who knows.



Do you have autolearn enabled?


Uh... yes? You are suggesting that I turn it off? I had always 
assumed that if the Bayes learned something as ham that it 
shouldn't, sa-learn was smart enough to undo it.


Change the thresholds for auto learning.  Mine are:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0


I'm willing to try. I made the change in my user_prefs and we'll see 
what the next week brings.


Thanks











Re: ALL_TRUSTED rule fires despite no trusted_networks defined

2007-02-08 Thread Jim Maul

Stéphane LEPREVOST wrote:
We are currently checking the configuration of our SA installation (SA 
3.1.7 + qmail + qmail-scanner 1.25st + clamav running on SLES *) and 
just saw a very weird thing:
 
although we don't have any 'trusted_networks' line in our local.cf file, 
more than 50 000 of 90 000 received mails fired the rule 
ALL_TRUSTED ...
 
Does anyone know why?



Because if you don't define trusted networks, SA guesses at what they should 
be.  Sometimes it gets this wrong... is your server NAT'ed?


You may have to define trusted networks manually.
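
A quick way to check the NAT question is to look at the relay addresses in
your Received headers and see which ones sit in private ranges. The sketch
below does only that, using Python's standard ipaddress module. It is not
SA's trust-guessing logic, the function name private_relays() is made up,
and the addresses are invented examples.

```python
import ipaddress


def private_relays(relay_ips):
    """Return the relay addresses that fall in private (non-routable) ranges."""
    return [ip for ip in relay_ips if ipaddress.ip_address(ip).is_private]


if __name__ == "__main__":
    # Invented relay chain: your own MX first, then the hosts that handed mail to it.
    relays = ["192.168.1.10", "8.8.8.8", "10.0.0.5"]
    print("private relays:", private_relays(relays))
```

If the machine that accepts mail from the internet shows up in that list,
that is a good hint that a trusted_networks line in local.cf is worth
setting by hand.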

-Jim




Re: Not just detect, but block

2007-02-07 Thread Jim Maul

Evan Platt wrote:

At 10:42 AM 2/7/2007, nh2 wrote:


Hello,

I am new to spamassassin and have installed it on a Suse 9.3 system with
qmail.
If my assassin detects spam, it adds *SPAM* to the subject.
How can I tell my server that such marked mails shall not be delivered
(deleted or moved to a special directory)?


If no one is able to answer you here, this may be better addressed on a 
qmail list, as this isn't a function of SpamAssassin.





It isn't really related to qmail either, but rather to what program you use 
to call SA.  I use qmail-scanner but there are other alternatives 
(simscan, etc).  Qmail-scanner has an option to do this and I believe 
simscan does as well, but I have never used it.


-Jim



Re: SA-gen'd message report headers appear differently (with/without linebreaks) in different mail clients

2007-02-06 Thread Jim Maul

Andy Figueroa wrote:
As an occasional, long-term Thunderbird user, and using a reasonably 
current version, 1.5.0.9, TB doesn't even have a built-in show header 
feature.  It can be added with a buggy extension called View Headers 
Toggle Button, which doesn't show long lines without scrolling (scroll 
right and left by dragging the mouse over the long line) and doesn't 
show all of a long header with no vertical scrolling available.  So, if 
SpamAssassin formatted its headers so they were pretty, with TB you'd 
see even less of the header because pretty formatting adds lines.


This is a missing feature in TB, poorly added with the extant extension. 
 When my leg heals I'll go back to using KMail on my desktop computer 
and be happy about it.  :-)


Andy Figueroa



Is ctrl-U not sufficient?

-Jim


Re: Drug spam, some caught some not - none caught by drug rules

2007-01-26 Thread Jim Maul

Rich Shepard wrote:

On Fri, 26 Jan 2007, Rich Shepard wrote:


 Where do I put this file so it's seen and used by SpamAssassin?


  Nevermind. I put it in /usr/share/spamassassin/ with all the other .cf
files.

Rich




nooo

Those are the DEFAULT rules.  Do not add/remove/modify anything in this 
folder.


custom rules go in /etc/mail/spamassassin/

You really need to have a better understanding of the basics of SA.  I'd 
suggest going over the documentation again. Specifically: 
http://wiki.apache.org/spamassassin/WhereDoLocalSettingsGo


-Jim


Re: USER_IN_WHITELIST problem

2007-01-22 Thread Jim Maul

Drew Burchett wrote:

Well, I certainly don't mean to be argumentative about this, but over
the weekend, I had to set USER_IN_WHITELIST score to 0 due to the number
of false hits it was receiving.  Seeing as I am the only one here who
has the ability to add and remove from whitelists or blacklists, I have
a pretty good idea of what is in them.  I can't say for sure, but there
certainly seems to be a bug in this particular rule.  If I could help to
troubleshoot it, I would be glad to provide whatever information is
necessary.



All this guessing can easily be put to rest by posting:

1. The headers of the message in question
2. Your SA whitelist statements

-Jim




Drew Burchett
United Systems  Software
Ph:(270)527-3293
Fax:  (270)527-3132

-Original Message-
From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 22, 2007 10:40 AM

To: Sherman Lilly
Cc: users@spamassassin.apache.org
Subject: Re: USER_IN_WHITELIST problem

Sherman Lilly wrote:
I have spam getting through that would get filtered if they were not 
getting -100 because of the USER_IN_WHITELIST rule. I do have a
whitelist but none of these spam emails have anything close to my whitelist.


Yes they do, otherwise you wouldn't see USER_IN_WHITELIST hitting.

It's probably hitting on whatever the envelope from address is (found in

the Return-Path header).  Most of the time this happens when people 
whitelist their own domain using whitelist_from.



Daryl

--
CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited. If you are not the intended recipient, please 
contact the sender by reply e-mail and destroy all copies of the original 
message.






Re: box trapper filter and wildcard question

2007-01-18 Thread Jim Maul

dan li wrote:
Hello. I am wondering if the Box Trapper is a part of SpamAssassin? If 
so, I am hoping some of you can give me the correct lines for the 
addresses I want to whitelist by using wildcard expressions.




Boxtrapper appears to be part of (or an add on to) cpanel.  It is not 
related in any way to spamassassin.


-Jim


Re: sa-learn explained

2006-12-29 Thread Jim Maul

Dave Koontz wrote:
 
I guess mileage varies.  Auto-Learn has been a life saver for us and has
drastically reduced false positives we used to get with emails to our
College's Health Care & Research departments.  We pass all local user email
through SA as well, so this really helps the system learn what is 'good'
email.

I'd suggest that everyone should at least try it and monitor the results.




I have found autolearn to be quite a valuable function here as well. 
Keep in mind that I have adjusted the autolearn threshold values to 
prevent things from being autolearned incorrectly.  I would suggest 
others do the same if they use autolearn.  IMO, with the default scores, 
it is too easy for false learning to occur. I use:


bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0

-Jim



-Original Message-
From: Nigel Frankcom [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 29, 2006 11:17 AM

To: users@spamassassin.apache.org
Subject: Re: sa-learn explained

On Fri, 29 Dec 2006 09:51:05 -0500, Andy Figueroa
[EMAIL PROTECTED] wrote:

I still feel like a tyro with SpamAssassin, but my installation is 
catching better than 99% with perhaps 0.1% false positives (thanks in 
large part to things I've learned from this list), and I think I can 
tell you a couple of things better than just read the manual.  (But, do 
read the manual!)  My initial experience with SpamAssassin about a year 
ago was through a large web hosting company and I was limited to 
playing with SpamAssassin through cpanel, though till they moved 
SpamAssassin to its own server, I could also edit my own user 
preferences directly.  The problem was, this big company never could 
get it right, so now I'm running my own mailserver(s) out of what 
seemed like necessity.  I'm running Gentoo with SA 3.1.7.


sa-learn is used to train and keep up-to-date the bayesian database.  
So, turn on autolearn in your /etc/mail/spamassassin/local.cf so the 
line reads:

bayes_auto_learn 1
(should be on by default).
This will cause selected spam and ham that you get to be used 
automagically to keep the bayesian database up-to-date.


I'm using maildir and have two subdirectories in my .maildir called:
2-learn-spam
2-learn-ham

I put missed spam in 2-learn-spam and ham misclassified as spam in 
2-learn-ham.  Then, whenever I have a few messages in one of those 
directories, I run one of the following scripts:


learnspam.scr, which contains this line:
sa-learn --spam --progress /home/figueroa/.maildir/.2-learn-spam/cur

learnham.scr which contains this line:
sa-learn --ham --progress /home/figueroa/.maildir/.2-learn-ham/cur

This is on my personal mailserver.  On the mailserver I run at a 
school, I run that script on each user's 2-learn-spam/ham directories 
every night under crontab.
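
As a rough idea of what the nightly per-user job could look like, here is a
small Python sketch. It only builds the sa-learn command lines shown above
for each user and prints them. The function name learn_commands(), the
/home root and the user names are assumptions for illustration, and actually
running the commands (for instance with subprocess.run) is left out.

```python
from pathlib import PurePosixPath


def learn_commands(home_root, users):
    """Build one sa-learn --spam and one sa-learn --ham command per user.

    Assumes each user keeps mail in <home_root>/<user>/.maildir with the
    .2-learn-spam and .2-learn-ham folders described in this message.
    """
    commands = []
    for user in users:
        maildir = PurePosixPath(home_root) / user / ".maildir"
        commands.append(
            ["sa-learn", "--spam", "--progress", str(maildir / ".2-learn-spam" / "cur")]
        )
        commands.append(
            ["sa-learn", "--ham", "--progress", str(maildir / ".2-learn-ham" / "cur")]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical user list; a real job would read it from the system.
    for command in learn_commands("/home", ["figueroa", "student1"]):
        print(" ".join(command))
```

Printing first and running later makes it easy to eyeball the paths before
anything touches the bayes database.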


Run an up-to-date version of SpamAssassin.  I was having pretty good 
results with 3.1.3 (the unmasked version in Gentoo), but got 
immediately better results when I upgraded to the current version.


Also, to keep your RULES up-to-date, run sa-update as root from 
time-to-time.


Good luck!  Happy spamassassaning!



Personally, I'd disagree with auto-learn; having used SA in a production
environment for some years I've found manual training to be a better
solution.

YMMV

Just my 2 (pick your currency) worth.

Nigel









Re: Two Questions

2006-12-27 Thread Jim Maul

ToTheCenter.com wrote:

I now am left with more understanding and more questions!


1) Some emails come through twice. Once as the original version and 
once as:

 No Message Collected 


Any ideas?



No idea, but does this help?

http://forums.ev1servers.net/showthread.php?t=31497



2) I get the impression that the spam is being automatically deleted. 
Probably how I set it via a configuration at one point. Somewhere I read 
that procmail is that guy responsible. Where is the procmail config file 
and how can I fix this?




Neither of these really has anything to do with spamassassin so I'm not 
sure how much help you will receive from here.  I, for one, have never 
used procmail, and certainly not with any control panel (ensim).  Is 
there a reason that you cannot go to ensim for support?


-Jim



Re: Tagging for spam mails

2006-12-14 Thread Jim Maul

Brad Baker wrote:

We would like to add a spam report to the body of emails identified as
spam to make troubleshooting false positives easier. For instance:


To: [EMAIL PROTECTED]
From: [EMAIL PROTECTED]
Date: November 26, 2006 3:57PM
Subject: [spam] Buy ED Pills Now

The quick brown fox jumps over the lazy dog. The quick brown fox jumps
over the lazy dog. The quick brown fox jumps over the lazy dog. The
quick brown fox jumps over the lazy dog.


This email has been identified as spam for the following reasons:
Content analysis details: (6.77 points, 4.00 allowed)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 HTML_MESSAGE           HTML included in message
 0.1 HTML_TAG_EXISTS_TBODY  HTML has tbody tag
 0.6 NORMAL_HTTP_TO_IP      Uses a dotted-decimal IP address in URL
 6.5 BAYES_995              Bayesian spam probability is 99.5 to 100%
-0.5 DK_VERIFIED            Domain Keys: signature passes verification
 0.0 SPF_PASS               SPF: sender matches SPF record (pass)
 0.0 NO_RDNS2               Sending MTA has no reverse DNS



From this page:
http://spamassassin.apache.org/full/3.0.x/dist/doc/spamassassin.html#tagging_for_spam_mails 



It looks like this option is what we want:   spam mail body text

I tried just adding spam mail body text to local.cf with no result
though. I also added a 1 to the end - that didn't work either.  We
are running spam assassin 3.1 and the report_safe option in local.cf
is set to 0.



Don't you want report_safe 1?  I don't know what this spam mail body 
text thing is you're talking about.



Could anyone point me to more information on how this feature works? I
tried searching Google but didn't have much luck and the Spam Assassin
documentation is somewhat ambitious.



ambitious?


Thanks,
Brad







Re: Sorry Dhawal - no personal attacks allowed [OT]

2006-12-12 Thread Jim Maul

Ken A wrote:



Dhawal Doshy wrote:

Marc Perkel wrote:
Well - if you don't like me then why don't you write a filter rule to 
delete message coming from me? I'm not going away so get used to it. 
If my threads weren't so damn interesting it wouldn't generate so 
much interest.


I think that your personal attack is not appropriate for this forum. 
This is a tech forum and there are lots of ideas that you aren't 
going to like. You're just going to have to get used to it.


Sincere apologies..



Marc,

It does get a bit old watching you light fires. The claims you make
about how this or that doesn't work (usually because you don't
understand it), and the overly broad how are we gonna make a better
toaster? questions really do increase the noise level quite a bit here.



The subject is clear.  If you notice messages you don't care about, 
filter them out by subject.



Some people on this list have to pay per kb of bandwidth used.



Then unsubscribe from the list.



Ken A.
Pacific.Net








Re: Rule update over DNS?

2006-12-07 Thread Jim Maul

Kelson wrote:

Jason Haar wrote:

May I propose that sa-update should become merged into spamd? (or
daemonized)


Merging would be bad. There are plenty of us using methods other than 
spamd to call SpamAssassin.





I don't think anyone is using spamd to call SpamAssassin.



Re: Rule update over DNS?

2006-12-07 Thread Jim Maul

Justin Mason wrote:

Jim Maul writes:

Kelson wrote:

Jason Haar wrote:

May I propose that sa-update should become merged into spamd? (or
daemonized)
Merging would be bad. There are plenty of us using methods other than 
spamd to call SpamAssassin.

I don't think anyone is using spamd to call SpamAssassin.


???

one over here ;)

--j.





oh?  Care to explain how spamd would call spamassassin? That would be a 
neat trick ;)


-Jim



Re: MX server Queue

2006-11-30 Thread Jim Maul

chisina mike wrote:


MX1 sendmail server mail queue is getting bigger, it must forward all mail
to Main mail server.
[EMAIL PROTECTED] mqueue]# grep stat=queue -c /var/log/maillog
6363

I tried the following commands
# vi /etc/MailScanner/MailScanner.conf
Deliver In Background = yes
Delivery Method = queue

# vi /etc/crontab
0-59 * * * * /usr/sbin/sendmail [EMAIL PROTECTED]

#vi /etc/mail/sendmail.cf
O MinQueueAge=15m

[EMAIL PROTECTED] ~]# sendmail -bd -ODeliveryMode=queueonly
-OQueueDirectory=/var/spool/mqueue.in

But I still have the same problem.

Regards
Mike chisina






Is there a question here somewhere?  I'm not even sure what this has to 
do with SpamAssassin.


-Jim


Re: Odd behaviour (?) of my Qmail / Qmail Scanner / SpamAssassin 3.1.3 Setup?

2006-11-29 Thread Jim Maul

Adam Wilbraham wrote:

To follow up on this, the message in question is flagged as spam if I
run it through spamassassin, however if I run it through spamc it's not.
spamc is what Qmail Scanner invokes. Is there a separate configuration
for spamc / spamd as opposed to spamassassin? I thought not...
 



When you run spamassassin it is running as the current user.  Who are 
you logged in as?  When qmail-scanner runs spamc it is most likely 
running as a different user (maybe qscand?).  Different users will 
get different results depending on their configuration.  Also, 
scanning a message at a later time may produce different results because 
the message has since been listed in some RBL, Razor, DCC, etc.


-Jim



On Wed, 29 Nov 2006 14:00:13 +
Adam Wilbraham [EMAIL PROTECTED] wrote:


I've got a bit of an odd situation whereby some obvious spam seems to

snip







Re: Percentage of email that is spam after filtering?

2006-11-27 Thread Jim Maul

Chris Santerre wrote:
Out of total mail hitting our server 12.99% is legit and delivered. You 
read correctly, 12.99%!!


65% is rejected at MTA w/ RBLs


I wonder what percentage of this 65% is legit and blocked.


21% is caught by Spamassassin and not delivered.
12.99% is legit and delivered.
0.01% is spam that sneaks thru and delivered
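
Since the thread asks about the percentage of mail that is spam after
filtering, the figures above can be turned into that number with a little
arithmetic. The sketch below is only a worked calculation on the four
percentages as posted. The variable and function names are made up, and it
takes the figures at face value (note they add up to 99%, not 100%).

```python
# Percentages of all incoming mail, exactly as posted above.
rejected_at_mta = 65.0
caught_by_sa = 21.0
legit_delivered = 12.99
spam_delivered = 0.01


def spam_share_of_delivered(legit, spam):
    """Percentage of delivered mail that is spam."""
    return 100.0 * spam / (legit + spam)


if __name__ == "__main__":
    total = rejected_at_mta + caught_by_sa + legit_delivered + spam_delivered
    share = spam_share_of_delivered(legit_delivered, spam_delivered)
    print(f"figures add up to {total:.2f}% of incoming mail")
    print(f"spam among delivered mail: {share:.3f}%")
```

So of the 13% of mail that reaches a mailbox, a bit under 0.08% is spam,
assuming the posted split is accurate.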

HTH,

Chris Santerre
SysAdmin and Spamfighter
www.rulesemporium.com
www.uribl.com







Re: Percentage of email that is spam after filtering?

2006-11-27 Thread Jim Maul

Chris Santerre wrote:



  -Original Message-
  From: Jim Maul [mailto:[EMAIL PROTECTED]
  Sent: Monday, November 27, 2006 12:12 PM
  To: Chris Santerre
  Cc: users@spamassassin.apache.org
  Subject: Re: Percentage of email that is spam after filtering?
 
 
  Chris Santerre wrote:
   Out of total mail hitting our server 12.99% is legit and
  delivered. You
   read correctly, 12.99%!!
  
   65% is rejected at MTA w/ RBLs
 
  I wonder what percentage of this 65% is legit and blocked.
 

Really... do we really need to rehash this every time someone says they 
use an RBL? Well I can tell you I get maybe 5 a YEAR reported, and I 
bypass the filter for those. Then I inform the vendor/customer of their 
listing. They are EXTREMELY happy that I told them. Otherwise they would 
have no clue.


5 mails have to be resent a year, or 65% of all useless mail allowed to 
come into my system. The math is easy.




Whoa, hey now, calm down, don't throw the gloves off just yet.  I wasn't 
trying to start a war here, just trying to show the other side of 
things.  I'm sure others here appreciate both sides of the story, 
especially when there could be unforeseen side effects with harmful 
consequences.  It's all with good intentions, I swear!


Jim



Re: sa-learn treating spam as ham

2006-11-24 Thread Jim Maul

Patrick Sherrill wrote:

Sorry, last email was a poor example. Try this one.

Before sa-learn:

X-Spam-Status: No, score=4.201 required=4.9 tests=[BAYES_50=0.001, 
HELO_DYNAMIC_IPADDR=4.2]


After sa-learn:

X-Spam-Status: No, score=-0.2 required=4.8 tests=BAYES_40 autolearn=ham 
version=3.1.0


The difference in required score is conf differences between SA and 
Amavis-new.





I'm confused.  First, why are the lines different?  What's this 
tests=[BAYES_50=0.001,HELO_DYNAMIC_IPADDR=4.2] thing?  And why does one 
line have autolearn= while the other doesn't have any autolearn?  The top 
is not a standard SA header while the bottom one is.  Also, if 
autolearn ignores the bayes_ scores, and BAYES_40 is the only test 
listed, then the message score should be 0.0 from what the autolearner 
sees.  Is the default autolearn threshold for ham 0.0?  God I hope not. 
I've set my ham autolearn threshold to -0.5 to avoid this.  You may 
want to also.


Regardless, there is something weird going on and it doesn't have 
anything to do with sa-learn.
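
To compare the two headers side by side, it helps to pull the fields apart.
The Python sketch below parses an X-Spam-Status value into score, required,
tests and autolearn, and handles both the bracketed tests=[NAME=score, ...]
form and the plain tests=NAME,NAME form seen above. The function names are
made up, and the parsing is a best-effort guess based only on these two
examples, not a full implementation of the header format.

```python
import re


def parse_spam_status(header):
    """Split an X-Spam-Status value into its fields (best-effort sketch)."""
    # Unfold continuation lines and drop whitespace after commas so that
    # the tests= list becomes a single whitespace-free token.
    flat = re.sub(r",\s+", ",", " ".join(header.split()))

    def number(name):
        match = re.search(name + r"=(-?\d+(?:\.\d+)?)", flat)
        return float(match.group(1)) if match else None

    tests = []
    match = re.search(r"tests=(\S+)", flat)
    if match:
        for item in match.group(1).strip("[]").split(","):
            if item:
                tests.append(item.split("=")[0])  # drop any "=score" suffix

    match = re.search(r"autolearn=(\S+)", flat)
    return {
        "score": number("score"),
        "required": number("required"),
        "tests": tests,
        "autolearn": match.group(1) if match else None,
    }


def non_bayes_tests(tests):
    """Tests left once the BAYES_* rules are set aside."""
    return [name for name in tests if not name.startswith("BAYES_")]


if __name__ == "__main__":
    before = "No, score=4.201 required=4.9 tests=[BAYES_50=0.001,\n HELO_DYNAMIC_IPADDR=4.2]"
    after = "No, score=-0.2 required=4.8 tests=BAYES_40 autolearn=ham\n version=3.1.0"
    for header in (before, after):
        fields = parse_spam_status(header)
        print(fields, "non-Bayes:", non_bayes_tests(fields["tests"]))
```

For the second header nothing is left once BAYES_40 is set aside, which is
the situation described above where the autolearner would see a score of
0.0.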


-Jim


Re: Rules Du Jour briken?

2006-11-16 Thread Jim Maul

twofers wrote:

Is this link having problems that anyone knows of?
 
http://www.exit0.us/index.php?pagename=RulesDuJour
 
I can't get to Rules Du Jour.




Actually, the whole exit0.us site doesn't work.

-Jim


Re: bayes_seen on MySQL, growing and growing

2006-11-13 Thread Jim Maul

Paolo Cravero wrote:

Hi,
while doing some checkup on production servers, I noticed that the 
bayes_seen table on MySQL is rather big:


row: 15'814'021 (15.8Mr)
size: 1'853'882'368 bytes   ( 1.8GB)

I've understood SA doesn't clean-up that table, so it has to be done 
manually.


Can I simply do a DELETE FROM bayes_seen and live long and employed? 
;-) I know it works if Bayes is on files. I would also OPTIMIZE TABLE 
bayes_seen to regain the disk space.


It would be probably faster to delete and re-create the table, but on a 
production system...


Any other issues?




I don't use MySQL with SA, but you should be able to use TRUNCATE instead 
of DELETE.  It may very well be faster with all those rows.


-Jim


Re: razor and dcc : high cpu load

2006-11-10 Thread Jim Maul

Rejaine Monteiro wrote:


But I have various servers with a qmail-ldap configuration, where a 
first server (simple qmail installation, without ldap) receives mail 
from the internet and checks the domain using rcpthosts only, does spam and 
virus checks and then forwards the mail to the other qmail-ldap servers
using virtualdomains and alias (so the ldap accounts and mail addresses 
are on the other servers).


One problem with this arrangement is that I cannot do RCPTCHECK, 
because none of the domains are in locals, but in virtualdomains.
I would really like to do RCPTCHECK because of spam, but with that 
configuration I don't know how to do this...


Any ideas???



Generate a list of all valid addresses from the machine that actually 
has the mailboxes and propagate that list to the machines that actually 
receive the mail so they are aware of which accounts are valid and 
which are not.  This is done pretty easily using qmail.




Re: mail bounce warning for the list

2006-11-10 Thread Jim Maul

Mike Kenny wrote:

On 11/9/06, *Jim Maul* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:


I think pretty much everyone understands WHY people use these BLs.  This
is not the point.  The point is, it's not a very good solution.


Is it even a solution? I guess that depends on what the problem is. If 
the problem is the volume of mail passing through the servers then I 
suppose it is. The ultimate extrapolation of this is that in a perfect 
world no mail would be allowed to pass through so that we can continue 
to run our servers on 286s!





To me, a solution that in turn creates problems of its own is not a 
solution at all.  It only shifts the problem elsewhere.  Apparently, 
that's good enough for many people out there.


Maybe I'm being naive but I thought the objectives were not to make life 
easier for the mail administrator (though that would be nice) but to 
ensure that the people who actually run the business (accountants, sales 
staff, support engineers, CEOs, etc.) receive all relevant mail that is 
sent to them and don't have to waste inordinate amounts of time wading 
through spam. I see the first of these as being of significantly more 
importance than the second.




Exactly my point.  Users want 2 (at least!) things with email:

1. They want to receive all legitimate mail
2. They dont want spam

1. is extremely important and should NOT be compromised in any way by 2. 
For example, to say that there's a trade off in that solving (2) in 
any way affects (1) is just wrong.  There should be no trade off.  I'd 
rather receive 1000 spams a day than miss 1 legit email, and my boss agrees.



Blocking on content achieves the second of these, sorry if it now 
requires more care and attention to keep that server running. Blocking on 
the source IP address, purely because it may be dynamic or may have sent 
spam some time in the past makes the first objective virtually 
impossible to achieve.


Unless the spam vigilante sends a notification to the intended recipient 
of every mail it has blocked so that they can check if this should have 
been the action taken. This sort of defeats the second objective.


I am not against DNSBLs. What I would like to see is more honesty in how 
they should be used. They are a tool, not a solution. Their web pages 
should have a warning like cigarette packs: 'use of this service to block 
rather than score emails can cause blindness, madness and bubonic 
plague'. Too many of our users' destinations seem to be using these 
sites as though they are infallible.




And all the people who recommend the blocking of mail based on these 
lists are doing everyone a disservice, especially those who don't 
realize that these lists are, in fact, not 100% accurate.  False 
positives WILL happen.


Since it is the sender who is notified of the bounce, by our mail 
server, not the recipient (who unknowingly sanctioned it) the problem is 
placed at our doorstep to resolve.


mike



And I solve this problem by tagging the subject and passing the mail on 
to the user's mailbox.  If they want to create a rule in their Outlook or 
whatever to send these tagged mails to the trash then that is THEIR 
decision, and if a legit message ends up in their trash they have no one 
to blame but themselves.  I clearly explain this to them when they ask 
me to create the rule for them.  I also suggest they browse through 
their deleted items occasionally and check for false positives.  If I 
rejected their mail at the MTA, there would be no notification that any 
message even attempted to be delivered to them and they would have no 
idea that there was even a problem.  I guess some people are ok with 
this, but I am not one of them.


-Jim



Re: razor and dcc : high cpu load

2006-11-10 Thread Jim Maul

Rejaine Monteiro wrote:


Like a text-file based list (isn't that a security hole?!) or an ldap-replica 
on the mail server?
I'm searching for more examples and other ideas and found this patch for 
qmail:

http://qmail.jms1.net/patches/validrcptto.cdb.shtml

I don't know if this patch is really necessary... but it's a suggestion too...
Anyway... I'll search more and do many tests...

Thanks...



First, that is exactly the page/patch that I was referring to.  I use 
it here on my server and know of others who are in the exact same setup 
as you who also use it.


The file is a .cdb file (not text) but even if it were text, I fail to see 
how this is a security hole.  There is only a list of accounts or perhaps a 
wildcard ([EMAIL PROTECTED]) in this cdb file, so where is the risk?


An ldap-replica is way too involved for this function.  Use the patch you 
mentioned, find a way to get qmail-ldap to output a list of addresses, 
build the cdb file and replicate it to the server(s).  That's basically it.
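
The lookup the front-end server ends up doing is simple enough to sketch.
The Python below keeps the address list in a plain in-memory set instead of
a cdb file, and it assumes a whole-domain wildcard is written as an entry of
the form @domain. Both of those, along with the function names and the
sample addresses, are assumptions for illustration. Check the patch's own
documentation for the real file format.

```python
def build_valid_set(entries):
    """Normalize a list of addresses (and assumed @domain wildcards) into a set."""
    return {entry.strip().lower() for entry in entries if entry.strip()}


def is_valid_recipient(address, valid):
    """True if the address, or a wildcard entry for its domain, is listed."""
    address = address.strip().lower()
    if address in valid:
        return True
    _, separator, domain = address.rpartition("@")
    return bool(separator) and ("@" + domain) in valid


if __name__ == "__main__":
    # Invented list, as it might be exported from the mailbox servers.
    valid = build_valid_set(["alice@example.com", "@wild.example", ""])
    for rcpt in ("Alice@Example.com", "bob@example.com", "anyone@wild.example"):
        print(rcpt, "->", "accept" if is_valid_recipient(rcpt, valid) else "reject")
```

Lowercasing both the list and the incoming recipient avoids rejecting mail
over nothing more than capitalization.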


-Jim


Re: Spam assasin query

2006-11-10 Thread Jim Maul

Ramdas P. Prabhu wrote:

Dear All,

I have a small query, and shall be highly obliged if you could solve it.

Some time back, I had enabled some option in spam assasin whereby anyone who
sent me a mail received an automated message to click on a link in order to
verify whether he was real or a spam source. Somehow that option is not
available to me now. Can you please let me know how this option can be
activated in spam assasin. Below is an excerpt of the same feature which I
have mentioned above.



Subject: Your email requires verification verify#EdsAtW8RAHzEKlx83QUQtDhQLl6DnCF9


The message you sent requires that you verify that you
are a real live human being and not a spam source.

To complete this verification, simply reply to this message and leave
the subject line intact.

The headers of the message sent from your address are shown below:


From [EMAIL PROTECTED] Wed Jan 18 01:19:06 2006

Received: from wwwparm by gains.house4domains.com with local-bsmtp (Exim
4.52)
 id 1Ez7ax-0001Bz-FM
 for [EMAIL PROTECTED]; Wed, 18 Jan 2006 01:19:05 -0600
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on
gains.house4domains.com
X-Spam-Level:
X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED
autolearn=unavailable version=3.1.0
Received: from [59.182.37.119] (helo=rpp)
by gains.house4domains.com with esmtpa (Exim 4.52)
id 1Ez7av-0001BG-82






 I shall be highly obliged if you could let me know how the same can be
 enabled again.




Spamassassin might have scanned this message, but it in no way produced 
the output you are referring to.  This has to have been some C/R system 
which SA does not use.


-Jim


Re: Well, that didn't take very bloody long

2006-11-10 Thread Jim Maul

Chris Santerre wrote:



  -Original Message-
  From: Steve Lake [mailto:[EMAIL PROTECTED]
  Sent: Friday, November 10, 2006 12:52 PM
  To: users@spamassassin.apache.org
  Subject: Well, that didn't take very bloody long
 
 
   Ok, remember that Name Wrote: :) emails?  They've completely
  changed.  Now it's hi username instead.  Joy, oh joy.  Can anyone find
  any common elements in these emails because whoever this putz is, they're
  adapting a lot.  They hit us, we adapt, they immediately change tactics and
  come at us again.  Now with all the brilliant minds on this mailing list,
  we really should be able to find out who this putz is and nail all his
  stuff regardless of what tactic he switches to.

Ahahaha... I went and looked at mine that are being caught. Found one of 
my old rules is tagging some of these. I about spit up my NE Chowder!


 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 MY_DSL                 Contains likely dsl address in header
 0.3 MY_HELO                May be valid but catches most.
 0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
 1.7 SARE_MLB_Stock1        BODY: SARE_MLB_Stock1
 1.7 SARE_MLB_Stock5        BODY: Mentions stock symbol, tickers, or OTC.
 0.6 MY_PHRS_LOW            BODY: low scoring phrases found
 1.7 SARE_CSBIG             BODY: Only Mexican food gives me an Explosive Gain.


I crack myself up!

--Chris



haha damn that is pretty funny.  Explosive gain?  Am I the only one who 
thinks toilets should come with handles? haha


-Jim



[Fwd: Your email message was blocked]

2006-11-10 Thread Jim Maul
What!?  You gotta be kidding me!  My message was BLOCKED because it had 
damnn in it?  People actually use crap like this?


Wow...

-Jim

 Original Message 
Subject: Your email message was blocked
Date: Sat, 11 Nov 2006 04:23:23 +0930
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

MailMarshal (an automated content monitoring gateway) has
stopped the following email for the following reason:

It believes it may contain unacceptable language, or inappropriate material.

   Message: B0003b04c4.0001.mml
   From:[EMAIL PROTECTED]
   To:  [EMAIL PROTECTED]
   Subject: Re: Well, that didn't take very bloody long

Please remove any inappropriate language and send it again.

The blocked email will be automatically deleted after 4 days.

MailMarshal Rule: CSG Bothways : Block Unacceptable Language
Script Offensive Language (Basic) Triggered in Body
Expression: putz Triggered 1 times weighting 5


For more information on email virus scanning, security and content
management, visit http://www.marshalsoftware.com




Re: [Fwd: Your email message was blocked]

2006-11-10 Thread Jim Maul

Evan Platt wrote:

At 11:08 AM 11/10/2006, you wrote:
What!?  You gotta be kidding me!  My message was BLOCKED because it 
had damnn in it?  People actually use crap like this?


Wow...

-Jim

 Original Message 
Subject: Your email message was blocked
Date: Sat, 11 Nov 2006 04:23:23 +0930
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]

MailMarshal (an automated content monitoring gateway) has
stopped the following email for the following reason:

It believes it may contain unacceptable language, or inappropriate 
material.


   Message: B0003b04c4.0001.mml
   From:[EMAIL PROTECTED]
   To:  [EMAIL PROTECTED]
   Subject: Re: Well, that didn't take very bloody long

Please remove any inappropriate language and send it again.

The blocked email will be automatically deleted after 4 days.

MailMarshal Rule: CSG Bothways : Block Unacceptable Language
Script Offensive Language (Basic) Triggered in Body
Expression: putz Triggered 1 times weighting 5




Looks more like they didn't like PUTZ.. :)




Even worse!  But (excuse my stupidity) where was the word putz in my 
email?  I sure as hell didn't type it.


-Jim


Re: [Fwd: Your email message was blocked]

2006-11-10 Thread Jim Maul

Coffey, Neal wrote:

Jim Maul wrote:

Even worse!  But (excuse my stupidity) where was the word putz in my
email?  I sure as hell didnt type it.


Jim Maul wrote:

Chris Santerre wrote:

 Can anyone find any common elements in these emails
 because whoever this putz is, they're adapting a lot.

haha damn that is pretty funny.


I wonder if someone at csg.com.au is going to notice an unusual amount
of putzing in the logs, if we keep this discussion up :)




Ah ok thanks for pointing that out (chris' message that i replied to). 
I was thoroughly confused.  No more putzing for me ;)


-Jim




Re: mail bounce warning for the list

2006-11-09 Thread Jim Maul

D.J. wrote:

Blocking mail based solely on the IP address (whether because it is a
dynamic address or has at some time in the past sent a mail to a
spamtrap) is akin to shooting the postman because yesterday you
received an advertisement. 



You obviously don't handle a lot of mail volume.  If I had to scan every 
SMTP request that came in, and did not use the two DNSBLs I use (neither 
are SpamCop) I would need WAY more powerful hardware than I currently 
have, and I don't have chump hardware as it is.  As it stands, using 
qmail + spamassassin + clamav on three load balanced Dual Xeon 2.8 GHz 
machines with 2GB of RAM handles the flow with an average 5 minute load 
average of around 3-4.  And that's with the BL's enabled.  Think of if I 
had to actually process the other million or so messages (NOT an 
exaggeration) that attempt to hit my servers...


As someone has probably already pointed out... admins use these lists 
because they trust their accuracy.  If they receive too many complaints 
(as we did with a particular DNSBL) you stop blocking on that list and 
move to only scoring.




I think pretty much everyone understands WHY people use these BLs.  This 
is not the point.  The point is, it's not a very good solution.


If you have 100GB of data you need to back up every day and you only 
have 50GB worth of tapes to back that data up onto, would you only back 
up half of it and trust that your hardware won't fail?  This is 
essentially what you are doing.


The CORRECT solution to the problem is to buy more tapes.  Just like a 
better solution to your problem is to buy more machines to process the 
mail, not trust someone else to tell you who should and shouldn't be able 
to send mail to your server.  FPs WILL happen.  If you haven't seen any 
yet, great, but be damn sure you will at some point.


I understand that this can get incredibly expensive and this is most 
likely why people use BLs at all, but that does *not* mean that 
rejecting mail based on these lists is by any means the solution to the 
problem.


Re: mail bounce warning for the list

2006-11-07 Thread Jim Maul

Rose, Bobby wrote:
So what you're saying is that the rule that people running listservers 
should maintain valid recipients who want to receive messages from the 
list shouldn't be followed just because it's a list about an antispam 
product?  The last time I checked, the most common reason for spamcop 
lists is due to messages being sent to their spam traps.  What's the 
point of even having rules in SA for spamcop and other DNSBLs if you 
don't have a certain level of trust in them.  SA is more resource 
intensive than an MTA block which is why so many still use it.  I know 
that over 20k a day trip the SORBS DUL rule here and around 10k trip 
spamhaus.  You can pretty much bet it's all spam so I can understand why 
people would rather use those lists at their MTAs based on their 
observations of the mail flow for their domains.
 


You can block millions or billions or however many spams you want with 
this method, but the second you block one legit piece of mail and your 
boss doesn't get it, it's your ass.  People can do whatever they like with 
their servers, but blocking mail at the MTA using blacklists is A BAD 
IDEA, PERIOD.  I realize it may be necessary for some setups that 
actually receive thousands or millions of messages a day, but that 
doesn't make it any better of an idea.


Also, show me a boss that gives a crap that the reason the message to 
him/her was blocked was because the sender's mail server is listed in 
some BL somewhere and I'll be really impressed.  Most don't want to know 
and mainly don't care WHY it happened; they just know that the server you 
set up blocked a legit message and if you're lucky they won't be too pissed 
off.  Good luck.  I'd rather not introduce that headache into my work life.




There have been messages posted to this list that can have very positive 
SA scores simply due to the content.  So based off that, I guess everyone 
should whitelist users@spamassassin.apache.org and spammers reading the 
list can just turn around and use that as their return address because 
then the argument could be made that anyone who doesn't deserves not to 
get mail from the SA lists.
 


There are reasons that other whitelist methods exist that aren't as 
easily forged but I'm sure you already know that.  This argument is 
pretty lame at best.




I believe the correct process here is that the moderators of the SA 
listserver investigate why the listserver got listed on Spamcop.  If it 
is a case where there are addresses to spamtraps in the list, then maybe 
the list needs to send out opt-in verification messages to weed them out.
 


Again, who knows..who cares?  Legit systems get listed in BLs all the 
time.  It really doesn't seem to matter how hard one tries to prevent 
this from happening as many lists have many different listing criteria. 
Would you like to volunteer your time to get legit servers delisted 
from all BLs?  That's mighty nice of you...


As someone else said before, stop blocking mail outright based on these 
lists and use them for scoring instead and be done with it.


-Jim


Re: spam filter working, but not well

2006-11-07 Thread Jim Maul

Brian S. Meehan wrote:

Spamassassin is invoked from Courier-MTA. (OS is SUSE Pro 9.3)
The /usr/lib/courier/etc/courierd file has the following line:
DEFAULTDELIVERY=| /usr/bin/spamassassin | /usr/lib/courier/bin/maildrop
I had tried it with 'spamc' but there was no difference. When I tried it
with /usr/bin/spamd I get the following in my mail log:



spamd is the daemon and you definitely do not want to start this for 
every message you receive.  You should be using spamassassin or spamc 
here.  If you use spamc, spamd must already be started and running for 
it to function correctly.  spamc/spamd are a pair and are used together. 
 spamassassin is standalone.



spamd[5895]: spamd: could not create INET socket on 127.0.0.1:783:
Permission denied
courierlocal:
id=00086831.4550A56E.1702,from=...sender...,addr=[EMAIL PROTECTED]:
[5895] error: spamd: could not create INET socket on 127.0.0.1:783:
Permission denied
courierlocal:
id=00086831.4550A56E.1702,from=...sender...,addr=[EMAIL PROTECTED]:
spamd: could not create INET socket on 127.0.0.1:783: Permission denied
courierlocal:
id=00086831.4550A56E.1702,from=...sender...,addr=[EMAIL 
PROTECTED],size=928,success:
Message delivered.
courierd: completed,id=00086831.4550A56E.1702


I definitely have more than 200 ham and 200 spam in the database (done
with sa-learn commands). bayes_seen is 632k and bayes_toks is 2.5M in
size.

I think the problem is network tests but I checked the
/etc/sysconfig/spamd file and the only uncommented line is:
SPAMD_ARGS=-d -c

-Brian



Can you send a sample of a message that you received?  I'm not sure if 
you did this already as I missed the original message.


-Jim


Re: spam filter working, but not well

2006-11-07 Thread Jim Maul

Brian S. Meehan wrote:

Jim,
I have it set so that i'm using /usr/bin/spamassassin now. Thanks for that
info.

Here is the relevant message header from an email that was not caught:
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on
 mail.meehanontheweb.com
X-Spam-Level: ***
X-Spam-Status: No, score=3.1 required=4.0 tests=ADVANCE_FEE_1,RCVD_IN_XBL
 autolearn=no version=3.1.7
Received: from cliente-addc099 (201-68-96-184.dsl.telesp.net.br
[:::201.68.96.184])
 by meehanontheweb.com with esmtp; Tue, 07 Nov 2006 10:50:57 -0500
 id 00072EA2.4550AB7D.18B6
Old-Return-Path: [EMAIL PROTECTED]
Received: from 192.94.94.37 (HELO red.ext.ti.com)
 by meehanontheweb.com with esmtp (CSNG1VAZG A627H)
 id 6W926D-JODX0S-DO
 for [EMAIL PROTECTED]; Tue, 7 Nov 2006 15:49:51 +0180
From: Dillon Barron [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Dillon here :)


Here is another one that wasn't caught:
X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on
 mail.meehanontheweb.com
X-Spam-Level: *
X-Spam-Status: No, score=1.7 required=4.0 tests=EXTRA_MPART_TYPE,
 HTML_IMAGE_ONLY_24,HTML_MESSAGE autolearn=no version=3.1.7
Received: from catv-50634822.catv.broadband.hu
(catv-50634822.catv.broadband.hu [:::80.99.72.34])
 by meehanontheweb.com with esmtp; Mon, 06 Nov 2006 17:04:29 -0500
 id 00086441.454FB16F.31DE
Message-ID: [EMAIL PROTECTED]
From: Project: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: rejected Uganda rebel

Thanks,
-Brian





What's strange is there are no bayes scores at all.  I know you mentioned 
that you have at least 200 ham/spam in the database but are you sure it's 
the database of the same user that mail processing runs as?  Also, when I 
just ran those headers through spamc here, I got:


 4.1 MSGID_OUTLOOK_INVALID  Message-Id is fake (in Outlook Express format)


I'm curious as to why your system didn't trigger this rule.  I'm still 
running 2.64 ;(


It does seem that you are using network tests, but are you using 
razor/pyzor/dcc?  Those could help as well.


-Jim


Re: Bayesian scores

2006-11-03 Thread Jim Maul

Péntek Imre wrote:

Hello,

Why does BAYES_99 have a score of only 3.5 while 5.0 is required to identify a mail 
as spam? I think this rule should have a score of about 5.1 (or anything greater 
than 5.0).


Because if it's wrong in its classification, then that one rule alone will 
cause a FP.  The whole idea is that no single rule should cause a message 
to be tagged either way (except maybe whitelist/blacklist).


Anyway, if you want, change the score of the rule.  I've upped the 
scores on almost all bayes rules here because history has shown it to be 
incredibly accurate here.
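
To make the arithmetic concrete, here is a small Python sketch of how a message's total is compared against the threshold. It is not SpamAssassin code: the classify() helper and HYPOTHETICAL_RULE are invented for illustration, and only the 3.5 score for BAYES_99 and the 5.0 threshold come from this thread.

```python
def classify(hits, scores, threshold=5.0):
    """Sum the scores of the rules that hit and compare with the threshold."""
    total = round(sum(scores[name] for name in hits), 1)
    return total, total >= threshold


# BAYES_99 at 3.5 as discussed above; the second rule is purely illustrative.
stock_scores = {"BAYES_99": 3.5, "HYPOTHETICAL_RULE": 1.7}

# BAYES_99 alone stays under the threshold ...
print(classify(["BAYES_99"], stock_scores))
# ... and needs at least one more rule to agree before the message is tagged.
print(classify(["BAYES_99", "HYPOTHETICAL_RULE"], stock_scores))

# With BAYES_99 raised to 5.1, a single wrong Bayes verdict tags the message.
raised = dict(stock_scores, BAYES_99=5.1)
print(classify(["BAYES_99"], raised))
```

Raising the score trades that safety margin for catch rate, which is only reasonable if your own Bayes results have proven reliable.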


-Jim



Re: Bayesian scores

2006-11-03 Thread Jim Maul

Péntek Imre wrote:

Jim Maul wrote:

I've upped the scores on almost all bayes rules here because history has
shown it to be incredibly accurate here.
Yes. BTW so far I've got no FP but still get false negatives with score 3.5, 
BAYES_99, using this database:

[5816] dbg: bayes: corpus size: nspam = 2757, nham = 1403
Built from scratch by myself, still growing.
As I have such a big database, there's very little chance of a mistaken Bayesian 
score; but as I've built this database from scratch, I can also state that 
the same holds for small Bayesian databases too. So I will use score 5.1 
for BAYES_99, and still suggest using this in the SA distribution too. 
Thanks for helping me anyway.



If you are getting false negatives with 3.5 then you need to find a way 
to get more rules to hit.  My average spam score here is 16.1 which is 
way over my 5.0 threshold.  The trick is to increase the distance 
between your average spam and ham scores as much as possible and then 
you can run with a higher spam threshold.  If you have spam not getting 
tagged, you should increase rules that trigger, not lower your threshold.
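
As a rough illustration of the "widen the gap" idea, the Python sketch below compares two sets of spam scores against the same ham scores and the same 5.0 threshold. All of the numbers are made up for the example; nothing here is measured data.

```python
from statistics import mean


def score_gap(spam_scores, ham_scores):
    """Distance between the average spam score and the average ham score."""
    return mean(spam_scores) - mean(ham_scores)


def misclassified(spam_scores, ham_scores, threshold):
    """Count false negatives (spam under threshold) and false positives."""
    false_neg = sum(1 for s in spam_scores if s < threshold)
    false_pos = sum(1 for s in ham_scores if s >= threshold)
    return false_neg, false_pos


# Hypothetical scores: the same spam before and after more rules start hitting.
ham = [-1.0, 0.3, 1.2, 2.0]
spam_few_rules = [3.5, 4.2, 6.0, 9.1]
spam_more_rules = [7.5, 9.2, 12.0, 16.1]

print(misclassified(spam_few_rules, ham, 5.0))
print(misclassified(spam_more_rules, ham, 5.0))
print(score_gap(spam_more_rules, ham) > score_gap(spam_few_rules, ham))
```

The threshold never moves in this example; the missed spam disappears only because more rules contribute to the spam totals.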


Are you using network tests, razor, surbl, add on rules from sare, etc?

-Jim




Re: Bayesian scores

2006-11-03 Thread Jim Maul

Péntek Imre wrote:

Jim Maul wrote:

Are you using network tests, razor, surbl, add on rules from sare, etc?

I can just guess, as I don't know how to get to be sure.
I can find several spams marked with:
RCVD_IN_BL_SPAMCOP_NET
UNPARSEABLE_RELAY
URIBL_AB_SURBL
Does this mean I also use network tests?



I am not sure.  It would seem so to me.  Make sure you do not have -L 
being passed when starting spamd.



As I see I don't use razor, I will read the wikipage about it.


Definitely! Razor is by far one of the top performing rules on many SA 
setups.  It works great.


-Jim





Re: BIG increase in spam today

2006-11-02 Thread Jim Maul

Mark wrote:

-Original Message-
From: Marc Perkel [mailto:[EMAIL PROTECTED] 
Sent: donderdag 2 november 2006 19:00

To: users@spamassassin.apache.org
Subject: Re: BIG increase in spam today


I'm not an appliance vendor but I run a front-end spam 
filtering service and it's been a struggle. Most of my spam

defense isn't SA though. I'm using Exim rules to do most of the
work and SA gets what's left.


Same here. A custom brewed milter-type setup of mine (a combined set of
socketmap invocations, to be precise) handles the vast majority of spam at
the gate.

92% (!) of all incoming spam uses an invalid HELO.

9% pretends to be me in their HELO.



Is this 9% included in the above 'invalid HELO' number?

-Jim


Re: Can't upgrade w/ RPM

2006-11-02 Thread Jim Maul

Philip Prindeville wrote:

Hi.

I'm running FC3 on an AMD64 platform for my mail server,
and I had last installed SpamAssassin 3.1.5.  Well, I grabbed the
tarball for 3.1.7, and did a rpmbuild -tb ... of the tarball.

Worked fine.

Then I tried to upgrade via RPM:

# rpm -v -U /home/src/redhat/RPMS/x86_64/perl-Mail-SpamAssassin-3.1.7-1.x86_64.rpm
error: Failed dependencies:
    perl-Mail-SpamAssassin = 3.1.5-1 is needed by (installed) spamassassin-3.1.5-1.x86_64


any ideas why this is happening and what the fix is?

-Philip
 


You can't just upgrade one of the RPMs, you need to do them all at once.

spamassassin-3.1.5-1.x86_64 is using 
perl-Mail-SpamAssassin-3.1.5-1.x86_64.rpm so you can't upgrade one 
without the other.


-Jim


Re: AWL score change

2006-11-01 Thread Jim Maul

Steve Ingraham wrote:
I am running qmail with spamassassin 3.1.5.  I am having a problem with 
spamassassin scoring.  I have been attempting to change the score for 
AWL to -25.  Here is a header from an email I received a short time ago 
with a score of 1.4 for AWL in the X-Spam-Report section:





You can not change the score of this rule.  The AWL is not a whitelist. 
 It is a score averager. Its score changes depending on certain factors.
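
For anyone wondering what "score averager" means in practice, here is a Python sketch of the idea as I understand it: the AWL pulls each message part of the way toward the sender's historical mean, scaled by the auto_whitelist_factor setting (0.5 by default). The function name and the sample numbers are assumptions for illustration, not the plugin's actual code.

```python
def awl_adjustment(history_total, history_count, current_score, factor=0.5):
    """Points AWL would add (or subtract) to move toward the sender's mean."""
    if history_count == 0:
        # First message from this sender: nothing to average against.
        return 0.0
    mean = history_total / history_count
    return (mean - current_score) * factor


# Hypothetical sender whose past four messages totalled 12.0 points (mean 3.0).
# A new message scoring 0.2 gets pulled UP toward that mean.
print(round(awl_adjustment(12.0, 4, 0.2), 2))

# Hypothetical sender with a history of low scores (mean -2.0).
# A new message scoring 6.0 gets pulled DOWN.
print(round(awl_adjustment(-8.0, 4, 6.0), 2))
```

Because the adjustment is computed per sender and per message, a fixed "score AWL -25" line has nothing to apply to.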


Why not just disable it and use real whitelisting?

-Jim


Re: AWL score change

2006-11-01 Thread Jim Maul

Steve Ingraham wrote:

Steve Ingraham wrote:

I am running qmail with spamassassin 3.1.5.  I am having a problem with
spamassassin scoring.  I have been attempting to change the score for
AWL to -25.  Here is a header from an email I received a short time ago
with a score of 1.4 for AWL in the X-Spam-Report section:

Jim Maul wrote:

You can not change the score of this rule.  The AWL is not a whitelist.
It is a score averager. Its score changes depending on certain factors.

Why not just disable it and use real whitelisting?


I did not know that about AWL.  As far as using the whitelist, my users
are getting messages that are scored using AWL from multiple locations.
I see it as becoming cumbersome to add dozens or hundreds of incoming
addresses in the whitelist.

Steve




I've not used whitelisting myself but from what others have posted on 
this list, it seems that you can use wildcards.  I'm not sure if many of 
the addresses are from the same domain or not but this may be able to 
help you out.  I'd look into the various whitelist_* commands and see if 
they will work for you.


http://wiki.apache.org/spamassassin/ManualWhitelist

-Jim



Re: increase score of rules

2006-10-31 Thread Jim Maul

Pablo Allietti wrote:

Hi all, I want to increase the score of some image rules. How can I do that?
For example:


HTML_IMAGE_ONLY_28
HTML_IMAGE_RATIO_02

I want to change the score of these rules to, for example, 4.0. Which file do I
need to modify, and how?





You read the documentation like a good little SA user.

Specifically:

http://wiki.apache.org/spamassassin/AdjustRuleScore

-Jim



Re: SpamAssassin + sql user prefs

2006-10-31 Thread Jim Maul

Chris Szilagyi wrote:

Hello:

I have not been able to find the answer to my question so I thought I'd try
this mailing list.

I have SpamAssassin 3.1.7 (using spamc/spamd) installed on a Red Hat 7.1
system, with Perl 5.6.1.  We currently have SQL user prefs enabled in a MySQL
db, and put the entries in /etc/procmailrc to enable system-wide scanning.

My question is:  Are there any settings for SpamAssassin that users would set
in their prefs, that would bypass scanning of their email?  My reason for
asking is that if we have users that do not want any scanning, we'd like to
free up the load on the server so it's not scanning messages and scoring them
for no reason.  Right now we're using the sasql plugin for Squirrelmail as
the front-end for the user settings, and one of the settings is to set the
level to '99' = _(Don't Filter).  But I'm just trying to figure out if
this will force SpamAssassin (spamd) to just pass the message through without
examining the content, to lighten up the load on the server.

Does anybody know which setting (if any) will accomplish this?  Thank you very
much for the feedback.



There is nothing in SA to tell it not to scan something.  If you don't 
want SA to scan a piece of mail, then you have to tell whatever calls SA 
(Procmail it seems, in your setup) not to pass that particular mail to 
it.  I've never used procmail myself but I'm sure someone here can offer 
some help with that.


Jim


Re: Spamassassin effectiveness, BAYES_99

2006-10-20 Thread Jim Maul

Michael Beckmann wrote:

Greetings!

In the past few weeks, I have noticed significant amounts of spam 
passing through my filter. It is reaching a level that annoys me. I use 
Spamassassin 3.1.7.


I used to get maybe one or two spam messages a day earlier this year 
with 200+ spams filtered. Now I get 10 to 20 spams per day that are not 
automatically filtered (while something like 300+ are filtered.) Did 
anybody else notice this? Are spammers becoming more effective in 
working around SpamAssassin?


I examined the spam, and it seems that the majority of the messages 
score BAYES_99 and nothing or hardly anything else. BAYES_99 is not 
enough to filter the messages. I use the standard threshold of 5.


I have been tempted to increase the BAYES_99 score to 5. I have seen 
that only very few ham messages of the newsletter type ever score 
BAYES_99 in my inbox.


Do others make similar observations? How do you deal with this?



It all depends on YOUR setup, but I'll be the first to say that after 
months of observations at my facility here (a hospital) I have increased 
my BAYES_99 to 5.2 running with a 5.0 threshold.  This is obviously 
risky but we have had great results with no reported (I can't check 
*every* piece of mail myself) false positives.


Others may suggest you lower your threshold but I feel this is the wrong 
way to deal with this type of situation.  Generally, you want to 
increase the gap between spam scores and ham scores, not lower the 
threshold.  The way to do this is add on rules, network tests, bayes, 
etc.  If you are already using all of these and still have poor 
accuracy, i'd say go for it (WRT jacking the BAYES_99 score) and monitor 
the results.


-Jim


Re: This image is turning frequent..

2006-10-18 Thread Jim Maul

Matt Florido wrote:

* Jo Rhett [EMAIL PROTECTED] [10-17-2006 10:25]:


score SARE_GIF_STOX 2.5 2.5 2.5 2.5



Can you tell me what each corresponding 2.5 represents?



http://spamassassin.apache.org/tests_3_1_x.html

Pay particular attention to the rightmost column heading in the table.
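
For reference, the four values on a score line map to the four "score sets" described in the Mail::SpamAssassin::Conf documentation: neither Bayes nor network tests, network tests only, Bayes only, and both. The Python sketch below just shows that lookup; pick_score() is an invented helper, not part of SA.

```python
def pick_score(scores, bayes_enabled, net_enabled):
    """Choose which value of a 'score RULE a b c d' line applies."""
    if len(scores) == 1:
        # A single value is used for every configuration.
        return scores[0]
    if len(scores) != 4:
        raise ValueError("expected one or four score values")
    # Set 0: no Bayes, no net.  Set 1: net only.
    # Set 2: Bayes only.        Set 3: Bayes and net.
    index = (2 if bayes_enabled else 0) + (1 if net_enabled else 0)
    return scores[index]


# Made-up values so the four positions are easy to tell apart.
example = [1.0, 2.0, 3.0, 4.0]
print(pick_score(example, bayes_enabled=False, net_enabled=False))
print(pick_score(example, bayes_enabled=True, net_enabled=True))

# The line quoted above uses 2.5 everywhere, so the setup makes no difference.
print(pick_score([2.5, 2.5, 2.5, 2.5], bayes_enabled=True, net_enabled=False))
```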

-Jim


Re: What's with UCEPROTECT List?

2006-10-17 Thread Jim Maul

Kelson wrote:

Matt Kettler wrote:

That said, some folks still hate it because you're using some (very
little) of their CPU and network to handle your spam.


Also, a large number of verifications (say, because someone has been 
sending lots of spam with forged headers) looks suspiciously like a 
dictionary attack.




Exactly.  In effect what sender verification does is cause your server 
to perform the dictionary attack instead of the spammer.


Say I'm a spammer. I send messages to [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], etc and see which ones are accepted to gather valid 
addresses.


With sender verification, the spammer now sends messages to 
[EMAIL PROTECTED] with a return address of [EMAIL PROTECTED], 
[EMAIL PROTECTED], etc.  Your server does the sender check to see if 
[EMAIL PROTECTED] exists.  Your server is doing the work for the spammer now 
and looks exactly like a dictionary attack.  This could (and does) very 
easily get you onto several blacklists.


Sender verification?  Not for me, thanks.

-Jim


Re: How to disable autolearn for FuzzyOcr?

2006-10-16 Thread Jim Maul

D.J. wrote:

 I think what the original poster was asking was how to make the
 gibberish bodies not get Bayes scanned, so as to not pollute the
 database with text that isn't spammy.


Exactly my point.


Slightly off topic here, but I have a dumb question.  If you get a 
message with obvious bayes poison, what *should* you do?  Do you remove 
the poison and classify, or do you just not classify that message?


I train it just like you would any other message - especially since many 
get autolearned.  The 'poison' doesn't seem to have much of an effect. 
If anything, it's a good spam indicator.


-Jim


Re: sa-learn and Caught spams

2006-09-27 Thread Jim Maul

Daniel T. Staal wrote:

On Wed, September 27, 2006 10:43 am, Matt Kettler said:

Mike Woods wrote:

Hi guys, bit of a query regarding sa-learn and messages that have
already been tagged as spam.

We have spamassassin scanning mail via amavisd and sending any caught
spams to a spam folder in the users accounts (using plus addressing),
we've also been getting users to drop any missed spams into this spam
folder so we can train spamassassin on them, at present I have a
script that moves *only* the missed spams to a master folder for
sa-learn, my question is simple, would there be any benefit in
including the mails identified as spam in this process, I know
sa-learn looks for common patterns in spams to identify them as spam
but im unsure if adding known spams in would be beneficial in this ?

YES. There is DEFINITELY a benefit to learning messages tagged as spam.
Even if they got BAYES_99.

Why? because spam mutates over time, and even if a spam got bayes_99, it
may still have new variants of hot words in it that will help it keep
hitting the same kind of spam as it changes. If you wait till this kind
of message mutates enough to no longer be bayes_99, you've put yourself
behind the curve, and now you have to catch up to the new variant.


While I in general agree with this, I was under the impression that
spamassassin will auto-learn from messages it marks.  (At least, past a
certain threshold.)  In which case, feeding the spam messages to it again
would bias the database towards spam, as the messages are being learned
twice.



I believe that SA will not learn a message it has seen before so 
multiple sa-learn runs will not have any effect.



So the question would have to be: Does Spamassassin automatically update
the Bayes database from (some/any) messages it flags as spam or ham?



I would think only if you try to reverse/forget the original learning.



Daniel T. Staal


-Jim


Re: commerce Antispam Products

2006-09-14 Thread Jim Maul
Richard Collyer wrote:
 We're looking for a commerce antispam product. It should be high 
 performance and has the strong ability to capture spams.
 Could you recommend me a good product about it? We are an ISP, have 
 millions of users.
 (Please don't say Symantec's brightmail, it's fairly good, but it's too 
 expensive for us.)
 
 Is there any reason that you don't want to use spamassassin?


Or, more importantly, why are you asking this on the spamassassin
mailing list?  One would think the responses would be somewhat biased...

Would you sign up to a microsoft windows xp mailing list and ask for
suggestions on a good cheap operating system?

-Jim


Re: Strange Score

2006-08-25 Thread Jim Maul

Matt Kettler wrote:

Christopher Mills wrote:

Look at this,

X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on
chrysalis.chrysalishosting.com
X-Spam-Level: 
X-Spam-Status: No, score= 4.3 required=5.0 tests=BAYES_50,HG_HORMONE,
HTML_40_50,HTML_MESSAGE,J_CHICKENPOX_43,J_CHICKENPOX_55,OFFER,
SPECIAL_OFFER,UNPARSEABLE_RELAY autolearn=no version=3.1.4
Received: from localhost by chrysalis.chrysalishosting.com
with SpamAssassin (version 3.1.4);
Fri, 25 Aug 2006 01:45:43 -0500

The score is off. It flagged the message as {Spam?} as it should,
because the required score is 5.
XSpam level shows 5 stars, but the line below says it got a spam score
of 4.3


Erm, I count 4 stars, not 5.

As for the spam tag in the subject, are you sure this message wasn't
scanned twice (possibly by the sender)? If you scan a message twice,
only the second set of X-Spam-* headers is present, but any other
changes from the first scan still hang around.





I have to say, the first 3 times I read this message, I counted 5 stars 
too.  Really strange..  if you look at it long enough you can see a guy 
in a boat fishing in the middle of the ocean!


-Jim


Re: SA-LEARN Question

2006-08-22 Thread Jim Maul

Christopher Mills wrote:

Hi,
We have over 100 domains on a server, all of which are getting junk mail. SA 
3.1.4 installed, but I don't think it's properly trained yet (even though I did 
upgrade from an earlier version).


If I set up a [EMAIL PROTECTED] address and tell all my customers to forward 
the junk mail they get to that address, then run sa-learn on that mailbox, 
will that help, or, will it train SA that the users that forwarded the junk 
ARE the spammers and start to assign higher scores to legitimate customers?


If you forward the emails, this process will not work.  You must either 
forward it as an attachment and then strip the attachment and run 
sa-learn on that or use some other method which preserves the original 
headers.  How you do this depends largely on your setup.
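
As one possible way to do the stripping step, here is a Python sketch using the standard library email package to pull message/rfc822 attachments out of a "forward as attachment" report. The function name and every address in it are hypothetical, and how the reports reach the script depends on your setup.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def extract_attached_messages(raw_report):
    """Return each message/rfc822 attachment as its own raw message string."""
    outer = message_from_string(raw_report, policy=policy.default)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The attached message is the single sub-part of this container.
            originals.append(part.get_payload(0).as_string())
    return originals


# Build a sample report the way a mail client would (addresses are made up).
spam = EmailMessage()
spam["From"] = "spammer@example.net"
spam["Subject"] = "Cheap stock tip"
spam.set_content("Buy now!")

report = EmailMessage()
report["From"] = "customer@example.com"
report["To"] = "spam-reports@example.com"
report["Subject"] = "Fwd: junk"
report.set_content("Please train on the attached spam.")
report.add_attachment(spam)

for original in extract_attached_messages(report.as_string()):
    print(original)
```

Each extracted message still carries the spammer's own headers rather than the customer's, so it can be written to a file or mbox and handed to sa-learn --spam.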


-Jim



Re: Image spam with inline jpeg image

2006-08-10 Thread Jim Maul

Bowie Bailey wrote:

Bret Miller wrote:

On Wed, 9 Aug 2006, Gary Funck wrote:

Has anyone considered also supplying new rules in the
form of rpm's available via a yum-compatible repository?
It'd be nice to have the usual versioning and logging
support as well as a central update facility.  This
could be done as a gateway to sa-update, perhaps
providing the updates in other package formats as well.

This is purely a philosophical argument, but something seems
wrong about the idea of using a package manager to manage
volatile data files in /var.

It also has the same problems as sa-update.  It's not very useful
unless you have one package/channel per ruleset and that is a bit
excessive considering that a ruleset is just a single file.


From my perspective, RDJ does a great job of handling the add-on
rulesets.  It's simple and flexible.  Why fix something that isn't
broken?

RDJ doesn't work in native Windows. Sa-update does. In my mind, that
makes RDJ *broken* if you're running Windows.


RDJ is a bash script.  It was written to run on the *nix systems that
most people use for SA.

It shouldn't be that difficult to create a version that works on
Windows.  My approach would be to port it to Perl and use LWP to do
the file transfers.



I think everyone is missing the point here.  This isn't a discussion 
about porting RDJ to Windows or even about RDJ itself.


SA now has an application that is similar to RDJ in its function.  This 
is an official part of SA and not an unsupported (by the SA team) add-on. 
The "if it ain't broken, why fix it" argument doesn't apply here as it's 
now being included with SA itself.  It's like saying "hey look, cars now 
come with seatbelts direct from the factory, but I'm going to rip them out 
and install someone else's."  There's just no point to it.  sa-update can 
be used to (among other things) replace RDJ.  It runs on Windows and 
*nix.  Why would anyone spend their time even further developing RDJ, 
never mind porting it to another OS, when SA now has the same 
functionality built in?


-Jim



Re: Image spam with inline jpeg image

2006-08-10 Thread Jim Maul

Bowie Bailey wrote:


It doesn't really matter to me who supports which pieces as long as
they all work.

Someone may be able to fix sa-update so that it can take over from
RDJ, but as of now, that is not possible without configuring about 62
sa-update channels (one for each ruleset RDJ manages).



True, but doesn't that make more sense than having 2 separate programs 
which both pull down updated rules for SA, but from 2 different locations?


-Jim



Re: DEAR_SOMETHING rule scoring issue

2006-08-09 Thread Jim Maul

Gregory T Pelle wrote:

What is the procedure to have a rule score reviewed?

I have been looking over the scoring for version 3.1.x at

http://spamassassin.apache.org/tests_3_1_x.html

and think that a score of 1.6 is high for the DEAR_SOMETHING rule.  I
know that our customer support emails have the first line as Dear
customer's name  It would seem to me that any business that is
trying to sound professional would have emails that hit this rule.



I could be wrong on this as i am not much of a regex expert, but it 
doesnt appear that this rule will trigger on normal things like Dear Jim


checking 20_phrases.cf shows:

body DEAR_FRIEND        /^\s*Dear Friend\b/i
describe DEAR_FRIEND    Dear Friend? That's not very dear!
body DEAR_SOMETHING     /\bDear (?:IT\W|Internet|candidate|sirs?|madam|investor|travell?er|car shopper|web)\b/i
describe DEAR_SOMETHING Contains 'Dear (something)'

Can someone with some more regex experience confirm this?
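
One way to check this without waiting for a regex expert is to try the pattern against a few sample greetings. The sketch below uses Python's re module rather than Perl, on the assumption that the two engines treat this particular pattern the same way; the sample strings are made up.

```python
import re

# Pattern copied from the DEAR_SOMETHING rule in 20_phrases.cf; the trailing
# /i flag becomes re.IGNORECASE.
DEAR_SOMETHING = re.compile(
    r"\bDear (?:IT\W|Internet|candidate|sirs?|madam|investor|travell?er"
    r"|car shopper|web)\b",
    re.IGNORECASE,
)

samples = [
    "Dear Jim,",
    "Dear customer's name,",
    "Dear Sir,",
    "Dear Investor,",
    "Dear webmaster,",
]

for text in samples:
    hit = DEAR_SOMETHING.search(text) is not None
    print(f"{text!r}: {'hits' if hit else 'no hit'}")
```

Only greetings built from the listed generic words should hit, so a support mail that opens with the customer's actual name would not be affected by this rule.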

-Jim


Re: SA to Outlook built-in junk email filter

2006-08-09 Thread Jim Maul

Matthew V wrote:

Hi there,


Hello.



Server: qmail vpopmail simscan-1.2 spamassassin-3 clamav


good


Client: Win2k/XP with Office 2003


not so good



I've been trying to get Outlook 2003 to automatically deposit mail 
marked by spamassassin as spam into its junk email folder. What I'm 
looking for is a built-in junk filter rule for Outlook that I can take 
advantage of to correctly handle email flagged by Spamassassin.


We have upwards of 100 Outlook clients, and I'm looking for an 
out-of-the-box approach so that I don't have to set up a custom rule in 
Outlook to handle ***SPAM***


This is an interesting idea.  I've never thought of using the built in 
filters to avoid having to make changes to every single client.




I found this list of built-in filters:
http://office.microsoft.com/en-us/assistance/HA010450051033.aspx



Seems straightforward enough.

However, none of them seem to work (such as adding $$ or $! to the
subject line).




In typical Microsoft fashion. ;)  Anyway, wouldn't it be better to ask
Microsoft why their product doesn't work as expected instead of the SA
mailing list?  This really has nothing at all to do with SA or spam.


-Jim


Re: Image spams getting thru

2006-08-02 Thread Jim Maul

John D. Hardin wrote:

On Tue, 1 Aug 2006, Theo Van Dinter wrote:


Except now you've also delayed your valid mail by 30 minutes or an
hour which sucks (and is sometimes completely unacceptable).


Repeat after me: Email is a non-guaranteed, Best Attempt delivery
mechanism. There may be delays.



Just because that's what it was designed to be doesn't mean that it is.
Email is whatever people use it for.  It's an instant messenger utility,
it's a file transfer mechanism, or even a replacement for the telephone
or snail mail.  Many people have gotten used to the fact that email
these days is usually freakin' quick, and to suddenly have that changed is
unacceptable.


Imagine if car companies suddenly started making all vehicles with
4-cylinder engines to help solve the current gasoline crisis.  It *would*
help the problem and many people would embrace it, but for many others,
it's simply unacceptable.


-Jim


Re: Image spams getting thru

2006-08-01 Thread Jim Maul

John D. Hardin wrote:

On Tue, 1 Aug 2006, Ramprasad wrote:


  How about sending 450 Please Try later to every mail with an
inline image and then somehow verify if it really comes back.
(Obviously not my original idea :-) )


The problem there, again, is that you've already used the bandwidth
and system resources needed to receive and scan the message. Why
explicitly say please re-send the message later, I'd like to use my
bandwidth and CPU resources to process it again? Would the benefit
outweigh the cost?

Then add in the infrastructure and long-term resources needed to
determine whether you've seen the message before and make a decision
based on that data.

How many spams would really come back? Max 20%.


There is a much lighter-weight and more global way to achieve that:
standard greylisting. 



I'm curious how many organizations that aren't ISPs are using some sort of
greylisting.  Do your users complain when the email they sent to a
fellow employee 17 seconds ago didn't arrive yet?  We hear all sorts of
shit when things like that happen.  Try explaining greylisting and spam
to some ICU nurse who really doesn't care.  All she knows is that we
didn't have this problem when we paid to outsource our email.  For us,
and I'm sure many others as well, greylisting is just not realistic.
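
For anyone who hasn't seen where that delay comes from, here is a minimal sketch of the usual greylisting decision in Python. It is an in-memory toy, not any particular greylisting package: the `Greylist` class, the 300-second `min_delay` and the injectable `clock` are all assumptions made for illustration.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.clock = clock          # injectable so the delay can be simulated
        self.first_seen = {}        # triplet -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return 'tempfail' (answer with a 4xx) or 'accept' for one attempt."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

With these assumed numbers, a retry 17 seconds after the first attempt is still deferred, which is exactly the kind of wait users notice.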


-Jim


Re: Image spams getting thru

2006-08-01 Thread Jim Maul

Ken A wrote:



Jim Maul wrote:

John D. Hardin wrote:

On Tue, 1 Aug 2006, Ramprasad wrote:


  How about sending 450 Please Try later to every mail with an
inline image and then somehow verify if it really comes back.
(Obviously not my original idea :-) )


The problem there, again, is that you've already used the bandwidth
and system resources needed to receive and scan the message. Why
explicitly say please re-send the message later, I'd like to use my
bandwidth and CPU resources to process it again? Would the benefit
outweigh the cost?

Then add in the infrastructure and long-term resources needed to
determine whether you've seen the message before and make a decision
based on that data.

How many spams would really come back? Max 20%.


There is a much lighter-weight and more global way to achieve that:
standard greylisting.


I'm curious how many organizations that aren't ISPs are using some sort
of greylisting.  Do your users complain when the email they sent to
a fellow employee 17 seconds ago didn't arrive yet?  We hear all sorts
of shit when things like that happen.  Try explaining greylisting and
spam to some ICU nurse who really doesn't care.  All she knows is that
we didn't have this problem when we paid to outsource our email.  For
us, and I'm sure many others as well, greylisting is just not realistic.



Well, you don't have to use it on internal mail. That's just a 
configuration issue.

Ken
Pacific.Net




True, and we would if we chose to use it at all.  My example was a
little too generic, I suppose.  We regularly have employees that use
email as an instant-messenger type of service with insurance companies,
patients, doctors' offices, etc.  For them, and ultimately us, the delay
is simply not an option.
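
For reference, the internal-mail exemption Ken mentions would amount to something like the sketch below. The network ranges and the `should_greylist` name are assumptions for illustration only; a real greylisting setup would express this in its own configuration.

```python
import ipaddress

# Assumed internal ranges; substitute whatever your own network actually uses.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def should_greylist(client_ip, internal_networks=INTERNAL_NETWORKS):
    """Greylist only connections that do not come from an internal network."""
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in internal_networks)
```

That covers employee-to-employee mail, but it does nothing for the outside correspondents described above, since their servers connect from arbitrary external addresses.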


-Jim


Re: Subject header not detected after folded header

2006-07-31 Thread Jim Maul

Ben Wylie wrote:

Am running SpamAssassin 3.1.2 on Windows 2003 server.

This is an extract from the headers of an incoming email.
This triggered the MISSING_SUBJECT Missing Subject: header rule.
Why did this not detect the subject header?



Because it's blank?



X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary=_=_NextPart_001_01C6B494.1CB001D7
Subject:
Date: Mon, 31 Jul 2006 12:26:05 +0100




-Jim



Re: Subject header not detected after folded header

2006-07-31 Thread Jim Maul

Ben Wylie wrote:

Jim Maul wrote:

Ben Wylie wrote:


Am running SpamAssassin 3.1.2 on Windows 2003 server.

This is an extract from the headers of an incoming email.
This triggered the MISSING_SUBJECT Missing Subject: header rule.
Why did this not detect the subject header?


Because it's blank?


My understanding was that it only hit if the subject header was not
present at all, rather than it being blank.


Am I wrong?



Could be, or I could be the incorrect one here.  Given that the rule hit
and the subject header was there but blank, I'd guess that the rule hits
even when the header is there and blank.  Whether this was the intended
outcome, I do not know.
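
The distinction being argued about (header absent versus header present but empty) is easy to see with a small Python sketch using the standard `email` module. This only illustrates the two cases; it does not reproduce how SpamAssassin evaluates MISSING_SUBJECT. The `subject_state` helper is hypothetical, and the tab in front of the `boundary=` line is an assumption about how the folded header originally looked.

```python
from email import message_from_string


def subject_state(raw_headers):
    """Classify the Subject header as 'absent', 'blank' or 'present'."""
    msg = message_from_string(raw_headers)
    value = msg["Subject"]  # None when the header does not exist at all
    if value is None:
        return "absent"
    if value.strip() == "":
        return "blank"
    return "present"


# Header extract from the report; the leading tab on the folded line is assumed.
BLANK_SUBJECT = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative;\n"
    "\tboundary=_=_NextPart_001_01C6B494.1CB001D7\n"
    "Subject:\n"
    "Date: Mon, 31 Jul 2006 12:26:05 +0100\n"
    "\n"
)

# Same headers with the Subject line removed entirely.
NO_SUBJECT = BLANK_SUBJECT.replace("Subject:\n", "")
```

A parser can tell these two cases apart, so whether a rule treats them the same is a choice made in the rule itself.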


-Jim


Re: SA Score - Confidence Percentage

2006-07-26 Thread Jim Maul

Chris Santerre wrote:


  -Original Message-
  From: John Rudd [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, July 26, 2006 10:44 AM
  To: Chris Santerre
  Cc: Sietse van Zanen; SpamAssassin Users
  Subject: Re: SA Score - Confidence Percentage
 
 
 
  On Jul 26, 2006, at 6:40 AM, Chris Santerre wrote:
 
-Original Message-
From: John Rudd [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 26, 2006 6:38 AM
To: Sietse van Zanen
Cc: SpamAssassin Users
Subject: Re: SA Score - Confidence Percentage
   
   
   
I can see how plugins and add-on rules all affect it, but certainly
they have some sort of base comparison that lets them know when
they've gotten the right score values for the base rules, right?
  
   I'm confused by your statement. (I'm also distracted by shiny
   objects)

   When rule scores are formed, they are scored based off a large
   corpus, additional rules, and set in the very moment they are scored.
 
  Yes, that is the corpus I am referring to.
 
  When that score is developed, how is it decided that the scores have
  settled?  When 95% of the spam in the corpus got ranked 5 or
  higher?  80%?  100%?  That's the comparison I'm looking for.

Ahh the perceptron. Um... I'm not going to even pretend to tell you I 
understand how that mystical piece of code works. Something to do with goats and 
planetary alignments.


However, IMHO, the public corpus runs and perceptron are outdated the moment
they are run. I also am one of the few nuts who think the perceptron hurts more
than it helps. I have no data to back that up other than the tingly feeling in
my tummy. And my tummy serves me well. ;)





Hurts more than it helps?  Probably not.  But it *does* cause weird 
things like BAYES_80 being scored higher than BAYES_95.


body   Bayesian spam probability is 80 to 95%  BAYES_80  0 0 3.608 2.0
body   Bayesian spam probability is 95 to 99%  BAYES_95  0 0 3.514 3.0

Must have been the goats...
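
To make the oddity concrete, here is a small Python sketch that compares the two rules column by column. As I understand the `score` documentation, the four numbers are the four score sets (0: no Bayes, no network tests; 1: network only; 2: Bayes only; 3: Bayes plus network). The `inversions` helper is made up for illustration.

```python
# Scores copied from the two lines above, one value per score set.
SCORES = {
    "BAYES_80": (0, 0, 3.608, 2.0),
    "BAYES_95": (0, 0, 3.514, 3.0),
}


def inversions(lower_rule, higher_rule):
    """Return the score sets where the lower-probability rule scores higher."""
    pairs = zip(SCORES[lower_rule], SCORES[higher_rule])
    return [i for i, (low, high) in enumerate(pairs) if low > high]


if __name__ == "__main__":
    print(inversions("BAYES_80", "BAYES_95"))
```

On these numbers the inversion only shows up in score set 2, so whether you see it depends on whether network tests are enabled.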

-Jim


Re: Bayes Always On

2006-07-21 Thread Jim Maul

Duane Hill wrote:

On Fri, 21 Jul 2006, Bowie Bailey wrote:


Duane Hill wrote:

I recently upgraded SA to v3.1.3 on FreeBSD 6.0. I have also run
sa-update.

I have found that no matter what I do to the local.cf with turning off
bayes, it is still being used. I have searched the system over and
have only found local.cf contained within /etc/mail/spamassassin.

Any ideas?


spamassassin -D config --lint

This will show you all of the configuration files that SA is reading.

Also, double-check that your mail processing is running the same SA as
your command line tests and that you are running as the same user.


Thanks much! I was simply looking for local.cf. Had I been looking for
local.cf.sample I would have found it. And, now that I think about it, I
do remember seeing a number of messages posted to this list that did
state the default is moved to /usr/local/etc/mail/spamassassin upon
running sa-update. Sorry to have bothered.





Wait, I thought SA loaded all .cf files automatically.  Now you're telling
me it loads .sample files as well?  Is this correct?


-Jim


Re: SA not tagging subject

2006-07-18 Thread Jim Maul

Bowie Bailey wrote:

tomcatf14 wrote:

I've disabled fast spamassassin and now it tags the subject!!! Good, but
I think I still want to use Fast SA to enhance the performance.


What is fast spamassassin???



Basically spamc -c



The doc stated this:


The doc for what?



qmail-scanner



I want fast_spamassassin for performance - but I want the Subject:
header tagged as SPAM too! Boy - you don't want much do you! :-)
Anyway - you can. Simply change the --scanner option to
fast_spamassassin=STRING and STRING (SPAM: is a good value)
will be prepended to the Subject line of every message marked as Spam.


This is not part of the SpamAssassin config.



No, you're right - it's qmail-scanner.



But my qmail-scanner configure doesn't have this option. I'm using
qmail-scanner 1.25. I'll try the latest version, 2.0.1, and hope it
helps.


Maybe you should be asking on the qmail or qmail-scanner list?



He should have.  However, qmail-scanner does have this config option.

my @scanner_array=(clamdscan_scanner,spamassassin);

But this really has nothing to do with spamassassin as was mentioned...

-Jim



Re: SA not tagging subject

2006-07-17 Thread Jim Maul

tomcatf14 wrote:

What should I do if I want to use the current SA?


Follow the instructions that come with it instead of some outdated guide 
somewhere.


-Jim



Re: Blocking all inline GIF or JPG Images

2006-06-27 Thread Jim Maul

Matt wrote:

Hi,
What would I need to do to just outright block all e-mail that has an
inline gif or jpg (or multiple ones)?




You should do this in whatever program you have calling SA/AV/etc..  SA 
itself doesn't block anything.


-Jim



Re: content is being striped

2006-06-16 Thread Jim Maul

Michael Di Martino wrote:

jdow wrote:

It isn't SpamAssassin doing this. It may be a misconfigured procmail
rule. I presume it could also be a misconfigured rule from any OTHER
means of tossing mail into your mailbox. But it is NOT SpamAssassin
doing it.   


{^_^}
- Original Message -
From: Michael Di Martino [EMAIL PROTECTED]


I am currently using SA 3.1.3  with the following
Net-qmail  (LWQ)
Simscan 1.2
Ripmine
Clamav

The problem currently is that all messages are being delivered stripped
of their Subjects and Content.

I am completely stumped by this and all my Google searches have come up
empty. Any help would be greatly appreciated.

Below is my local.cf file

required_hits 8
report_safe 0
rewrite_header Subject [SPAM]


Thanks


It may not be SA, but it is not procmail, since I don't have procmail
installed.
However, whenever I turn on SA in my simcontrol file all messages are
stripped.




It's definitely something simscan is doing.  You may have better luck on
the simscan list.


-Jim



Re: Newbie question

2006-06-06 Thread Jim Maul

Gary Forrest - Netnorth wrote:

Hi All

We have been using SA v3.1.1, all seems to work well :)
( FreeBSD 6.1, Sendmail 8.13.6  few milters )

Is it possible to get SA not to scan inbound email addressed to certain
domain names.



Yes, but not with SA itself.


We have looked at the various white listing functions available, by adding
into local.cf

 whitelist_from sender.com
 whitelist_to receiver.com

This sort of works, in that the email receives a negative score.
The problem is SA still spends time checking the email ( taking 3-12 seconds
to scan )


The better way is to not call SA at all for these accounts.



This server also performs Anti Virus function via Clam AV, and we would like to
offer AV services to certain customers, without providing SA services.



The few milters you are using should be able to call 
SA/clamav/whatever selectively based on various rules you set up.
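
The logic itself is simple, whatever milter ends up doing it. Here is a minimal Python sketch of the decision, assuming you keep a list of AV-only recipient domains. The `wants_spam_scan` name and the example domain are hypothetical; real milters have their own configuration syntax for this.

```python
# Hypothetical set of recipient domains that get virus scanning only.
AV_ONLY_DOMAINS = {"receiver.com"}


def wants_spam_scan(recipient, av_only_domains=AV_ONLY_DOMAINS):
    """Return False when the recipient's domain should bypass SpamAssassin."""
    domain = recipient.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    return domain not in av_only_domains
```

Mail for the listed domains then never reaches SA at all, so the 3-12 seconds of scanning is avoided rather than merely offset by a whitelist score.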


-Jim


Re: Need to edit this rule

2006-06-06 Thread Jim Maul

Will Nordmeyer wrote:
Just put 
score FROM_DOMAIN_NOVOWEL lowerscore


in your local.cf

(e.g.:
score FROM_DOMAIN_NOVOWEL 0.3
)

You don't want to adjust it in the master file - your adjustment would
be overwritten every time you upgraded.



Not to mention that this will only affect mail TO your domain, not from 
you.  If other people you are sending mail to happen to use SA, this 
rule will still hit with their score on their systems.






Hi, all.

It seems that, just lately, the following rule is being hit:

FROM_DOMAIN_NOVOWEL domain has series of non-vowel letters

As our domain name contains a series of non-vowel letters, I'd like to
reduce the score associated with this rule.  Problem is, I can't seem
to locate it.  Can anyone point me to it?


Thanks.

Dimitri














Re: Horde webmail spam report and spam assassin

2006-06-05 Thread Jim Maul

Alejandro Lengua wrote:

Horde webmail has a spam reporting feature, however it
is a bit useless.

Why?
Because it sends the email (without headers) to an email
address (the spam admin). This way is very difficult to
feed the spam mail into spam detection software.

I wonder if somebody has done anything to make it work
with  the SA-Learn feature of Spam Assassin.



I haven't played around with horde/imp lately, but last I checked, the
report spam option actually passed the message to spamassassin -r.  You're
saying it no longer does this?  I'll have to check it out.


-Jim



Re: Spamassassin + Kaspersky SMTP-Scanner

2006-05-11 Thread Jim Maul

Rick Macdougall wrote:

Thomas Gross wrote:

Hi List!

 


I'm running a Debian mailserver with qmail 1.03, vpopmail and Kaspersky
anti-virus smtp-scanner 5.5.3.

Now I wanted to add the latest SpamAssassin to filter the spam, which
grows up to 500 mails per day.

I searched the whole internet for a possible configuration with no
result.

Has anyone a solution for that?



simscan.

You can find it at http://www.inter7.com/?page=simscan

Regards,

Rick





or qmail-scanner

http://qmail-scanner.sf.net

-Jim


Re: Spam coming thru w/high score different SA version

2006-04-27 Thread Jim Maul

Tracey Gates wrote:

I checked and did find 2 spamd files.  One was in /usr/bin with the
latest install date.  The other one was in /etc/rc.d/init.d with the
older install date.  I backed up the older files and replaced the ones
that are in the init.d directory with the ones from the /usr/bin
directory.  (replaced the spamassassin and spamd files).  I then stopped
and restarted spamd.  I checked emails that came in after the restart of
spamd and they still say version 3.0.2.  I'm guessing I'm still missing
something?

Any suggestions?



Yeah, the one in /usr/bin is the actual file.  The one you see in
/etc/rc.d/init.d is a startup script for spamd.  These are NOT the same
thing.  You do NOT want to copy /usr/bin/spamd to
/etc/rc.d/init.d/spamd!  What you did will cause spamd to not start up
on reboot anymore.  I would undo what you did and look around some more
for another copy.  The one in init.d is not what you're looking for.


-Jim


Re: Spam coming thru w/high score different SA version

2006-04-27 Thread Jim Maul

Tracey Gates wrote:

OK.  Sorry, I'm a novice at all of this admin stuff.  I replaced the old
files back and restarted spamd again.  I did a find for spamd and here
is my results:

[EMAIL PROTECTED] /]# find ./ -name spamd
find: ./proc/9832/fd: No such file or directory
./etc/rc.d/init.d/spamd
./usr/bin/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.0.2/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.0.2/spamd/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.0.2/blib/script/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.1.1/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.1.1/spamd/spamd
./usr/src/redhat/BUILD/Mail-SpamAssassin-3.1.1/blib/script/spamd
./home/administrator/Mail-SpamAssassin-3.0.2/spamd
./home/administrator/Mail-SpamAssassin-3.0.2/spamd/spamd
./home/administrator/Mail-SpamAssassin-3.0.2/blib/script/spamd

Am I looking for the correct file?  I don't see what it might be picking
up the wrong version.



Ah, wait! I just went back and re-read the original email.  You said:

I'm running on a RedHat ES 3.0 using CommuniGatePro and CGPSA.  The 
CGPSA.conf file points to the correct directories for my SA 
installation.  Any suggestions would be a great help.


This link: http://www.tffenterprises.com/cgpsa/
says:

The filter works efficiently, by directly using the SpamAssassin API. 
It does not rely on a daemon process such as spamd or on the execution 
of shell scripts (as the usual process for utilizing SpamAssassin with 
CommuniGate servers does). It can safely be used with multiple 
CommuniGate Pro enqueuer threads.


So... basically, you're not using spamd.  Looking for another copy of it
is pointless.  It seems you have 2 copies of the SpamAssassin API lying
around.


-Jim


Re: Messages Not detected as Spam

2006-04-26 Thread Jim Maul

Paul Wetter wrote:
Ok, I added what you said.  I think things may be back on the up and in 
operation.  Some spam however is still not detected, which brings me to 
my next question.


I have one other question about razor checks.  They do not appear to be 
working.  If I do a manual check (with the amavis user) it logs the 
message as a spam message in the razor-agent.log file.  Yet running the 
same thing through spamassassin does not show any razor checks picking 
it up and also it does not log anything in the razor-agent.log file 
either way.


In local.cf I have the following 3 lines related to razor:

loadplugin Mail::SpamAssassin::Plugin::Razor2
use_razor2 1
razor_config /pathtoconfig/.razor/razor-agent.conf

Am I missing something?  Is this correct?

From what I see my SpamAssassin install is not doing the razor checks.


Thanks in advance.
-Paul



Don't loadplugin statements go in init.pre, not local.cf?

I'm still on 2.64 so I could be completely wrong on this one...

-Jim


Re: Permission errors

2006-04-25 Thread Jim Maul

Igor Chudov wrote:

Doing some housecleaning...

I am running spamd as root, at which point it reverts to 'nobody'.

It then proceeds to complain, understandably, that it does not have
permission to write to users' directories. 



Apr 24 23:56:57 manifold spamd[21442]: spamd: still running as root:
user not specified with -u, not found, or set to root, falling back to
nobody at /usr/bin/spamd line 1152, GEN353 line 4.
Apr 24 23:56:57 manifold spamd[21442]: spamd: processing message
[EMAIL PROTECTED] for root:99
Apr 24 23:56:58 manifold spamd[21442]: locker: safe_lock: cannot
create tmp lockfile
/root/.spamassassin/auto-whitelist.lock.manifold.algebra.com.21442 for
/root/.spamassassin/auto-whitelist.lock: Permission denied
Apr 24 23:56:58 manifold spamd[21442]: auto-whitelist: open of
auto-whitelist file failed: locker: safe_lock: cannot create tmp
lockfile
/root/.spamassassin/auto-whitelist.lock.manifold.algebra.com.21442 for
/root/.spamassassin/auto-whitelist.lock: Permission denied



I am in a cleanup mode and would like to get rid of these errors, but
this one has me stumped. How can it expect to access inside root's
directory, if it runs as nobody???

i





It's not supposed to access root's home directory.  Set up another user
(spamd or something) that spamd is going to run as and store all bayes,
whitelist, etc. files in that homedir instead.  There shouldn't be any SA
files in root's home, as spamd won't be able to read them.


-Jim

