Re: Is Bayes Dead? Have the spammers won?

2007-03-22 Thread Michel R Vaillancourt

Henrik Krohns wrote:

On Thu, Mar 22, 2007 at 09:55:07AM -0700, Marc Perkel wrote:
Maybe I'm doing something wrong but with the various methods of bayes 
poisoning going on I've found that bayes is just lowering the score of 
spam and causing more spam to get through.


So is there actually any real proof that Bayes poisoning works? I've yet to
find any evidence. All the cases have been admins/users messing it up
themselves.


In point of fact, my own experience is that poisoning attempts make no difference 
at all.  Because the number of poison tokens in an established database is so small, they 
don't change anything.  However the incidence of other spam-positive keys tips the 
hand.

I use auto-learning.  Always have.  It has NEVER been a problem;  if I 
get an FP or FN, I resubmit those mails for retraining to the DB.

I've even gone so far as to take a Spam mail that was visually more than 80% 
poison, copy the poison out, put it around another spam mail and mail it to 
myself from a dummy account.  Result?  Bayes_99.  Took the same poison, wrapped it 
around a legitimate mail and sent it to myself.  Result?  Bayes_00.  You can't keep a 
good Bayes down;  auto-learned or not.

And I'm a little guy; 5000 messages a day ... 1 if the lists I host 
are busy.  It's not like I have a massive bayes DB to work against.  The Big 
Boys should be even more accurate just by raw weight of statistical incidence.  
Bayes Poison is fiction;  it's not even good fiction.
--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: How to Limit SA scan jobs

2007-01-31 Thread Michel R Vaillancourt

Tom wrote:
How do I stop Spamassassin from scanning email for a particular user 
(email account) on the server.


I have Spamassassin 3.1.7 finally working and would like to send all 
messages identified as spam to a user on the server.
The problem is that the messages that go to the spam account now get 
rescanned.




http://wiki.apache.org/spamassassin/AllSpamToFiltering

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: Avoiding Bayes Poison

2007-01-11 Thread Michel R Vaillancourt

Clay Davis wrote:
Over the past several months I have been saving the spam that slips 
through to my users accounts to train my bayes with.  I notice that 
lately almost all of it has (what I am assuming to be) an attempt to 
poison my bayes (a bunch of valid words put together in a nonsensical 
paragraph) at the bottom of it.
 
How much should I worry about this type of spam and how it will affect 
my bayes db?  Work arounds?  Advice?
 
Thanks, gang.
 
Clay


Hi, Clay.  Without getting into the math behind it, Bayes poisoning is almost impossible.  I have been training 
my Bayes DB with everything I consider spam, whether it has a poison section or not.  I'm almost 
always seeing a BAYES_99 result on these poisoned emails.  Why?  Because the key tokens that make it spam 
are repeated;  the poison text is not.
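
To illustrate the point about repeated tokens, here is a rough sketch using the naive Bayes product formula. It is a simplification and not SpamAssassin's actual combining method, and every token probability in it is invented for the example. The assumption is that one-off poison words, which the database has rarely seen, sit close to 0.5 and so contribute almost nothing.

```shell
#!/bin/sh
# Simplified sketch: combine per-token spam probabilities with the naive
# Bayes product formula  P = prod(p) / (prod(p) + prod(1 - p)).
# This is NOT SpamAssassin's real combiner; all values are made up.

# combine <p1> <p2> ...   one spam probability (0..1) per token
combine() {
    printf '%s\n' "$@" | awk '
        BEGIN { s = 1; h = 1 }
        { s *= $1; h *= (1 - $1) }
        END { printf "%.4f\n", s / (s + h) }'
}

# Three tokens that show up again and again in trained spam.
combine 0.99 0.98 0.97

# The same three tokens plus five one-off poison words that lean
# slightly towards ham (assumed probability 0.4 each).
combine 0.99 0.98 0.97 0.4 0.4 0.4 0.4 0.4
```

Tokens at exactly 0.5 cancel out of the formula entirely, so only words the database has strong evidence about move the result.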

I use a combination of auto-training and hand-correction with my DB.  I only 
correct if the answer is not a BAYES_99.  Don't sweat the poison, Bayes is 
almost immune to Iocane, etc.

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: Avoiding Bayes Poison

2007-01-11 Thread Michel R Vaillancourt

Clay Davis wrote:

Thanks, Michel.  How do you correct?  Run it back through as ham?
C



All my user accounts have system-created ConfirmedSpam and ConfirmedNotSpam 
folders.  If the SA system makes a mistake, they just drag-and-drop the email into the right folder.  Every 
night, the server runs a batch job that sa-learns the contents of these folders and then empties 
them.  It works like a charm.
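
A nightly job along these lines could look like the sketch below. The Maildir layout and folder paths are assumptions, as is the `SA_LEARN` override variable (there only so the script can be dry-run); only `sa-learn --spam` and `sa-learn --ham` on a directory are standard. It also assumes the job runs as a user whose Bayes DB is the one you want trained.

```shell
#!/bin/sh
# Sketch of a nightly retraining job. The folder layout
# (<maildir>/.ConfirmedSpam/cur and <maildir>/.ConfirmedNotSpam/cur)
# is an assumption; adjust it to your mail store.
SA_LEARN="${SA_LEARN:-sa-learn}"

# retrain_folder <--spam|--ham> <directory>
# Learns every message in the directory, then empties the directory
# only if sa-learn reported success.
retrain_folder() {
    mode=$1
    dir=$2
    [ -d "$dir" ] || return 0
    # Nothing to do for an empty folder.
    [ -n "$(ls -A "$dir")" ] || return 0
    "$SA_LEARN" "$mode" "$dir" || return 1
    find "$dir" -type f -exec rm -f {} +
}

# retrain_user <maildir>
retrain_user() {
    maildir=$1
    retrain_folder --spam "$maildir/.ConfirmedSpam/cur"
    retrain_folder --ham  "$maildir/.ConfirmedNotSpam/cur"
}

# Example (hypothetical path): retrain every mailbox under /home.
# for m in /home/*/Maildir; do retrain_user "$m"; done
```

Leaving the messages in place when `sa-learn` fails means a bad night costs nothing; the next run simply picks them up again.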

I'm extra fussy, and any spam that scores less than a BAYES_99 gets dumped into 
the ConfirmedSpam folder... even a BAYES_95.  The result is that I only get 
an inaccurate score about once a day in 10,000 mails handled.  My bayes_db folder is about 
60 MB in total, just FYI.
--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: roaming users sending mail internally and dynamic IPs issue

2006-12-18 Thread Michel R Vaillancourt

Thomas Bolioli wrote:
Thanks for the response. SMTP auth is set up so there must be something 
I need to do to tell SA that it was auth'd.

Any ideas?
Thanks,
Tom


One solution that I used for this problem was a custom rule.  We had one client site that had a lot of 
roadwarriors, so they had their own SMTP machine.  On that machine, I used a mail-filter to add an X- 
header with an MD5 hash of the company name as a validation stamp.  Every email coming into that machine from the 
roadwarriors got the Stamp.  The MX boxen all had a custom rule in SA that took 20 points off.  Every 
Sunday night, the system redid the Stamp and passed out new versions of the rule with the correct 
Stamp in it to the MX boxen.
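
A rough sketch of how such a stamp and its matching rule might be generated follows. The header name `X-Roadwarrior-Stamp`, the rule name `LOCAL_ROADWARRIOR_STAMP`, the company name and the idea of salting the hash with the ISO week are all hypothetical; the post does not say how the weekly stamp was varied. It assumes `md5sum` from GNU coreutils.

```shell
#!/bin/sh
# Sketch: derive a weekly validation stamp and print a SpamAssassin rule
# that matches it. Header name, rule name and salt scheme are hypothetical.

# make_stamp <company name> <salt>   prints a 32-character hex MD5 digest
make_stamp() {
    printf '%s:%s' "$1" "$2" | md5sum | cut -d' ' -f1
}

# make_rule <stamp>   prints the rule text on stdout
make_rule() {
    cat <<EOF
header   LOCAL_ROADWARRIOR_STAMP  X-Roadwarrior-Stamp =~ /^$1\$/
describe LOCAL_ROADWARRIOR_STAMP  Mail carries this week's roadwarrior stamp
score    LOCAL_ROADWARRIOR_STAMP  -20.0
EOF
}

# Salt with the ISO year and week number so the stamp changes every week.
stamp=$(make_stamp "Example Corp" "$(date +%G-%V)")
make_rule "$stamp"
```

The output would be written to a `.cf` file and copied to each MX box, while the roadwarrior SMTP machine adds the same stamp value as the header.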

It worked like a charm.
--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: Synchronizing two Bayes database

2006-12-08 Thread Michel R Vaillancourt

Emmanuel Lesouef wrote:

Yes, I was thinking about this solution.

But isn't it network resource hungry?

And if I would like to keep a file-based bayes db, what would be a
good way to migrate it from one server to another?

Thanks Sietse for the advice.

Sietse van Zanen wrote:

Sure, use MySQL for bayes storage and have both servers use that DB.
Then you could be fairly sure, both use the same bayes.
 
I think it should even be possible to dump both databases and migrate
into one SQL db. But I don't use MySQL myself, so I would not know how.
 
-Sietse
 


On your most accurate machine, run a CRON job that once a week does:

sa-learn --siteconfigpath=/your/site/path --force-expire
sa-learn --siteconfigpath=/your/site/path --backup > /tmp/weeklyMerge.sal.bak
scp /tmp/weeklyMerge.sal.bak [EMAIL PROTECTED]://tmp/weeklyMerge.sal.bak
mv /tmp/weeklyMerge.sal.bak /tmp/weeklyMerge.sal.sent

... use ssh key-auth so no password interaction is required for your 
robot account.

On the other.machine.tld run a cron job that fires one hour later 
that:

sa-learn --siteconfigpath=/your/site/path --restore /tmp/weeklyMerge.sal.bak
mv /tmp/weeklyMerge.sal.bak /tmp/weeklyMerge.sal.restored
sa-learn --siteconfigpath=/your/site/path --force-expire

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: New Spam

2006-11-17 Thread Michel R Vaillancourt

Evan Platt wrote:

At 07:40 AM 11/17/2006, you wrote:
I'm getting some new spam coming through.. It's ASCII art (using 
nothing but numbers) and spells out TORA.08 and nothing else..


It looks to be coming from a Bot-Net..  Anyone seen this?

Thanks, Billy


Just got 2 also to 2 different e-mail addresses.


Yep.  Saw this just within the past ten mins.

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


*****SPAM***** ... This Just In / Thought I'd Share ...

2006-11-14 Thread Michel R Vaillancourt
Spam detection software, running on the system empire.wolfstar.ca, has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  LOL ... stupid spammer tricks... check the message ID:
  Nov 14 11:08:47 ext1 spamd[21707]: spamd: identified spam (49.6/5.0) for
  nobody:8 in 7.5 seconds, 3796 bytes. Nov 14 11:08:47 ext1 spamd[21707]:
  spamd: result: Y 49 -
  
BAYES_99,FORGED_YAHOO_RCVD,INVALID_TZ_CST,MIME_BASE64_NO_NAME,MIME_BASE64_TEXT,MIME_BOUND_DD_DIGITS,PERCENT_RANDOM,RCVD_IN_DSBL,SARE_BOUNDARY_07,SARE_RAND_2,SARE_RAND_2J,SARE_RAND_2W,SARE_SUB_MSG_SUBJ,SPF_HELO_PASS,UNRESOLVED_TEMPLATE,UNVERIFIED_YAHOO,UPPERCASE_25_50,URIBL_JP_SURBL,URIBL_SC_SURBL,X_MESSAGE_INFO,rantext35,rantext36,rantext37
  
scantime=7.5,size=3796,user=nobody,uid=8,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=3990,mid=[EMAIL
 PROTECTED],bayes=1,autolearn=spam
  [...] 

Content analysis details:   (6.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 SPF_PASS               SPF: sender matches SPF record
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0001]
 2.0 rantext35              FULL: Random text 35
 2.0 rantext36              FULL: Random text 36
 2.0 rantext37              FULL: Random text 37
 2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [70.50.245.54 listed in dnsbl.sorbs.net]
 1.5 SARE_RAND_2W           SARE_RAND_2W
 2.5 SARE_RAND_2            SARE_RAND_2
 0.0 UPPERCASE_25_50        message body is 25-50% uppercase
 2.3 PERCENT_RANDOM         Message has a random macro in it
 1.5 SARE_RAND_2J           SARE_RAND_2J
-6.8 AWL                    AWL: From: address is in the auto white-list

---BeginMessage---

LOL ...  stupid spammer tricks... check the message ID:

Nov 14 11:08:47 ext1 spamd[21707]: spamd: identified spam (49.6/5.0) for 
nobody:8 in 7.5 seconds, 3796 bytes.
Nov 14 11:08:47 ext1 spamd[21707]: spamd: result: Y 49 - 
BAYES_99,FORGED_YAHOO_RCVD,INVALID_TZ_CST,MIME_BASE64_NO_NAME,MIME_BASE64_TEXT,MIME_BOUND_DD_DIGITS,PERCENT_RANDOM,RCVD_IN_DSBL,SARE_BOUNDARY_07,SARE_RAND_2,SARE_RAND_2J,SARE_RAND_2W,SARE_SUB_MSG_SUBJ,SPF_HELO_PASS,UNRESOLVED_TEMPLATE,UNVERIFIED_YAHOO,UPPERCASE_25_50,URIBL_JP_SURBL,URIBL_SC_SURBL,X_MESSAGE_INFO,rantext35,rantext36,rantext37
 
scantime=7.5,size=3796,user=nobody,uid=8,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=3990,mid=[EMAIL
 PROTECTED],bayes=1,autolearn=spam

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca
---End Message---


Re: Joe Blow wrote: Spam

2006-10-18 Thread Michel R Vaillancourt


They are adding new PCs to the bot-nets used for spam faster than the
DNSBL operators can update the lists.

  -- Clifton



	I've just made my personal additional rules-set available at 
http://empire.wolfstar.ca/spamAssassin/ ... specifically, 
WOLFSTAR_SOMEONEWROTESTOCKUCE.cf adds 1.75 to that UCE's score, which 
seems to be enough to trip it into the spam category on my servers. 
Thanks to Peter H. Lemieux for one of the patterns I am using.


--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: sa-update versus rulesdujour questions

2006-10-18 Thread Michel R Vaillancourt

Jo Rhett wrote:


On Oct 18, 2006, at 11:15 AM, Tim Litwiller wrote:
I've never changed anything in local.cf when using RDJ - what did you 
have to change?


Reading RDJ setup, it kept mentioning that I would have to add 
statements to local.conf for each and every ruleset that I imported.  
This is why I didn't bother, and used sa-update instead.


I'm wondering if there is anything I'm missing...

	Yes.  It's not /etc/spamassassin/local.cf you add lines to.  Rather, it 
is /etc/rulesdujour/config.


Mine looks like:
ext1:~# cat /etc/rulesdujour/config
TRUSTED_RULESETS="TRIPWIRE ANTIDRUG SARE_EVILNUMBERS0 SARE_EVILNUMBERS1 
SARE_EVILNUMBERS2 RANDOMVAL BOGUSVIRUS SARE_ADULT SARE_FRAUD SARE_BML 
SARE_RATWARE SARE_SPOOF SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM 
SARE_HEADER SARE_HEADER0 SARE_HEADER1 SARE_HEADER2 SARE_HEADER3 
SARE_HEADER_ENG SARE_HEADER_X264_X30 SARE_HEADER_X30 SARE_HTML 
SARE_HTML0 SARE_HTML1 SARE_HTML2 SARE_HTML3 SARE_HTML4 SARE_HTML_ENG 
SARE_SPECIFIC SARE_OBFU SARE_OBFU0 SARE_OBFU1 SARE_OBFU2 SARE_OBFU3 
SARE_REDIRECT SARE_REDIRECT_POST300 SARE_SPAMCOP_TOP200 SARE_GENLSUBJ 
SARE_GENLSUBJ0 SARE_GENLSUBJ1 SARE_GENLSUBJ2 SARE_GENLSUBJ3 
SARE_GENLSUBJ_X30 SARE_GENLSUBJ_ENG SARE_HIGHRISK SARE_UNSUB SARE_URI0 
SARE_URI1 SARE_URI2 SARE_URI3 SARE_URI_ENG SARE_WHITELIST"

MAIL_ADDRESS="root"
SINGLE_EMAIL_ONLY="true"

	... the idea is that you have to explicitly turn on, by inclusion, the 
rules you want.  Run as lean or as heavy as you like.  Mine is damn near 
everything available, save the PRE300 sets and the two BLACKLIST sets.


Works just ducky.

--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Re: Are other people seeing higher Load Averages after moving to 3.1.7?

2006-10-18 Thread Michel R Vaillancourt

Craig Baird wrote:
I think spam is *way* up the last week or two.  My server started
hovering at a load average of around 55 a week or so ago.  I started
doing some investigating when I realized that the load was not coming
down.  I found that my server has been taking between 400,000 and
500,000 messages per day.  A few months ago, it was more like 150,000 to
200,000 per day.  Unfortunately, I moved logging over to a new syslog
server recently, so I can't say whether the increase was sudden or
gradual.  I think some of it has been gradual, but it sure feels like
it's only been the past few weeks that we've been getting hit *really*
hard.  After deciding that the load average was likely due to actual
spam load, I implemented a couple of RBLs at the MTA level.  My load is
now back down between 1 and 3, and messages making it through to SA are
now back to around 200,000 per day.

Craig

	Even my admittedly tiny shop has seen a 50% increase in traffic in the 
past two weeks... from 10k messages a day to 15k - 16k average.  Almost 
100% of that increase is UCE.


--
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca