Re: sa-learn weirdness...

2008-02-08 Thread Paolo Cravero

Arthur Dent wrote:


Hmmm... Not delete exactly, but the sa-learn job takes so long that the
archivemail job kicks off and finds the TempSpam and TempHam mboxes
in the Mail directory and dutifully chops out anything older than 180 days. I
didn't think that would be a problem, but maybe it's upsetting sa-learn?
I will try switching the order of the jobs (archivemail running first) and see if
that makes a difference.


At this point you have probably already swapped the two processes.

I think sa-learn or the process feeding it does not like the chopping.


Well, as I explained in my previous post, the TempHam folder is a
concatenation of all my non-spam folders. Mail that is older than 180 days is
taken off at one end and new mail (c. 30-40 per day) added on at the other.
The total remains roughly constant.


Don't forget that sa-learn remembers which messages have been learned. Once
your old messages have all been learned, you only need to feed it new
arrivals, i.e. messages received since the last sa-learn run. There is no need
to keep 180 days' worth of ham and spam in the temp folder!
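Since sa-learn keeps its own record of what it has already learned, the temp folder only needs what arrived after the previous run. A minimal Python sketch of that selection, assuming every message carries a parseable Date: header and that you store the time of the last run yourself (the function name and the timestamp handling are illustrative, not part of sa-learn):

```python
import mailbox
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List


def messages_since(mbox_path: str, last_run: datetime) -> List[mailbox.mboxMessage]:
    """Return the messages whose Date: header is later than last_run (timezone-aware)."""
    fresh = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            date_hdr = msg["Date"]
            if date_hdr is None:
                continue                  # no Date: header, skip rather than guess
            try:
                sent = parsedate_to_datetime(date_hdr)
            except (TypeError, ValueError):
                continue                  # unparseable date
            if sent.tzinfo is None:
                continue                  # naive date: cannot be compared with last_run
            if sent > last_run:
                fresh.append(msg)
    finally:
        box.close()
    return fresh
```

The selected messages can then be written to a small mbox and passed to sa-learn --ham --mbox (or --spam).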



Let sa-learn complete and then chop the folder. Just chain the two processes
(run them one after the other in a single job) rather than scheduling them
separately in crontab. That should fix your apparent weirdness.
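Chaining can be as simple as a shell && between the commands in one crontab line. The same idea as a small Python sketch: each job starts only after the previous one has exited successfully. The paths, and the choice of archivemail's --days=180, are placeholders for your own setup:

```python
import subprocess
from typing import Sequence

# Hypothetical paths and options: adapt them to your own cron job.
NIGHTLY_JOBS = [
    ["sa-learn", "--ham", "--mbox", "/home/user/Mail/TempHam"],
    ["sa-learn", "--spam", "--mbox", "/home/user/Mail/TempSpam"],
    ["archivemail", "--days=180", "/home/user/Mail/TempHam", "/home/user/Mail/TempSpam"],
]


def run_sequentially(commands: Sequence[Sequence[str]]) -> int:
    """Run each command only after the previous one has exited successfully.

    Returns 0 if every command succeeds, otherwise the exit code of the first
    failure; later commands (e.g. archivemail) are then not run at all.
    """
    for cmd in commands:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0
```

A single crontab entry then calls run_sequentially(NIGHTLY_JOBS), so archivemail can never start while sa-learn is still reading the mboxes.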


Paolo



Re: sa-learn weirdness...

2008-02-06 Thread Paolo Cravero

Arthur Dent wrote:


Learned tokens from 8 message(s) (3165 message(s) examined)
Learned tokens from 4628 message(s) (8703 message(s) examined)
Learned tokens from 3890 message(s) (8634 message(s) examined)
Learned tokens from 2264 message(s) (8671 message(s) examined)
Learned tokens from 2303 message(s) (8620 message(s) examined)


Odds 2,000,127 against one... and counting...


Notice that although the amount of tokens being learned seems to be coming
down gradually, the total far exceeds the total amount of ham mails in the
corpus.


The number of *messages* learned is decreasing, not the number of tokens.
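The counts in those sa-learn summary lines are messages, not tokens. A small Python parser (the regex simply mirrors the output lines quoted above) makes it easy to track messages learned vs. messages examined per run:

```python
import re
from typing import List, Tuple

LEARN_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def learn_stats(output: str) -> List[Tuple[int, int]]:
    """Return (messages learned, messages examined) for each sa-learn summary line."""
    return [(int(learned), int(examined)) for learned, examined in LEARN_RE.findall(output)]
```

In the first run quoted above, only 8 of the 3165 examined messages were actually learned; the rest were skipped (typically because they had already been learned).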

Could it be that something deletes the temp folder before sa-learn has 
finished, so it gets distracted and starts flying away carrying a suitcase?


Or do you receive 8600 messages each day? Some of them might have been 
autolearned on the incoming SMTP channel, BTW.


IMHO it is not necessary to train the Bayes DB so extensively. If you want the
process to complete in a decent amount of time, feed it fewer messages at a time.
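One way to feed fewer messages at a time is to cut the big mbox into fixed-size batches and run sa-learn on one batch per invocation. A Python sketch using the standard mailbox module; the batch size of 500 and the output file names are arbitrary choices:

```python
import mailbox
import os
from typing import List


def split_mbox(src_path: str, out_dir: str, batch_size: int = 500) -> List[str]:
    """Copy src_path into numbered mbox files holding at most batch_size messages each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    src = mailbox.mbox(src_path)
    paths: List[str] = []
    batch = None
    try:
        for i, msg in enumerate(src):
            if i % batch_size == 0:              # time to start a new batch file
                if batch is not None:
                    batch.close()
                path = os.path.join(out_dir, f"batch-{i // batch_size:04d}.mbox")
                batch = mailbox.mbox(path)
                paths.append(path)
            batch.add(msg)
    finally:
        if batch is not None:
            batch.close()
        src.close()
    return paths
```

Each resulting file can then be learned separately, e.g. with sa-learn --ham --mbox batch-0000.mbox.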


Paolo

PS: Whoever knows who Arthur Dent is/was will understand the oddities in this
reply. Everyone else: get a copy of the HHGTTG. :-)


Re: Spam Assassin Load Balancing

2008-01-08 Thread Paolo Cravero

Thomas Ledbetter wrote:

First of all: we're running amavisd-new, not plain spamc/spamd anymore.

We used to have N servers, each running its own spamd daemons and therefore
its own Bayes/AWL DB.


I don't quite understand how many machines run spamc and how many run spamd.

With a round robin policy on a hardware load balancer, once the
connection is routed to a specific 'worker bee', if that machine times
out, the request will fail, and the mail won't get scanned.  However,
more intelligent hardware load balancing setups can monitor the work on
each node, and take it out of service as necessary.


A load balancer marks non-responding nodes as offline, based on checks at
different levels (ICMP ping, TCP ping, service check, ...). But these checks
are not real-time, so if spamd dies during analysis the connection will
drop (or hang) and spamc will time out. The load balancer won't retry the
connection on another node; at least our hardware load balancer doesn't. Been
there (with LDAP), done that!
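To make the difference between check levels concrete: a TCP check only proves that the port accepts connections, while a service check actually talks to spamd. A Python sketch of the latter, assuming spamd's default port 783 and its PING/PONG request (check the exact request line against the protocol description shipped with your spamd). It is still a point-in-time probe: it cannot rescue a scan that dies halfway through.

```python
import socket


def spamd_ping(host: str, port: int = 783, timeout: float = 2.0) -> bool:
    """Service-level health check: send a spamd PING and expect a PONG reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"PING SPAMC/1.2\r\n\r\n")   # request line is an assumption, see above
            reply = s.recv(128)                     # a single recv() is a simplification
    except OSError:
        return False   # refused, unreachable or timed out: treat the node as offline
    return reply.startswith(b"SPAMD/") and b"PONG" in reply
```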


Also, when running a round-robin based cluster, is there any problem 
having a mix of machines with different performance capacities?  i.e. If 
I have a 10 node cluster, and 3 of the servers are much slower than the 
others, will it impact performance of the cluster as a whole?  Even if I 
limit the number of spamd that run to a lower value than the higher 
performance machines?


What do you consider performance? I think the global average analysis
time (what I call performance) will obviously be affected, by an amount that
depends on load distribution. With a real load balancer you can assign
different priorities (weights) to each node, so as to keep faster machines
busier than slower ones.
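With weights, a faster node simply receives proportionally more connections. A sketch of smooth weighted round-robin (the scheme nginx uses for weighted upstreams; your hardware balancer may implement something different), with hypothetical node names and weights:

```python
from typing import Dict, List


def weighted_round_robin(weights: Dict[str, int], n: int) -> List[str]:
    """Pick n nodes with smooth weighted round-robin; picks are proportional to weight."""
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    picks: List[str] = []
    for _ in range(n):
        for node, weight in weights.items():
            current[node] += weight            # every node earns its weight each round
        best = max(current, key=current.get)   # ties go to the first node listed
        current[best] -= total                 # the chosen node pays back the total
        picks.append(best)
    return picks
```

With weights 3 and 1 the fast node gets three connections out of every four, interleaved rather than in bursts.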


Anyway, I've seen spamd running on different hardware since 2004 and I 
wouldn't say the analysis speed has been improved significantly. Just don't 
let spamd nodes swap memory to disk.


Good luck with the high-load spam fight,
Paolo


Re: OT - massive newsletter

2007-09-19 Thread Paolo Cravero

mizzio wrote:


I'm setting up an SMTP server (CentOS + qmail) on a Dell quad-core
machine for sending out a periodic newsletter (10 million a month).

In order to avoid any possible blacklisting problem, I'm looking for all
the best practices. Right now I've set up:


You need EXPLICIT authorization (opt-in) from all recipients, and you must be
able to prove it. This is required by EU law (and thus by your country's law and
mine), and it is the best insurance against ending up on blacklists.


Good luck,
Paolo



Re: Blocking MMS messages?

2007-02-13 Thread Paolo Cravero

Steve Monkhouse wrote:


Yeah that works for that one... but with multiple originating sources and
multiple carriers etc. I thought there must be a better way than manually
entering every MMS provider... ??


I'm old-fashioned and don't own an MMS-enabled phone, but phone numbers
are generally 12 digits long in the standard international form,
prefixed with a +.

I just sent myself an SMS-to-email with Vodafone Italy and hit these rules:
X-Spam-Status: No, score=2.532 tagged_above=-999 required=3.5
tests=[BAYES_00=-2.599, DNS_FROM_RFC_ABUSE=0.2,
FORGED_RCVD_HELO=0.135, FROM_ENDS_IN_NUMS=2.53,
FROM_LOCAL_HEX=1.305, NO_REAL_NAME=0.9

while the sender was [EMAIL PROTECTED]. Survey how your local providers
format senders and write a set of rules accordingly.
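A starting point for such a rule set, in Python for quick testing: a loose pattern for senders whose local part is a phone number. The 10-15 digit range and the optional + or 00 prefix are assumptions; adjust them after surveying what your local carriers actually send.

```python
import re

# Assumption: the carrier gateway puts the sender's number, in international form,
# in the local part of the From: address, e.g. "+393331234567@mms.example.net".
PHONE_LOCAL_PART = re.compile(r"^(?:\+|00)?\d{10,15}@", re.ASCII)


def looks_like_mms_sender(addr: str) -> bool:
    """True if the address's local part looks like an international phone number."""
    return PHONE_LOCAL_PART.match(addr.strip()) is not None
```

Once the pattern fits your carriers, the same regex can go into a local.cf header rule matching From:addr.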

Paolo



spamd errors... SpamdForkScaling.pm

2006-12-18 Thread Paolo Cravero
I got these errors in maillog on a Postfix + spamc/spamd installation on
Linux RedHat ES3. It looks like this issue has not been fixed in 3.1.7; is the
fix targeted for 3.1.9?


Could it be that the system runs out of file descriptors? Don't think so...

[EMAIL PROTECTED] cat /proc/sys/fs/file-nr
8431	4030	314564
[EMAIL PROTECTED] cat /proc/sys/fs/file-max
314564
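The same check in a few lines of Python, assuming the three-field layout of file-nr on 2.4 kernels (allocated handles, allocated-but-free handles, maximum). Reading the values as 8431, 4030 and 314564, about 4,400 handles are in use, roughly 1.4% of the limit:

```python
def fd_usage(file_nr: str) -> float:
    """Fraction of the system-wide file-handle limit in use, from /proc/sys/fs/file-nr.

    Assumes the layout: allocated, free (allocated but unused), maximum.
    On 2.6+ kernels the second field is normally 0, so the formula still holds.
    """
    allocated, free, maximum = (int(field) for field in file_nr.split())
    return (allocated - free) / maximum
```

On a live system, pass it the contents of /proc/sys/fs/file-nr.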

Here's an excerpt from maillog. Process 31633 is the spamd master.


Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: child states: BIIBBIB
Dec 18 11:20:39 srv-asgw02 spamd[31633]: spamd: handled cleanup of child pid 5654 due to SIGCHLD
Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: child states: BIIBBB
Dec 18 11:20:39 srv-asgw02 spamd[31633]: syswrite() on closed filehandle GEN452736 at /usr/lib/perl5/5.8.0/i386-linux-thread-multi/IO/Handle.pm line 447.
Dec 18 11:20:39 srv-asgw02 spamd[31633]: Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 419.
Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: killing rogue child 330, failed to write on fd :
Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: killing failed child 330 fd=undefined at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 137.
Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: killed child 330
Dec 18 11:20:39 srv-asgw02 spamd[31633]: prefork: child states: BKBBBI


Paolo


Re: bayes_seen on MySQL, growing and growing

2006-11-17 Thread Paolo Cravero

Jim Maul wrote:

I don't use MySQL with SA, but you should be able to use TRUNCATE instead
of DELETE.  It may very well be faster with all those rows.


From MySQL 4.x manual:

For InnoDB, TRUNCATE TABLE  is mapped to DELETE, so there is no 
difference.


We're using InnoDB rather than MyISAM, so there's apparently no big 
difference. It doesn't free disk space, though, so an OPTIMIZE TABLE 
should be issued.


Still no input from developers/maintainers: can I empty the
bayes_seen table without breaking DB consistency?


Thanks,
Paolo


bayes_seen on MySQL, growing and growing

2006-11-13 Thread Paolo Cravero

Hi,
while doing some checkup on production servers, I noticed that the 
bayes_seen table on MySQL is rather big:


rows: 15'814'021 (15.8M)
size: 1'853'882'368 bytes (1.8 GB)

As I understand it, SA doesn't clean up that table, so it has to be done
manually.


Can I simply do a DELETE FROM bayes_seen and live long and employed?
;-) I know it works if Bayes is on files. I would also OPTIMIZE TABLE
bayes_seen to regain the disk space.


It would probably be faster to drop and re-create the table, but on a
production system...


Any other issues?

TIA,
Paolo



FP with Outook SMTPing to Lotus Domino

2006-08-25 Thread Paolo Cravero as2594

Hi,
I just spotted this FP in our SA 3.1.4 quarantine...

I have no means of contacting the sender, but I guess he used an Outlook
(Express?) client to send via SMTP to a Domino server.


Even with the threshold at the default of 5 it would have been
stopped (it scored 5.091). Is there a workaround in the rules, or should I
lower some scores?!


Moreover PRIORITY_NO_NAME is not listed in 
http://spamassassin.apache.org/tests_3_1_x.html but is present in my 
20_head_tests.cf (require_version 3.001004).


TIA,
Paolo

X-Spam-Status: Yes, score=5.091 tag=-999 tag2=3.5 kill=3.5
  tests=[BAYES_00=-2.599, HTML_40_50=0.496, HTML_MESSAGE=0.001,
  MSGID_DOLLARS=1.716, PRIORITY_NO_NAME=2.7,
  RATWARE_OUTLOOK_NONAME=2.777]
Received: from smarthost02.ISP.it
  (smarthost02.ISP.it [xxx.yyy.zzz.nnn])
  by MYamavisSERVER.it (Postfix) with ESMTP id 777AD5840A5;
Fri, 25 Aug 2006 09:12:11 +0200 (CEST)
Received: from relay03.portal ([192.168.bbb.aaa])
  by smarthost02.ISP.it (Lotus Domino Release
  6.5.1) with ESMTP id 2006082509000547-2363 ;
  Fri, 25 Aug 2006 09:00:05 +0200
Received: from acme ([xxx.yyy.zzz.mmm])
  by relay03.portal (Lotus Domino Release 6.5.1)
  with ESMTP id 2006082509004734-2554 ;
  Fri, 25 Aug 2006 09:00:47 +0200
Message-ID: [EMAIL PROTECTED]
From: RFC2822 COMPLIANT [EMAIL PROTECTED]
To: RFC2822 COMPLIANT
Subject: RFC2822 COMPLIANT
Date: Fri, 25 Aug 2006 09:11:30 +0200
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1807
Content-Type: multipart/alternative;
  boundary==_NextPart_000_0005_01C6C826.73BE4680
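To see how a total like 5.091 is built up, a small Python helper can pull the RULE=score pairs out of an amavisd-new style X-Spam-Status header. The regex assumes rule names made of upper-case letters, digits and underscores:

```python
import re
from typing import Dict

TEST_RE = re.compile(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)")


def parse_tests(status: str) -> Dict[str, float]:
    """Extract RULE=score pairs from the 'tests=[...]' list of an X-Spam-Status header."""
    start = status.index("tests=[") + len("tests=[")
    end = status.index("]", start)
    return {name: float(score) for name, score in TEST_RE.findall(status[start:end])}
```

For the header above, PRIORITY_NO_NAME and RATWARE_OUTLOOK_NONAME together contribute 5.477 points, more than the whole final score.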



Re: FP with Outook SMTPing to Lotus Domino

2006-08-25 Thread Paolo Cravero as2594

Randal, Phil wrote:


You might wish to look at tweaking your BAYES_xx scores to reduce false
positives.

I guess that depends on how healthy your Bayes database is, though.


Can't really say how healthy it is. 99% of our spam (guessing, but pretty
close) is in English, while 99% of our ham is in Italian.


Spam in Italian is so rare (so far!) that I had to write custom rules to 
catch specific spam, because Bayes wouldn't hit hard enough after 
several training rounds.


So... our Bayes is probably highly unbalanced due to the nature of our 
traffic and spam. Am I right? Any workaround?


Paolo



Re: false positive on FORGED_MUA_OUTLOOK (v.3.1)

2006-04-05 Thread Paolo Cravero as2594

Tony Finch wrote:

The following headers come from a legitimate message - I have obscured the
sender's name, but that's all. The SlipStream SP Server seems to have
appended the client username and IP address to the message-ID, causing the
FP. See also:
http://mail-archives.apache.org/mod_mbox/spamassassin-users/200509.mbox/[EMAIL PROTECTED]


Yep! That was me! :-)

I investigated with the sender of that message, and since he's a friend, 
I could ask him all sorts of questions.


It turned out that he used a dialup connection *and* a dialup connection
accelerator offered by the provider itself. I don't know how that [EMAIL PROTECTED]
thing works, but it probably re-routes all IP traffic through a
software-compressed tunnel established between the PC and the provider's
servers. I don't know exactly where, but Message-IDs are altered along the way,
and not by the client itself.


I tried the same dialup without compression software and everything went 
fine.


So, the FORGED rule triggers correctly. Someone deals improperly with 
Message-IDs!


Paolo


Re: sa-learn Lotus Notes

2006-04-05 Thread Paolo Cravero as2594

Andy Jezierski wrote:

There have been numerous threads on how to have end users drop 
misclassified mail to spam/ham folders in Exchange, but I don't recall 
seeing any mention of a way of doing this with Notes.


Although we don't let users train Bayes: Lotus Notes clients and Domino
servers from version 5 onwards support IMAP, both as a client and as a server.


When I need to extract a message from an LN mailbox I open an IMAP
mailbox and copy it there. Or, the other way around, I access my LN
mailbox via IMAP.


I don't know whether LN supports shared IMAP folders or proxy authentication,
but whether you need them depends on whether you're training a shared Bayes DB
or per-user ones.


Paolo


Re: Idea for new SA Rule

2006-04-05 Thread Paolo Cravero as2594

Gustafson, Tim wrote:


Could SpamAssassin benefit from a filter that would actually check the
spelling of the text parts of the message, and if misspelled words
exceeds, for example, 50%, then we can add a few points to the SPAM
score?  I'm not sure how to begin coding this, but I think it should be
pretty easy (using pSpell or aSpell or something) and I think it would
be a very useful tool.


And how would you deal with messages in other languages? Over here 99% 
of messages in English are spam! AFAIK there's no language indicator in 
email messages.
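A toy illustration of the problem: the ratio of words missing from a dictionary, with a five-word English set standing in for a real aspell/pspell dictionary (both the word set and the function are hypothetical). A perfectly legitimate Italian sentence comes out as 100% misspelled:

```python
import re
from typing import Set

WORD_RE = re.compile(r"[^\W\d_]+")   # runs of letters (Unicode-aware), no digits/underscores


def misspelled_ratio(text: str, dictionary: Set[str]) -> float:
    """Fraction of words in text that are not in the (lower-case) dictionary."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)
```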


Paolo


Re: Best Practices: SpamAssassin

2006-03-31 Thread Paolo Cravero as2594
 of the above?


Test them and decide which apply to your case. I don't know how independent
your current antispam solution is; with SA you need to invest some time
reviewing false negatives/positives (if any) and extra rulesets.



How have people fared with MySQL replication of the DB?  I will need
this solution to present the same data for a backup MX which is not
local to the primary MX.


First of all: we dropped the secondary MX record because it received
more spam than the primary. We use a load balancer for HA.


What do you want to store on MySQL? Bayes, AWL, quarantine are your 
non-mutually exclusive options.


Bayes and AWL can be regenerated in a matter of minutes, and you can start
(I mean power up) a backup MX without them.
Replicating the quarantine is like replicating your trash between two bins.
If you provide delegated quarantine, how likely is it that a hardware failure
will destroy a false positive? You're probably better off without the hassle
of MySQL master-slave replication.


AFAIK there is a MySQL master-master replication function, but its 
limitations make it incompatible with amavis SQL needs.



OT MODE ON
X-Mailer: Novell GroupWise Internet Agent 6.0.4

OMG! It formatted your message paragraphs without breaking-up lines! 
Luckily Thunderbird has a rewrap function!


OT MODE OFF

Have a nice weekend!
Paolo Cravero

--
|QRPp-I #707  + www.paolocravero.tk +  I QRP #476   |
| SpamAssassin-based email antispam/antivirus solutions |
 \Italian/English-to/from-Croatian translations/
  \   Skype: pcravero /


Re: Spamassassin Appliances?

2006-03-24 Thread Paolo Cravero as2594

Hi,
this is a copy'n'paste from a message I wrote in December 2005 to the 
AMaViS list.




Hi,
I thought you might like to know how much a commercial solution _very_ 
similar to amavisd-new+ClamAV+SA+MySQL+mailzu costs.


Something with AV+AS and webQuarantine to be installed on your own 
hardware, and a nice web interface for management (configuration).


For 10k mailboxes it is about 12 USD/mailbox/year. But for 200 mailboxes
the cost increases to 50 USD/mbx/year. There are of course discounts if
the license covers 2 or 3 years.


How much money is your setup worth? :-)

.

I'd go for spare lower-end machines that can just be turned on until
the main one is fixed. Anyway, unless someone with shell access tampers with
your SA installation, it shouldn't break on the software side. Over here it
hasn't, in over 3 years of uninterrupted operation at 100k msgs/day per server.
Watch disk usage if you quarantine to disk, though!
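A minimal Python sketch for keeping an eye on quarantine growth: the total size of a quarantine directory plus how full its filesystem is. The directory path and any alert threshold are up to you; nothing here is specific to SA or amavisd-new:

```python
import os
import shutil


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def disk_used_fraction(path: str) -> float:
    """Fraction of the filesystem holding path that is already in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

Run from cron, it can warn before a busy quarantine fills the disk.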


Depending on your traffic, Postfix+SA could be handled by a P4 machine with
1 GB of RAM without slowdowns.


Paolo


Forged Outlook false positive

2006-01-31 Thread Paolo Cravero as2594

Hi,
these headers trigger the FORGED_MUA_OUTLOOK check on 2.64 and 3.1.0:

X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13)
X-Spam-Level: *
X-Spam-Status: No, score=1.6 required=5.0 tests=BAYES_00,FORGED_MUA_OUTLOOK,
  FORGED_RCVD_HELO autolearn=no version=3.1.0
Received: from xx.yy.yu (user-broadband-wireless-2.4GHz-1.xx.yy.yu [1.2.3.4])
  by zz.yy.it (Postfix) with ESMTP id 90F881A3A14
  for [EMAIL PROTECTED]; Mon, 23 Jan 2006 07:52:38 +0100 (CET)
Received: from galerija2 ([192.168.13.195])
  by xx.yy.yu (kg.org.yu [192.168.13.5])
  (MDaemon.PRO.v6.8.5.R)
  with ESMTP id 1-md501.tmp
  for [EMAIL PROTECTED]; Mon, 23 Jan 2006 07:52:57 +0100
Message-ID: [EMAIL PROTECTED]
From: International [EMAIL PROTECTED]
To: L R [EMAIL PROTECTED]
References: [EMAIL PROTECTED]
Subject: Read: ok subject line
Date: Mon, 23 Jan 2006 07:52:56 +0100
MIME-Version: 1.0
Content-Type: multipart/report;
  report-type=disposition-notification;
  boundary==_NextPart_000_0006_01C61FF2.05C6A910
X-Mailer: Microsoft Outlook Express 5.50.4952.2800
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4952.2800
X-MDRemoteIP: 192.168.13.195
X-Return-Path: [EMAIL PROTECTED]
X-MDaemon-Deliver-To: [EMAIL PROTECTED]

What is wrong with these? This should be a Return Receipt sent by OE
through an MDaemon SMTP server (whose behavior I don't know). Does it
change the Message-ID?!


TIA,
Paolo


A self-declared Bulk message

2006-01-02 Thread Paolo Cravero as2594
I just reviewed the spam that got through our amavisd-new + SA 3.1.0
barrier and noticed something funny at the bottom of a message:


This email has been sent with an unregistered version of MaxBulk Mailer. 
MaxBulk Mailer is a new easy-to-use mail merge software for Macintosh.


This message arrived at a non-existent address that we had started to use as
a spam trap, so nobody ever opted in.


Here's the message, I removed our internal SMTP headers and cut the 
actual recipient domain:


http://spazioinwind.libero.it/ik1zyw/temp/selfDeclaredBulk.eml

That website is now probably listed in most SURBLs, but it is funny anyway.

Happy New Year to those who are in A.D. 2006! :-)
Paolo


Re: Load ldap prefs

2005-12-19 Thread Paolo Cravero as2594

Philip S. Hempel wrote:

Did you copy'n'paste this or retype?

user_scores_dsn 
ldap://locahost/dc=qmailldap,dc=lh,dc=com?spamassassin?sub?uid=__USERNAME__

  locaLhost, perhaps?

Let us know...
pc


Re: I need help with false spam (ham flagged as spam)

2005-12-19 Thread Paolo Cravero as2594

Liviu Lalescu wrote:


SpamAssassin is reporting it as spam, with a score of 5.6, but it is surely
not spam. I have also run sa-learn --ham on it, but even after that the
message is still flagged as spam. I ran sa-learn --ham timetabling
and after that spamassassin -t < timetabling > timetabling.out, again
obtaining a 5.6 score.


I can mention that I have used learning (sa-learn) for about 8000 ham messages 
and for 14 spam messages.


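So the Bayesian classifier does not kick in, which is obvious since no BAYES_* test is reported: by default SpamAssassin needs at least 200 learned spam and 200 learned ham (bayes_min_spam_num / bayes_min_ham_num) before Bayes is used, and you have learned only 14 spam.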
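To check how many messages the Bayes DB has actually learned, sa-learn --dump magic prints the nspam and nham counters. A Python sketch that reads them, assuming the usual layout where the count is the third whitespace-separated column of the "non-token data: nspam" / "nham" lines, and compares them with the default minimums of 200 each:

```python
from typing import Dict


def parse_magic(dump: str) -> Dict[str, int]:
    """Extract the nspam/nham counters from 'sa-learn --dump magic' output.

    Assumption: the counter is the third whitespace-separated column of the
    'non-token data: nspam' and 'non-token data: nham' lines.
    """
    counts: Dict[str, int] = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue
        label = line.split("non-token data:", 1)[1].strip()
        if label in ("nspam", "nham"):
            counts[label] = int(line.split()[2])
    return counts


def bayes_active(counts: Dict[str, int], min_spam: int = 200, min_ham: int = 200) -> bool:
    """True once both counters reach the minimums (SpamAssassin defaults: 200 each)."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham
```

With the figures from this thread (about 8000 ham, 14 spam) the check fails on the spam side.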

Anyway, SpamAssassin is NOT guilty at all in this false positive.


 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 0.9 MSGID_FROM_MTA_ID      Message-Id for external message added locally
-0.0 SPF_PASS               SPF: sender matches SPF record
 1.9 DATE_IN_FUTURE_96_XX   Date: is 96 hours or more after Received: date


Your correspondent sent a message dated 19 *JANUARY* *2006*. Fixing this alone
would let the message through (5.6 - 1.9 = 3.7).
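DATE_IN_FUTURE_96_XX compares the Date: header with the Received: timestamp. The arithmetic in Python, where only the day (19 January 2006) comes from the message; the clock times and the received date (the day of this thread) are assumed for illustration:

```python
from email.utils import parsedate_to_datetime


def hours_ahead(date_hdr: str, received_date: str) -> float:
    """How many hours the Date: header lies after the Received: timestamp."""
    delta = parsedate_to_datetime(date_hdr) - parsedate_to_datetime(received_date)
    return delta.total_seconds() / 3600
```

A month in the future is 744 hours, far beyond the 96-hour mark.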




 0.5 DNS_FROM_RFC_ABUSE     RBL: Envelope sender in abuse.rfc-ignorant.org
 0.9 DNS_FROM_RFC_WHOIS     RBL: Envelope sender in whois.rfc-ignorant.org
 1.4 DNS_FROM_RFC_POST      RBL: Envelope sender in postmaster.rfc-ignorant.org


Paulo's provider is listed in the rfc-ignorant.org lists. Check their
website to understand why.


Then, take some time to finish training your Bayesian engine and feed
it/him/her some good ol' spam, so that it starts working.


Last but not least, add Paulo to your whitelisted senders.

Paolo



Re: Using sa-learn with Notes/Domino Servers via agents

2005-11-23 Thread Paolo Cravero as2594

Not a solution but a few thoughts since we have LN here as well.

Domino servers add a hell of a lot of headers to email messages, which might
confuse the Bayesian engine.


Forwarding internet mail from one LN account to another DESTROYS RFC2822
headers. Copying preserves them.


LN clients can access IMAP mailboxes (a sort-of-undocumented, hidden
feature). sa-learn can be fed through a call from fetchmail accessing an
IMAP mailbox+folder. (I think the latter is documented in the Wiki.)
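If fetchmail is not at hand, Python's standard imaplib can do a similar job: copy an IMAP folder into an mbox that sa-learn --mbox reads. A sketch, assuming an already logged-in connection (for example imaplib.IMAP4_SSL(host) followed by login()); host, credentials and folder names are yours to fill in:

```python
import imaplib
import mailbox


def export_imap_folder(imap: imaplib.IMAP4, folder: str, mbox_path: str) -> int:
    """Append every message of an IMAP folder to an mbox file; return how many were copied."""
    typ, _ = imap.select(folder, readonly=True)   # read-only: don't set \Seen flags
    if typ != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    typ, data = imap.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("IMAP SEARCH failed")
    box = mailbox.mbox(mbox_path)
    copied = 0
    try:
        for num in data[0].split():
            typ, parts = imap.fetch(num, "(RFC822)")
            # A successful fetch returns [(b'<n> (RFC822 {size}', raw_bytes), b')']
            if typ == "OK" and parts and isinstance(parts[0], tuple):
                box.add(parts[0][1])
                copied += 1
    finally:
        box.close()
    return copied
```

The resulting file can then be passed to sa-learn --spam --mbox (or --ham).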


You may widen the autolearn thresholds so that fewer messages are fed 
automatically to the Bayes DB.


Another issue I have is that we have 2 loadbalanced exim servers for
tagging spam, yet I would like to keep the bayes DB the same on both hosts.
Did anyone ever come up with a solution to this problem?


Yes: an RDBMS backend for the Bayes database (MySQL here). Otherwise you
could elect one server as master and sync the DBs nightly (restarting spamd!).
Or live with misaligned Bayes DBs: if your servers route a lot of msgs/day
(n*10k) and are round-robin balanced, they'll be statistically identical.
The same goes for AWL, if used.
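A toy simulation of the "statistically identical" argument, assuming each message is spam with the same probability regardless of which server it lands on (seed, spam rate and volume are arbitrary):

```python
import random
from typing import List


def per_server_spam_share(n_messages: int, n_servers: int = 2,
                          spam_rate: float = 0.6, seed: int = 1) -> List[float]:
    """Round-robin n_messages over n_servers and return each server's spam fraction."""
    rng = random.Random(seed)
    spam = [0] * n_servers
    total = [0] * n_servers
    for i in range(n_messages):
        server = i % n_servers              # strict round-robin
        total[server] += 1
        if rng.random() < spam_rate:
            spam[server] += 1
    return [s / t for s, t in zip(spam, total)]
```

With 20,000 messages the two servers see spam shares within a fraction of a percentage point of each other, so their Bayes DBs are trained on essentially the same mix.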


HTH,
Paolo



Re: f-secure messaging security gateway x-series??

2005-11-23 Thread Paolo Cravero as2594

Mathias Homann wrote:


So, has anyone here seen/touched this thing before?


Not that one, but I've worked with two other vendors' appliances.

For me, the only strong point with it seems to be the combined 
firewall/AV/spam scanner thing (waitaminute... single point of failure??), 
and the web admin frontend which can generate colorful pie charts about 
spam/virus statistics (which, of course, can be printed on overhead films and 
used to increase the IT budget...).


Anyone ever seen one of those?


Lately they *all* look like an amavisd-new wrapper with a commercial AV, 
SA- or DSPAM-based AS analysis plus all those colorful niceties that 
impress managers but don't actually improve performance.


One big issue with these appliances is how they decide whether content is spam
or not, and how you can adapt the appliance to your needs. Many of them
keep a sort of centralized ruleset (Bayes? heuristics? ...) that is pushed
to each appliance, so you really don't know what is behind the
decision-making process. That makes it hard to explain to your customer why his
favourite Ikea newsletter was blocked. The same goes for non-English spam/ham.


There might be other issues, but they're OT for this list.

SA rulez! :)
Paolo

PS: I asked one of those vendors (the one whose pricing I posted a
few weeks ago) how they deal with DNS-based lists. I wanted to know if
they use vendor-run DNS replicas or query public nameservers, since
they advertise 100k+ msgs/day. They haven't answered yet...


POP3 proxy with SA 3.x?

2005-10-25 Thread Paolo Cravero as2594

Hi,
I have successfully used a Perl POP3 proxy on a Linux box with SA 2.6.x.

I have now migrated to 3.x, and some internal functions have been 
dropped or renamed, so that Perl program doesn't work anymore.


Does anyone know of a (Linux) POP3 proxy that supports SA 3.x?

TIA,
Paolo



SA 3.1 false positive on FORGED_MUA_OUTLOOK

2005-09-20 Thread Paolo Cravero as2594

Hi,
just ran into a false positive with SA 3.1 (through amavisd-new).

The message comes from a friend, and he uses a real Outlook Express 
client, perhaps the Italian version.


libero.it is one of the biggest Italian (free) ISPs.

Any hint on fixing this?

Paolo

.

Received: from localhost (172.16.1.84) by smtp2.libero.it (7.0.027-DD01)
  id 431C3A2400E8EB94; Mon, 19 Sep 2005 19:44:51 +0200
Received: from smtp0.libero.it ([172.16.1.76])
  by localhost (asav5.libero.it [193.70.192.154]) (amavisd-new, port 10024)
  with ESMTP id 11243-11-5; Mon, 19 Sep 2005 19:44:50 +0200 (CEST)
Received: from Vecchio (195.210.65.40) by smtp0.libero.it (7.0.027-DD01)
  id 431C393500235EBE; Mon, 19 Sep 2005 19:44:50 +0200
Received: from ppp-231-174.25-151.libero.it ([EMAIL PROTECTED] [151.25.174.xxx])
  by wca20.libero.it (SlipStream SP Server 4.0.112
  built 2005/05/06 17:01:26 -0400 (EDT)); Mon, 19 Sep 2005 19:44:50 +0200 (CEST)
X-Originating-IP: [151.25.174.xxx]
X-Originating-User: [USER_ANONYMOUS]
Message-ID: [EMAIL PROTECTED]
From: Name Surname [EMAIL PROTECTED]
To: Name Surname [EMAIL PROTECTED],
  Name Surname [EMAIL PROTECTED]
Subject: cinema
Date: Mon, 19 Sep 2005 19:43:12 +0200
MIME-Version: 1.0
Content-Type: multipart/alternative;
  boundary==_NextPart_000_0056_01C5BD52.5F07C5C0
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-Scanned: with antispam and antivirus automated system at libero.it
X-Spam-Status: Yes, hits=4.924 tag=-999 tag2=3.5 kill=3.5
  tests=[BAYES_00=-2.599, DNS_FROM_RFC_ABUSE=0.2, DNS_FROM_RFC_POST=1.708,
  FORGED_MUA_OUTLOOK=4.056, HTML_MESSAGE=0.001, RCVD_IN_BL_SPAMCOP_NET=1.558]
X-Spam-Score: 4.924
X-Spam-Level:
X-Spam-Flag: YES


Re: SA 3.1 false positive on FORGED_MUA_OUTLOOK

2005-09-20 Thread Paolo Cravero as2594

Michael Monnerie wrote:


X-Spam-Status: Yes, hits=4.924 tag=-999 tag2=3.5 kill=3.5
 tests=[BAYES_00=-2.599, DNS_FROM_RFC_ABUSE=0.2,
 DNS_FROM_RFC_POST=1.708, FORGED_MUA_OUTLOOK=4.056, HTML_MESSAGE=0.001,
 RCVD_IN_BL_SPAMCOP_NET=1.558]
X-Spam-Score: 4.924


Yes, increase the level at which an e-mail is marked as SPAM. This one 
got only 4.924 points, which is still below the default 5 points from 
where it should be marked as SPAM. A level of 3.5 is very optimistic, 
leading to lots of FP.


I work for an ISP and we've been running SA 2.64 with a 3.5 threshold for a
couple of years now. All false positives so far have scored well above
the threshold and were mailing-list messages. Today's FP, on the SA 3.1
we're evaluating, was a personal message.

Maybe an update to his Outlook Express could help, saves 4.056 points if 
his e-mail program works correctly :-)


Well, although I can suggest it to my friend (who uses a 56k dialup,
BTW), I can't force the whole world to update their OE clients (they'd
better switch to something better, anyway!).

Even if we raise the threshold to 5, what about false positives that
hit both FORGED_MUA_OUTLOOK and a BAYES_nn rule worth about 1 point
(4.056 + 1 already exceeds 5)?

IMHO FORGED_MUA_OUTLOOK is buggy, but I have too little experience with
Outlook Express Message-IDs.

Paolo




(OT) SURBL local-DNS sample file?

2005-07-19 Thread Paolo Cravero as2594

Hi, what follows is certainly OT for SpamAssassin.

I am setting up SA3 with SURBL support, and I am configuring RBLDNSD in 
order to run a local SURBL copy.


Before asking for rsync permission, I'd like to test the configuration 
on a non-production system (with a non-production IP address).


I need a sample of the files that are actually downloaded with rsync, 
but I've not been able to find any sample to use on surbl.org and 
related sites.


I am not enough of a DNS expert to write my own. Can someone provide me with a
sample? Would the SURBL.org people mind publishing a sample rsync file on their pages?


Thanks for your attention,
Paolo


Re: SpamAssassin w/POP3 SMTP outsourced e-mail server...

2005-07-07 Thread Paolo Cravero as2594

Jesse Shumaker wrote:

Let me try and summarize what I have received from all these e-mails as 

[...]

use and am trying to piece it all together.


Correct, except that the remote POP3 server is specified in the client
configuration and not hard-wired on the POP3 proxy box. At least
with the SApop3proxy we're using.


Ciao,
pc


Re: SpamAssassin w/POP3 SMTP outsourced e-mail server...

2005-07-06 Thread Paolo Cravero as2594

Jesse Shumaker wrote:

Hi

This looks good and I think I may try this perl module. It seems that 
it's geared towards a single workstation and not a network of machines. 
They say that you point your client to localhost, which means that each 
machine must have this installed. How are you guys running this so that 
you can have one centralized SA server? Also, how does the SA box 
authenticate with the ISP's POP servers for each e-mail client? In my 
organization each user has their own password and username for their 
e-mail account.


We installed it on a Linux box with SA, and run it as a daemon. It
supports concurrent connections, although we haven't tested it
thoroughly (hundreds of simultaneous connections...). So, rather than
installing it locally on each machine, use a shared POP proxy.


The client sends the user/password to SAproxy, which then submits them to
the remote server. It is a plain proxy for the POP3 protocol (no support for
POP3*S*), except that each message is scanned by SA before it is sent
to the client.


It is also very flexible, since the destination server is specified
as part of the login string ([EMAIL PROTECTED]
to retrieve mail with login [EMAIL PROTECTED] from the pop.domain.com
server): your colleagues can use the same proxy box for retrieving mail
from other POP3 accounts as well.
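The actual separator the proxy uses between login and server is hidden by the archive's address masking above, so this Python sketch of the split takes it as a parameter; the function name and example logins are made up. Splitting at the last separator keeps logins that are themselves e-mail addresses intact:

```python
from typing import Tuple


def split_proxy_login(login: str, sep: str) -> Tuple[str, str]:
    """Split 'remote-login<sep>pop-server' at the LAST occurrence of sep."""
    user, found, server = login.rpartition(sep)
    if not found or not user or not server:
        raise ValueError(f"expected 'login{sep}server', got {login!r}")
    return user, server
```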


PC



Re: SpamAssassin w/POP3 SMTP outsourced e-mail server...

2005-07-05 Thread Paolo Cravero as2594

Jesse Shumaker wrote:

Jesse,

It would be just like a web proxy. The Outlook clients are redirected to
the SpamAssassin box, which filters the e-mail and forwards/relays the
requests on to our ISP's e-mail servers. If you can assist me at all with
this it would be greatly appreciated.


You can try this: http://mcd.perlmonk.org/pop3proxy/

It is written in Perl and apparently works on Win and Linux boxes.

I believe it is the one we're using in my organization. Very stable.
Paolo



Re: OT: Mail/Spam Stats and MRTG

2005-06-06 Thread Paolo Cravero as2594

Jake Colman wrote:

Does anyone have any suggestions for using mrtg to produce a graph showing
the amount of received email and how much of it was flagged as spam?

I am using mrtg, sendmail, and procmail on all the same server.


You need to write an external program (script) for the SNMP daemon on
the server that returns a single number computed from the sendmail/procmail
maillog for whatever you want to monitor. Then use MRTG to handle the
value (cumulative vs. last 5 minutes).
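As an alternative to the SNMP route, MRTG can also run an external script directly (in backticks in the Target[] line) and expects it to print four lines: first value, second value, uptime, target name. A Python sketch that counts scanned and spam-flagged mail, assuming spamd is in use and logs its usual "clean message" / "identified spam" lines (with procmail calling the spamassassin script instead, you would need different patterns):

```python
import re
from typing import Iterable, Tuple

# Assumption: spamd logs one of these per scanned message, e.g.
#   spamd: clean message (-1.2/5.0) for user:1000 in 0.8 seconds, 2048 bytes.
#   spamd: identified spam (15.3/5.0) for user:1000 in 1.1 seconds, 4096 bytes.
SCANNED_RE = re.compile(r"spamd: (?:clean message|identified spam) ")
SPAM_RE = re.compile(r"spamd: identified spam ")


def count_mail(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (messages scanned, messages flagged as spam) found in the log lines."""
    scanned = spam = 0
    for line in lines:
        if SCANNED_RE.search(line):
            scanned += 1
            if SPAM_RE.search(line):
                spam += 1
    return scanned, spam


def mrtg_output(scanned: int, spam: int, target: str = "mail") -> str:
    """Format the four lines MRTG expects: value 1, value 2, uptime, target name."""
    return f"{scanned}\n{spam}\nn/a\n{target}\n"
```

Counting the whole log yields running totals, which MRTG's default counter mode turns into per-interval rates; with Options[]: gauge it plots the raw values instead.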


Here we use Cricket to monitor SpamAssassin performance in 
quasi-real-time. But I didn't set it up myself.


HTHAL,
Paolo

---
SpamAssassin-based email antispam/antivirus solutions
Italian/English-to/from-Croatian translations


Re: Logfile analyzer

2005-05-27 Thread Paolo Cravero as2594

Chris Santerre wrote:


Can anyone recommend a good logfile analyzer for Spamassassin?


Depends on what you want to analyze. One of the ninjas wrote a great script
to parse the logs and show rule hit statistics. If you are looking for that
I can see if I can find it in my vast archive of ninja info. Let me know.


pflogsumm.pl if using SA with Postfix...

I also wrote a script that gives per-domain stats of spam caught, when
using SA with Postfix. If anyone's interested in joining my one-man
beta test...
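A sketch of per-domain spam counting in Python. The log format is an assumption modelled on amavisd-new's "Blocked SPAM" / "Passed SPAM" lines with recipients after "->"; adapt the regexes to whatever your own maillog contains:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape (amavisd-new style), e.g.
#   ... Blocked SPAM, [192.0.2.1] <a@example.net> -> <bob@example.com>,<eve@example.org>, Hits: 12.3
SPAM_LINE = re.compile(r"\b(?:Blocked|Passed) SPAM\b.*?-> ((?:<[^<>]*>,?)+)")
RCPT_DOMAIN = re.compile(r"<[^<>@]*@([^<>]+)>")


def spam_per_domain(lines: Iterable[str]) -> Counter:
    """Count spam-flagged messages per recipient domain (one count per recipient)."""
    counts: Counter = Counter()
    for line in lines:
        match = SPAM_LINE.search(line)
        if match:
            for domain in RCPT_DOMAIN.findall(match.group(1)):
                counts[domain.lower()] += 1
    return counts
```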


Paolo



Re: German Spam

2005-05-18 Thread Paolo Cravero as2594
Netmail wrote:
Hi,
I'm new to SpamAssassin. After modifying the local.cf file, do I need to
restart sendmail or what?
If you are using spamc/spamd, you need to restart spamd in order to 
activate new rules.

If you are simply calling the spamassassin executable from sendmail (highly 
inefficient), no restart is needed.
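A small sketch of a cautious reload for the spamc/spamd case, under assumptions: spamd writes a pid file (the path below is hypothetical and depends on how spamd was started), `spamassassin --lint` is assumed to exit non-zero when local.cf has errors, and the parent spamd process is assumed to re-read its configuration on SIGHUP. Restarting spamd through your init script achieves the same thing.

```python
import os
import signal
import subprocess
import sys

# Hypothetical path: where spamd's pid file lives depends on how spamd was started.
SPAMD_PID_FILE = "/var/run/spamd.pid"


def parse_pid(text: str) -> int:
    """Read the first token of a pid file and refuse obviously wrong values."""
    pid = int(text.strip().split()[0])
    if pid <= 1:
        raise ValueError(f"refusing to signal pid {pid}")
    return pid


def lint_ok() -> bool:
    # Assumed: spamassassin --lint exits non-zero when the configuration has errors.
    return subprocess.run(["spamassassin", "--lint"]).returncode == 0


def reload_spamd(pid_file: str = SPAMD_PID_FILE) -> None:
    if not lint_ok():
        sys.exit("lint failed; spamd left untouched")
    with open(pid_file, encoding="ascii") as fh:
        pid = parse_pid(fh.read())
    os.kill(pid, signal.SIGHUP)  # assumed to make spamd reload its configuration


if __name__ == "__main__" and len(sys.argv) > 1:
    reload_spamd(sys.argv[1])
```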

Ciao,
Paolo


Re: R: German Spam

2005-05-18 Thread Paolo Cravero as2594
Netmail wrote:
Ok
Now this is my config file 
# This is the right place to customize your installation of SpamAssassin.
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#
###
#
rewrite_subject 1
#report_safe 1

If I want to add a block based on a message header, how do I do it?
Although custom rules matching a particular text are not the best way to 
exploit SpamAssassin's potential, you need to add something like this to local.cf:

header   SUBJ_RETHANKS  Subject =~ /Re\: Thanks \:\)/
describe SUBJ_RETHANKS  Subject contains Re: Thanks :)
score    SUBJ_RETHANKS  10 10 10 10
# Gives 10 points to messages whose Subject contains "Re: Thanks :)"
You need to be able to write Perl regexps if you want to be more 
successful. You should test new rules on a development server, not on 
your primary box.
Don't forget to run spamassassin --lint before you put any new 
rule/ruleset into production!
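Before even running --lint, one cheap way to check what a pattern will catch is to try it against a handful of sample subjects. The sketch below uses Python's `re` module as a stand-in for Perl; for a simple pattern like this one the two agree, but Perl-only constructs would have to be checked in Perl itself. The sample subjects are made up for illustration.

```python
import re

# The pattern from the SUBJ_RETHANKS rule above.
SUBJ_RETHANKS = re.compile(r"Re\: Thanks \:\)")

# Hypothetical sample subjects and whether the rule should hit them.
SAMPLES = {
    "Re: Thanks :)": True,
    "Fwd: Re: Thanks :)": True,   # the pattern is not anchored, so this matches too
    "Re: Thanks": False,
    "re: thanks :)": False,       # matching is case-sensitive unless /i is added
}


def check(pattern, samples):
    """Return the subjects whose match result differs from the expectation."""
    return [s for s, expected in samples.items() if bool(pattern.search(s)) != expected]


if __name__ == "__main__":
    print(check(SUBJ_RETHANKS, SAMPLES) or "all samples behave as expected")
```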

Custom rules are very time consuming for the sysadmin, especially if not 
well written. There are enough free resources (rulesemporium) to keep 
SpamAssassin's hit ratio very high. And please do not forget that the 
Bayesian filter is your Friend!

SpamAssassin is _extremely_ well documented.
Paolo


SA3.0.2 + amavisd-new ignoring $sa_tag_level_deflt ?

2005-03-10 Thread Paolo Cravero as2594
Hi,
I'm testing a setup with amavisd-new (latest download version) and SA 
3.0.2 on RedHat ES3. This setup serves as a laboratory for upgrading our 
SA 2.64 servers.

I would like amavisd-new to add X-Spam-* headers to all 
messages, so I set the following:

$sa_tag_level_deflt  = -999;  # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 3.50;  # add 'spam detected' headers at that level
$sa_kill_level_deflt = 3.50;  # triggers spam evasive actions
$sa_dsn_cutoff_level = -999;  # spam level beyond which a DSN is not sent

Unfortunately, X-Spam-* headers are NOT added to messages scoring between 
-999 and 3.5. What am I missing?

Thanks,
Paolo


Re: highly available sitewide bayes, local db vs. sql

2005-02-24 Thread Paolo Cravero as2594
Ben Poliakoff wrote:
Hi Ben
What sort of experiences have people had managing a sitewide bayes db
that is used by spamassassin (spamd|amavisd) instances on multiple
machines?  I've got an environment with spamassassin/amavisd-new running
in parallel on a pool of two (but possibly more in the future) equally
weighted machines.  How have you avoided the dreaded Single Point of
Failure?
We run two servers with SA in load balancing. Each machine has its 
own local Bayes and AWL DBs (no SPoF). Given the amount of incoming traffic 
(about 100k messages per server per workday), we are statistically sure that 
both servers see the same (spam) messages.
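A back-of-the-envelope check of that claim, assuming each incoming message goes to either server independently with probability 1/2 (real DNS or round-robin balancing only approximates this). For a spam run of $n$ copies, the chance that one of the two servers never sees it is:

```latex
P(\text{one server sees none of the } n \text{ copies})
  = \underbrace{\left(\tfrac{1}{2}\right)^{n}}_{\text{all on server A}}
  + \underbrace{\left(\tfrac{1}{2}\right)^{n}}_{\text{all on server B}}
  = 2^{1-n},
\qquad
n = 20 \;\Rightarrow\; 2^{-19} \approx 1.9 \times 10^{-6}.
```

So even a modest campaign is almost certain to reach both Bayes DBs.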

We have not noticed any imbalance in efficiency between the two instances 
in over 12 months.

Having two DBs also has one advantage: if Bayes on one machine gets 
corrupted (wrong training, ...), you can restore it from the twin server 
with a simple FTP. We have done this at least once.
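Instead of copying the raw database files, SpamAssassin 3.x's `sa-learn --backup` (dump to stdout) and `sa-learn --restore <file>` can move Bayes data between servers in text form. The sketch below only wraps those two commands; the dump path is a hypothetical example, the transfer step (FTP, scp, ...) is left out, and both commands must run as the user that owns the Bayes DB. Run nightly from cron on each server, the same dump doubles as a nighttime backup.

```python
import subprocess
import sys
from typing import List

# Hypothetical location for the dump file.
DUMP_PATH = "/var/tmp/bayes-backup.txt"


def backup_cmd() -> List[str]:
    # sa-learn --backup writes the Bayes DB contents to stdout.
    return ["sa-learn", "--backup"]


def restore_cmd(path: str) -> List[str]:
    # sa-learn --restore reads a file produced by --backup.
    return ["sa-learn", "--restore", path]


def dump_bayes(path: str = DUMP_PATH) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        subprocess.run(backup_cmd(), stdout=fh, check=True)


def load_bayes(path: str = DUMP_PATH) -> None:
    subprocess.run(restore_cmd(path), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # "backup <file>" on the healthy server, "restore <file>" on the twin
    action, path = sys.argv[1], sys.argv[2]
    {"backup": dump_bayes, "restore": load_bayes}[action](path)
```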

What does need to be done periodically is purging/resetting the AWL DB, 
since it keeps growing and growing...

We were considering a MySQL DB on a third machine (with failover to the 
other two), but the loss of Bayes history is not such a big issue IMHO. 
A nighttime backup is probably enough, as long as you have another 
machine to restore the DB onto a few hours after a failure. In any case, 
a good ham/spam collection will re-train your Bayesian filter in a matter 
of minutes.

Our third machine will probably run a local mirror of SURBL, instead!
HTH,
Paolo