Re: Using spamc--EVERY message has score of zero (including spam)

2008-01-27 Thread Chr. v. Stuckrad
On Sun, 27 Jan 2008, Don Ireland wrote:

 Can somebody help me figure out WHY?
 
 It's returning *0/0*

In my experience you get 0/0 only if spamc could not
connect to spamd at all!
A 'real' score of zero would look like 0/5: the score, followed by
the spam threshold (required_score) of that spamd, normally 5.

So you'll have to check how your spamc tries to reach
your spamd. Are they on the same host? (Normally they both use
the same Unix domain socket file, which must be accessible to the
uid spamc runs as! You can also try to connect via TCP by adding
the option '-d localhost'.) Or do you want to ask a spamd on another
host? Then you'll have to use '-d somehost'.

AND make sure the spamd on 'localhost' or 'somehost' allows TCP connections!
(On the spamd side you'll need the option '--allowed-ips=##.##.##.##'
with the IP of the spamc host, i.e. localhost or somehost.)

The default installation uses ONLY the socket file, for security reasons.
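A minimal sketch (Python, not part of spamc itself) of how a wrapper script could tell the two cases apart, assuming it captures the 'score/threshold' line that 'spamc -c' prints. The function names are made up for illustration, and it assumes nobody runs spamd with a threshold of 0:

```python
from typing import Tuple


def parse_spamc_check(line: str) -> Tuple[float, float]:
    """Split the 'score/threshold' line printed by 'spamc -c'."""
    score, threshold = line.strip().split("/")
    return float(score), float(threshold)


def diagnose(line: str) -> str:
    score, threshold = parse_spamc_check(line)
    # A threshold of 0 means spamc never got an answer from spamd
    # (socket not accessible, wrong host, or TCP not allowed).
    if threshold == 0:
        return "no-connection"
    return "spam" if score >= threshold else "ham"
```

So diagnose("0/0") reports a connection problem, while diagnose("0.0/5.0") is a genuine score of zero.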

Stucki (using spamd/spamc on different hosts :-)

-- 
Christoph von Stuckrad  * * |nickname |[EMAIL PROTECTED]   \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Mathematik  Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 6600|
Takustr. 9 / 14195 Berlin   * * |on IRCnet|Fax(alle):+49 30 838-75 454/


Re: The googolbees are getting craftier

2008-01-22 Thread Chr. v. Stuckrad
On Mon, 21 Jan 2008, John D. Hardin wrote:

  m,https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)/+.*[?](?:btni|adurl),i

If I understand that pattern correctly, both '*' are unbounded???

This might 'break' your spam filter if spamassassin gobbles
up all memory during analysis. Better replace every unbounded
'*' by a reasonable length {0,N}, with N a little more than the
longest strings seen so far.
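To illustrate the idea, here is John's pattern in Python syntax next to a bounded variant. The limits {0,5} and {0,100} are arbitrary example values, not tested ones, and Python's re only stands in for Perl's engine here:

```python
import re

# John's rule body without the m,...,i delimiters (the i flag becomes re.I)
UNBOUNDED = re.compile(
    r"https?://(?:[^\./]+\.)*goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)"
    r"/+.*[?](?:btni|adurl)",
    re.I,
)

# Same pattern with both '*' replaced by explicit upper bounds (example values)
BOUNDED = re.compile(
    r"https?://(?:[^\./]+\.){0,5}goo+gle(?:pages)?\.(?:[a-z][a-z][a-z]?(?:\.[a-z][a-z])?)"
    r"/+.{0,100}[?](?:btni|adurl)",
    re.I,
)

sample = "http://www.google.com/pagead/iclk?adurl=http://spam.example/"
too_long = "http://www.google.com/" + "x" * 200 + "?adurl=http://spam.example/"
```

Both patterns match the sample; only the unbounded one still matches when 200 characters sit between the path and the '?', which is the price of the bound.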

Stucki



Re: Paiment Representative spams

2007-11-26 Thread Chr. v. Stuckrad
On Mon, 26 Nov 2007, Igor Chudov wrote:

 for thieves who are moving stolen money to their real accounts, using

A German radio station in Berlin had a feature
about those criminals. They send trojans as spam
to people using home banking and capture their money;
to transfer this money to themselves they need
those 'helpers', who receive the 'payments'
in-country and then transfer them to other countries,
where the money 'vanishes'.

In Germany such a transfer counts as 'money
laundering', and the 'helper' not only falls under this
law but also has to pay back the whole sum, while
the 'real criminal' is normally already gone ...

The feature assumed that millions more PINs and TANs have been
grabbed and are 'on hold', because the scammers cannot recruit
'enough helpers' to get at the money in the already
trojanized bank accounts. So apparently lots of people
have caught on and are ignoring those scams.

(I hope my rather rusty English comes across :-)

Stucki  (getting lots of those all the time)




Re: 'spamc/spamassassin' crashing with overlong blank line spams?

2007-09-19 Thread Chr. v. Stuckrad
On Wed, 19 Sep 2007, Karsten Bräckelmann wrote:

 How so? Since these mails are killing spamd, what use is it to throw yet
 another rule at it?

Well, in the time since I wrote my mail to the list,
I circumvented the problem by putting a little 'awk filter'
in front of my 'spamc' to get rid of those overlong
lines, and since then the spam filter has been OK.
And I did hope somebody would write a rule (and somebody did!)
while I was working on fixing my spamc scripting ...
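The awk filter itself was not posted; as a rough Python equivalent, here is a sketch that clips every line to a fixed length before the mail is handed to spamc. The 998-character limit is the maximum line length from RFC 5322; whether clipping (rather than wrapping) is acceptable in your setup is an assumption:

```python
import sys

MAX_LINE = 998  # RFC 5322 maximum line length, excluding CRLF


def clip_lines(data: bytes, limit: int = MAX_LINE) -> bytes:
    """Truncate every line (blank or not) to `limit` bytes, keeping its line ending."""
    out = []
    for line in data.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        ending = line[len(body):]
        out.append(body[:limit] + ending)
    return b"".join(out)


if __name__ == "__main__":
    sys.stdout.buffer.write(clip_lines(sys.stdin.buffer.read()))
```

Used as a pipe stage, e.g. `python3 clip_lines.py < mail.txt | spamc -c` (the script name is hypothetical).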

The meta rule on the list looks promising :-)

Stucki



'spamc/spamassassin' crashing with overlong blank line spams?

2007-09-18 Thread Chr. v. Stuckrad

Hi!

Apparently our spamc (3.1.9, not yet 3.2.*) cannot
transfer a special kind of current spam to a remote
spamd. These mails always produce '0/0' instead
of usable reports.

You can see something like the mail I analyzed
at http://page.mi.fu-berlin.de/stucki/mail.txt
(I had to change the offending line for the browser,
so at the end you see only a descriptive line)

Is this a known failure of the old spamc?
Is the MTA supposed to fix mails with overlong
lines (blank or otherwise)?

Do I have to switch to 3.2.* to fix it?

Would it be possible to 'just take' a
newer 'spamc (3.2.*)' to communicate with
an old 'spamd (3.1.*)', or did the protocol
change somehow?

Thanks for hints ...   Yours   Stucki

PS.:  Ideas welcome for catching the characteristic Subject of
those spams, which look like 'just random tty line noise'!



Re: Number spam (paranoid guess)

2007-08-07 Thread Chr. v. Stuckrad
On Tue, 07 Aug 2007, John Andersen wrote:

 Ok, what is this stuff. 
 All it contains is 6 digit numbers.  What's up with that stuff?

My most paranoid guess is:

- Cause: we have summer vacation time ...

So LOTS of people are on holiday.
If you send e-mails with totally useless content that gets
through all filters for a short time, you can trigger LOTS
of vacation messages!

Then (1) you get to know 'who answered', and if you
are not only a spammer but also a 'more criminal mind',
you (2) might even find typical vacation messages like
'I'm away in China for two weeks, try later ...'!

So you know somebody is *away* and you can safely steal
from the flat, impersonate the owner of the address, etc...

That's paranoid, I know, but criminals are not always dumb :-)
And lazy anyway, and on the internet too :-)

Stucki  (who never has vacation [messages:-])


Re: Now its zip attachments ^^

2007-07-23 Thread Chr. v. Stuckrad
On Mon, 23 Jul 2007, John Scully wrote:

...   After adding the sanesecurity sigs to clamd last
 week not one PDF has made it through.  And since clamd unpacks and examines
 every attachment anyway it is no additional load.  In fact, due to the
 messages not hitting SA it probably reduced load slightly.

I have a 'political problem' with that. We 'drop' known viruses into
a quarantine directory without further notice, and only once in years
has somebody complained and wanted his virus back :-)

We *only* TAG spam with headers, then users decide to drop, move, or read it.

So if I 'simply insert' those clamav sigs, spam would be handled as a virus,
not as 'our spam', which I'm not allowed to destroy.

Has anybody here created an extra 'instance' of the clamd filter to fight
spam with spam sigs only, without scanning for virus sigs? Does that
sound feasible?
 
Stucki


Re: Spam Du Jour ? *.XLS -- packed into zip now

2007-07-22 Thread Chr. v. Stuckrad
On Sun, 22 Jul 2007, Robert Schetterer wrote:

 http://sanesecurity.co.uk/clamav/
 
 catches it now

As seen before, they react fast to news on this list :-)

Now I got the same 'XLS' *inside* a *.zip file!

Stucki



Re: Spam Du Jour ? *.XLS

2007-07-21 Thread Chr. v. Stuckrad
On Sun, 22 Jul 2007, Robert Schetterer wrote:

  investors news-76212.xls, et all
  
  no real challenge
  
 jep , got 3 xls spams today

well, here too,

but I think soon we'll get the whole mix ...
a combinatoric explosion of envelope formats
and content variants, meaning
 'any windows-showable-fileformat' *
 'all the already known picture-tricks embedded'

Anybody working on generic detectors yet?
(I really would like to plug that (w)hole :-)

Something like amavis or clamav to first unpack
and then spamassassin to analyze it?

Stucki


Re: Spam PDF

2007-06-27 Thread Chr. v. Stuckrad
On Wed, 27 Jun 2007, Wael Shahin wrote:

 I have two servers one is running DCC and one is not, the one that is
 running DCC didn't pass the message or maybe I am mistaken but it didn't
 go through (Maybe didn't get there at all from the first place).
 On the other server that is not running DCC the email went through and
 it was an empty email body with a PDF attachment

No wonder, I think. DCC will notice/flag spam 'already seen elsewhere',
AND that may be the only way to decide whether the PDF(s) are junk
or real information. So spamtraps or honeypots may be the first choice.

The spammers' last 'try' was to put the pictures into Word docs
or PowerPoint docs, so I assume they just go through every format
of 'embeddable attachment' for which a 'plugin or viewer' exists
and which opens automagically in mail clients (which must be
carelessly configured to show the picture, but that is the default).

So in the long run we need a generic way to MIME-strip the contents
of attachments (like virus filters do!) and recursively feed
all parts of the mail into spam scanners (either text or
picture scanners).
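A rough sketch of the 'strip and feed' idea with Python's standard email package. The route labels are placeholders for whatever text, image or unpacking scanners you actually have, not existing tools. msg.walk() descends into nested multiparts and attached messages; archives like zip would still need a separate unpacking step:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def route_parts(raw: bytes):
    """Yield (route, content_type, filename, size) for every leaf MIME part."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "text":
            route = "text-scanner"    # placeholder, e.g. spamassassin
        elif maintype == "image":
            route = "image-scanner"   # placeholder, e.g. an OCR plugin
        else:
            route = "unpack-further"  # placeholder: pdf/doc/zip need an extractor
        yield route, part.get_content_type(), part.get_filename(), len(payload)


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Subject"] = "random nonsense"
    demo.set_content("random nonsense text")
    demo.add_attachment(b"%PDF-1.4 ...", maintype="application",
                        subtype="pdf", filename="news.pdf")
    for row in route_parts(demo.as_bytes()):
        print(row)
```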

If there is a simple way to program signatures for virus-checkers
it might be possible to catch specific pictures therewith.

Alternatively you could forbid such attachments completely, but
that has no chance in a university environment like I'm in.

We got two 'waves' of PDFs here.

We stopped the first wave here after noticing that the spammers
had programmed the spambots with a repeated pattern of filenames,
but they noticed, and the second wave is only random nonsense
plus the PDF. But a 'normal' user would never open a PDF
from a mail full of nonsense, so they reach only a small fraction,
which might not be enough for pushing stocks.

So I hope that 'fad' might die out soon, like the other waves of
doubly-packed pictures in rtf, word, powerpoint did.

Stucki



Re: TVD_SILLY_URI_OBFU

2007-02-05 Thread Chr. v. Stuckrad
On Mon, 05 Feb 2007, Bowie Bailey wrote:

   body Test_01 /remove \\*\/i | /remove \\%\/i | /remove \\!\/i
   score Test_01 4.0
   describe Test_01 Test remove asterisk for URL spams
 
 How about this? (untested)
 
 body Test_01 /remove \[*%!]\/i


Since Sunday, after two new obfuscation chars and
two new subdomains showed up in the same mails, I use
(because I hope it is more specific):

[ For beginners: '\W' is a non-word character, '\S' is 'not space',
  and never use '.*'! Instead use a fixed maximum length '.{m,M}',
  where 'm' is the minimum and 'M' the maximum length ]

# Obfuscation-nonword-char instead of dot
body __MEDOBFU1A /http:\/\S{1,25}\Wcom/i
body __MEDOBFU1B /replace ?\W.{1,30}(?:with|by)\s?\./i
# Obfuscation-nonword-char inserted
body __MEDOBFU2A /http:\/\/\S{1,30}(?:\W\S{0,10}\.com|\.\Wcom)/i
body __MEDOBFU2B /remove ?\W/i
# both in one rule
meta __MEDOBFU1  ( __MEDOBFU1A && __MEDOBFU1B )
meta __MEDOBFU2  ( __MEDOBFU2A && __MEDOBFU2B )
meta MEDOBFU   ( __MEDOBFU1 || __MEDOBFU2 )
score MEDOBFU   3
describe MEDOBFU Pharma spam with illegal char in hostname of URL

Using \W may be a risk because the class contains too
many characters, but so far I have not heard of any FPs.
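For anyone who wants to check the sub-rules offline before loading them, here is a rough Python transcription. Assumptions: /.../i maps to re.I, and plain strings stand in for SpamAssassin's rendered body text, which they only approximate. It also shows why the meta rules need both halves: since '.' itself is a \W character, __MEDOBFU1A and __MEDOBFU2A alone also fire on perfectly normal URLs:

```python
import re

# Sub-rule bodies as in the rules above (SpamAssassin /.../i -> re.I)
SUBRULES = {
    "__MEDOBFU1A": r"http:\/\S{1,25}\Wcom",
    "__MEDOBFU1B": r"replace ?\W.{1,30}(?:with|by)\s?\.",
    "__MEDOBFU2A": r"http:\/\/\S{1,30}(?:\W\S{0,10}\.com|\.\Wcom)",
    "__MEDOBFU2B": r"remove ?\W",
}


def hits(body: str) -> dict:
    """Which sub-rules match the given text."""
    return {name: re.search(p, body, re.I) is not None for name, p in SUBRULES.items()}


def medobfu(body: str) -> bool:
    """The meta logic: (1A && 1B) || (2A && 2B)."""
    h = hits(body)
    return (h["__MEDOBFU1A"] and h["__MEDOBFU1B"]) or (h["__MEDOBFU2A"] and h["__MEDOBFU2B"])
```

For example, "See http://www.example.com for details." triggers both URL sub-rules but not MEDOBFU itself.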

The only trouble is that, because I am posting this to the list,
tomorrow they will sprout lots of new adapted versions
of the same basic idea all over the place.

So what will really be needed is a combination of
rules for 'illegal hostname in URL' and something like
the URIBLs to catch 'syntactically legal looking' obfuscations
(if such a thing is feasible).

Stucki


-- 
Christoph von Stuckrad  * * |nickname |[EMAIL PROTECTED]   \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(days):+49 30 838-5 57 78|
Mathematik  Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 66 00|
Arnimallee 6 / 14195 Berlin * * |on IRCnet|Fax(alle):+49 30 838-75 454/


Sudden drop in spam-rate, parallel to a surge of new trojans - beware

2006-11-21 Thread Chr. v. Stuckrad
Hi!

Yesterday we had a sudden drop in the spam percentage from 80% to nearly 60%.
At the same time I got six copies of a new trojan 'exe' in the mail,
undetectable by NAI and ClamAV.

Do we have to prepare for a new flood by an updated
(just now reorganizing) botnet?

Stucki



Re: Greylisting

2006-11-21 Thread Chr. v. Stuckrad
On Tue, 21 Nov 2006, Vahric MUHTARYAN wrote:

 I'm using SA for a long time without any problem, nowadays
 spammers are using too much graphical objects and they are tring
 to change it day by day. I'm tring to use fuzzyocr but it's taking
 too much cpu.

Same problem here ...

 I think that try greylisting . I wonder are there
 anybody use greylisting ? Somebody can give me feedback ? 

But wouldn't spammers simply send every mail twice in an attempt
to break greylisting? Then, once the automatic whitelisting has
kicked in, you get everything twice, simply doubling the amount of
spam in the long run.

Just curious why I get so many spams twice or thrice within a short time.
(I have NOT installed greylisting because of that phenomenon; I assumed
greylisting would 'go away' or 'be just a fad', but I am rethinking it
because of the CPU cycles needed for FuzzyOCR.)

Stucki



OT/Humor: Do I have to live in fear of spammers?

2006-10-25 Thread Chr. v. Stuckrad
Today a subject got through the filter undetected and
'made my day' (ROTFL, couldn't resist posting it :-))

Subject: Consequently We must kill you not perhaps.
... Stocks spam ...

Does somebody have a list for something like
 'the best random-generated spam/text'
without polluting this list ?

Yours   Stucki




Re: forwarding email using /etc/aliases and keeping spamassassin headers intact

2006-09-20 Thread Chr. v. Stuckrad
On Wed, 20 Sep 2006, Larry Starr wrote:

 Are you certain that SA even sees the message before it's forwarded?
 
 My first guess, without seeing config files, etc.  Would be that your SMTP 
 daemon (sendmail?) is forwarding the message as it's received.

This sounds like 'filtering with procmail' during personal delivery.
In this configuration the MTA will forward (by .forward or /etc/aliases)
*before* procmail is ever called.

In this case the user should forward with a private procmail rule
instead of via the MTA, so that his procmail has a chance to filter spam.

Stucki


Re: Running on Debian stable

2006-08-18 Thread Chr. v. Stuckrad
On Fri, 18 Aug 2006, Magnus Holmgren wrote:

 You could install just spamassassin (but not spamc) from testing, without 
 having to pull in anything else.

There's also a spamassassin in Debian 'volatile',
under 'volatile-sloppy' (from sources.list):

deb http://ftp2.de.debian.org/debian-volatile sarge/volatile main
deb http://ftp2.de.debian.org/debian-volatile sarge/volatile-sloppy main

I have NOT tested it yet; I only use the updated clamav from there.

Stucki!



Re: Image spams getting thru

2006-08-01 Thread Chr. v. Stuckrad
On Tue, 01 Aug 2006, Theo Van Dinter wrote:

 On Tue, Aug 01, 2006 at 09:24:55AM -0700, John D. Hardin wrote:
 ...
 Well, until greylisting becomes enough of a problem that the spammers change
 their software to queue and retry, thereby eliminating the benefit completely.

Or even simply send spam unconditionally twice or thrice,
just to be sure to get through the greylist.

It just needs the knowledge of how soon the same combination of
envelope addresses has to come again from the same zombie.

And THIS would explain why I get lots of spams more than once,
in 'chunks' of the same thing 3 to 6 times within a few minutes,
followed by a long pause.

So just by re-arranging the (spam-)address-lists and sending
at least twice the amount of spam, greylisting may be circumvented.
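A toy model of that idea, assuming a greylist that only remembers when it first saw a (client IP, sender, recipient) triplet and accepts it once a minimum delay has passed. The 300-second delay and the send times are made-up example values, and real greylisting daemons also expire old entries:

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class ToyGreylist:
    def __init__(self, min_delay: float = 300.0) -> None:
        self.min_delay = min_delay
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, triplet: Triplet, now: float) -> str:
        first = self.first_seen.setdefault(triplet, now)
        # Temporary failure until the same triplet comes back late enough
        return "accept" if now - first >= self.min_delay else "tempfail"


def blast(grey: ToyGreylist, triplet: Triplet, send_times: List[float]) -> List[str]:
    """A zombie resending the same triplet at fixed times, without a real retry queue."""
    return [grey.check(triplet, t) for t in send_times]
```

With sends at 0, 120, 240 and 360 seconds, the first three copies are deferred and the fourth is accepted: a bot that just repeats each mail a few times over some minutes passes without implementing a real retry queue, and the 'chunks' described above would be exactly what you see.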

Just an idea, because for the last few days we have suddenly been
getting over 20% more spam.

Stucki



Re: exim4 + forwarding + spamassassin

2006-07-27 Thread Chr. v. Stuckrad
On Thu, 27 Jul 2006, jdow wrote:

 From: Loren Wilton [EMAIL PROTECTED]
...
 I've never seen the logic of placing SpamAssassin inside the incoming
 transaction before the termination of the SMTP connection rather than
 down the pipe in the MDA.

If you want to 'reject spam' (with a score over a given
threshold) because you do not want to generate bounces,
you have to check 'inside the transaction', to tell the sending
MTA that you do not accept the current mail because it is spam.

This only works with site-wide Bayes and a global setup, unless
you make sure that you know the (then exactly one?) recipient
of the message at the end of the incoming data (the single '.' in the
SMTP protocol, the 'acl_smtp_data' ACL in exim4).

Beware of 'overloading the system' if you check incoming mails
'during arrival': you will have to limit the number of concurrent
SMTP connections to the maximum number of spam checks your system can handle.

Stucki

PS.: I too prefer 'only to tag' the spams and let the users decide
to discard them. I tested both ways, and to me the only safe way
never to crowd the system is to spamcheck on the inside, in an
exim queue runner. The number of queue runners can then simply be
adjusted to the capabilities of the server.



Re: Will bayes-db be 'skewed' by feeding it spam only (one central database)

2006-07-19 Thread Chr. v. Stuckrad
On Tue, 18 Jul 2006, Dirk Bonengel wrote:

...
 If I was in your position, I'd try to switch over to a system like Maia 
 Mailguard that keeps a copy of each mail in a database and users can 
 confirm and/or correct the underlying SpamAssassin engine's decisions. 
 This system uses a singel bayes DBWorks fine at a customer of ours 
 that uses some weird proprietary document managing software

THIS looks *very* interesting, as it may directly solve the problems
we planned to solve in our *next* MTA (not postfix, but exim4 + cyrus),
where we already 'test' amavisd-new+clamav+nai-uvscan for filtering and
where we need to give users access to the filter settings.

Does it really keep *every* mail in the database?
Or only mail which might be accepted if the user wants it?
(50% of the incoming mail here is for useless addresses)

But *now* I'm stuck with qmail+qmail-queue-patch and the older
amavis-perl (heavily patched). So *now* the users have no influence
except 'telling me' [which they mostly do not] :-)

Stucki


Re: Will bayes-db be 'skewed' by ... autolearning ham?

2006-07-19 Thread Chr. v. Stuckrad
On Tue, 18 Jul 2006, Dirk Bonengel wrote:
 did you investigate auto-learning? This might let your system learn ham 
 as well as spam. Works fine here (same situation  - gateway server to a 
 Lotus Notes system, no feedback loop possible)

Maybe I should set the thresholds for autolearning
different from the default? (I have never touched them so far.)
I just found *lots* of 'autolearn=ham' in my log,
and I cannot believe that so many are correct.

Out of the current log I see Mail classified as
   21805 ham
   11493 autolearned as ham   (this seems suspiciously high?)
   85963 spam
   52977 autolearned as spam

So I fear the 'skew' in my database comes from autolearning the
spammers' 'Bayes fodder', not from 'skewed explicit learning'.
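For a quick cross-check of such numbers, a small sketch that tallies the autolearn= tokens from log lines and relates them to the totals. The token format is assumed from the 'autolearn=ham' mentioned above, and the log lines in the test are invented. If the autolearned counts are subsets of the classified counts, the figures above mean about 52.7% of all ham-classified mail was autolearned as ham, versus about 61.6% for spam:

```python
import re
from collections import Counter
from typing import Iterable

AUTOLEARN = re.compile(r"autolearn=(\w+)")


def tally_autolearn(lines: Iterable[str]) -> Counter:
    """Count autolearn=<value> tokens, e.g. ham, spam, no."""
    counts: Counter = Counter()
    for line in lines:
        m = AUTOLEARN.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to one decimal."""
    return round(100.0 * part / whole, 1)
```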

What may make it even worse is that 'in-house mail == ham' is
never learned, because it is never spamchecked (users complained
too much about the slowdown, so only mail from the 'outside' goes
through the spam filter).

Stucki



Re: Will bayes-db be 'skewed' by feeding it spam only (one central database)

2006-07-18 Thread Chr. v. Stuckrad
On Mon, 17 Jul 2006, Logan Shaw wrote:

...
 someone carrying a knife, they have been a violent criminal,
 so knife-carrying correlates perfectly with being a criminal.
 
 Now imagine that you see a chef.  He is carrying a knife, but
(Good point: [OT: I even know people who react that way on TV-News] :-)

...
 by doing that, you will give it a very negative view of the
 world, where everything looks like spam.
 
 (This is all assuming, of course, that your Bayes database is
 empty when you train it with spam only.)

Regarding this scenario: I ORIGINALLY started the database
on a long backlog of MY mail, which THEN had enough
spam AND ham to start with, so it's not as bad as it could be;
but since the last 'fresh start' I have 'updated' only the false negatives.
Checking nearly 6000 (low-scoring) spams a week, I found only
'classical false positives' (like mails from this list :-), and for months
*I* have not lost (sorted away) anything important. But maybe
once every two months one of our power users complains about a real
false positive, and if I'm allowed, I feed THAT one in.

 configuration changes that need to be made.  Do you have the
 latest SpamAssassin, and have you enabled some network tests

Not the latest, because Debian 'stable' is slow to take up
new versions. Maybe I should move to the volatile packages ...

 like DCC or razor and some RBLs?  Those should be carrying
 some of the load; you shouldn't be relying on Bayes only,

Of course. Razor, Pyzor, DCC, the newer German iX plugin,
and RBLs catch lots of mails, pushing thousands to scores
above 20 :-)

 If your Bayes database really is messed up, personally I would
...
 you *do* have is worthwhile.

Hmm, maybe on one of the next 'maintenance days',
when (nearly) everything is down for a while, so nothing
will slip through during training ...

But this keeps me thinking about the different 'hams' in
our department. Some are French and some might even be Chinese.
So if I train again with *my* mail (postmaster problems and
a bit of half-private stuff), the new database might start out
skewed 'against' the real hams of other parts of the department!
(While I think 'my spam' will be fine to train with.)

The only 'real solution' might be to switch to a SQL-Database
and 'bayes-per-user', but then I'd have to 'train' hundreds
of Students how to 'train' their own databases themselves :-))

...
 Well, there are probably several different explanations.
 The best place to start is by looking at the spams that get
 through and how they scored, especially comparing that to what
 scores others get on the same messages or similar ones.

That's one of the problems here. The mail filter (host) runs the old
amavis-perl and does not include the full scoring headers in the mail,
only a marking header with the score itself. So when I later check
the same mail (cleaned of the previous marking), I get completely
different (mostly horrendously higher) scores for it, without
really seeing the differences. Apparently the later a spam that is
'one of a series' comes in, the more of the dynamic systems have
learned it and score it. I almost believe we are often 'at one end'
of some 'lists to be spammed', so we get it 'fresh': only the first
users are hit, the others get it 'after' the filter has dynamically
caught up with it, and so different users complain about different
'slips'. Sometimes it *seems* as if spammers work through their list
alphabetically, so user a* often gets something that w* never sees,
and the other way around too :-)

Thanks Stucki



Will bayes-db be 'skewed' by feeding it spam only (one central database)

2006-07-17 Thread Chr. v. Stuckrad
Hi!

I'm a postmaster who has been working with spamassassin (now on Debian sarge)
for the last few years. We have one filter host for all mail,
so at the moment we have only one global Bayes database.

We are a department of math and computer science, so we get zillions
of spams for all addresses 'known on the net', and we get ham on lots of
different 'themes' for different workgroups in diverse languages (mostly
German of course, being in Berlin, Germany).
Not being allowed to peek into other users' mailboxes, I have no
'representative ham corpus' but only my own, which seems to be
very postmaster-specific, while I seem to get a typical average
of spam (because my address already existed on a 'News' server :-).

Can somebody tell me whether the Bayes database's accuracy
deteriorates if I feed it 'only my spam' (my false negatives) and
not the (to me unknown) typical hams?

Lately it seems to be slowly skewing towards letting more and more spam
through, instead of 'catching' it. Is this typical? Do I have
to recreate the database? Or do I need to get 'ham from a set
of typical users' to balance the database? OR are there typical
values for bayes_auto_learn_threshold_{non,}spam, different from
the default, to use in my case?

Just curious why so many spams get through to me ...
(i.e. around 10 false negatives for every 90 marked as spam,
which is 'relatively bad' compared to many opinions on the list)

Just curious,  Stucki (postmaster of math/inf/mi.fu-berlin.de)



Re: iXhash plugin docs updated, version for 3.0.x added.

2006-06-21 Thread Chr. v. Stuckrad
On Wed, 21 Jun 2006, Dirk Bonengel wrote:

 - added a version that runs under SpamAssassin 3.0.x

Thanks a lot!  After shortening some of the descriptions
(my --lint complains because of more than 50 chars)
it already caught some spams this evening!
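If there are many such rules, a small helper like this (Python, hypothetical, not part of SpamAssassin) can list the 'describe' lines that --lint would complain about. The 50-character limit is taken from the lint complaint mentioned above:

```python
import re
from typing import Iterable, Iterator, Tuple

DESCRIBE = re.compile(r"^\s*describe\s+(\S+)\s+(.*?)\s*$")


def long_descriptions(cf_lines: Iterable[str], limit: int = 50) -> Iterator[Tuple[int, str, int]]:
    """Yield (line number, rule name, description length) for overlong descriptions."""
    for lineno, line in enumerate(cf_lines, 1):
        m = DESCRIBE.match(line)
        if m and len(m.group(2)) > limit:
            yield lineno, m.group(1), len(m.group(2))
```

Usage, e.g.: `list(long_descriptions(open("iXhash.cf")))`, where the file name is only an example.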

My users will like that :-)  Stucki (postmaster at mi.fu-berlin.de)



Re: sa-learn not learning with sudo

2006-04-24 Thread Chr. v. Stuckrad
On Sat, Apr 22, 2006 at 10:55:29AM +0200, Michael Monnerie wrote:
...
 # sudo -H -u vscan sa-learn --dump
...
 But when I do
 # su -l vscan
...
 # sudo -H -u vscan sa-learn --dump
...
 Now why is there a diff between sudo as a user or directly logging in as 

One of the differences will be all the commands in the
user's shell startup files! Those are ignored if you
run the command directly via sudo.

It also depends on the version of 'sudo', because one
of the latest changes *dropped* the HOME-Variable
from the environment (at least if you run the command
directly from sudo!).

Lots of our automated cron scripts suddenly failed
because of this 'security fix', and we had to replace
OLD:  sudo command
NEW:  sudo env HOME=$HOME command
to 'bridge the gap' and re-use the *current* HOME
'inside of sudo'.

Maybe 'su -l vscan' also sets the missing HOME!
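Why HOME matters here, as a small illustration: SpamAssassin keeps per-user data such as the Bayes files under ~/.spamassassin by default, and '~' is normally resolved from $HOME. Assuming sa-learn resolves it the same way as the POSIX path expansion below (this Python snippet only mimics that, it does not call sa-learn), a sudo that passes a different or no HOME makes 'sa-learn --dump' look at a different database:

```python
import os
from contextlib import contextmanager


@contextmanager
def home_set_to(path: str):
    """Temporarily set $HOME, like 'sudo env HOME=... command' would."""
    old = os.environ.get("HOME")
    os.environ["HOME"] = path
    try:
        yield
    finally:
        if old is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = old


def default_state_dir() -> str:
    # POSIX expansion of '~' uses $HOME when it is set
    return os.path.expanduser("~/.spamassassin")


def state_dir_with_home(home: str) -> str:
    with home_set_to(home):
        return default_state_dir()
```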

Yours   Stucki   (postmaster hit by the same? :-)



Re: [OT] Amavisd replacement suggestion

2006-03-07 Thread Chr. v. Stuckrad
On Tue, Mar 07, 2006 at 04:42:31PM +0100, Michael Monnerie wrote:
 Isn't PITA some sort of Greek bread? The one they use for Gyros, I 
 believe. Wait, looking on wikipedia: http://en.wikipedia.org/wiki/Pita
 So why is it like Greek bread?

Maybe amavisd is best when toasted (as I like pita==pide :-)

But if 'amavisd is a PITA' meant the old version,
which starts 'one perl process per mail', that
is enormously slow and CPU-hungry compared
to amavisd, which compiles only once, then stays
in memory and only forks children.

So the old one is more a pain in the server,
a chance to 'toast' the Server or your mail :-)

Stucki


Re: pcre

2006-02-09 Thread Chr. v. Stuckrad
On Thu, Feb 09, 2006 at 03:24:58PM -, John Hall wrote:
 Ronan [EMAIL PROTECTED] wrote in message 
 
  Anyone have any input on this? What would be the implications? Should it 
  just be a straight translation perl - c , or are there other factors?
 
 Ronan,
 
 Why would using pcre be quicker? Perl's regex engine is written in C as 
 well. Besides, there is more to SA than just matching regexes.

In my opinion the most important difference between 'grepping' with pcre
and with perl is the startup time. Starting and dynamically linking a
whole Perl interpreter is a lot more work than just starting a pcre
pattern engine.

So if you 'just grep for text' with a script, pcre(grep) is your
friend. BUT if you need lots of dynamic libraries, loadable
modules, and network connections, like spamassassin does,
'pcre' alone simply has nothing comparable.

And in the case of 'spamd' the startup phase happens only once;
after that it only forks children, so there should be no large
startup penalty.

You should ONLY avoid 'dangerous/slow' Perl patterns
(avoid ambiguities, avoid capturing brackets by writing (?: ),
limit match lengths by using .{min,max} instead of '.*',
and construct easily decidable, left-factored searches).
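A small illustration of two of these points in Python's re (Perl behaves the same way here): the alternatives are left-factored so the common prefix is matched only once per position, and (?: ) avoids a capture group that nobody uses. The example words are made up:

```python
import re

# Naive: three full alternatives, and the outer brackets capture
NAIVE = re.compile(r"(remove this|remove that|remove those)", re.I)

# Left-factored and non-capturing: the shared prefix 'remove th' is matched once
FACTORED = re.compile(r"remove th(?:is|at|ose)", re.I)

SAMPLES = ["please REMOVE THIS", "remove those stars", "remove them", "nothing"]
```

Both match exactly the same samples; only the naive version builds a capture group for every match.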

As far as I remember perl does 'allow' a few more complicated
(not to say convoluted) cases than pcre does, but you'll better
not use them anyway in spamassassin patterns.

Stucki

-- 
Christoph von Stuckrad  * * |nickname |[EMAIL PROTECTED]  \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(days):+49 30 838-75 459|
Mathematik  Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 6600|
Arnimallee 2-6/14195 Berlin * * |on IRCnet|Fax(alle):+49 30 838-75454/


Re: Exim+SA=Server Overloaded!

2006-01-25 Thread Chr. v. Stuckrad
On Tue, Jan 24, 2006 at 02:01:55PM -0200, Eduardo wrote:
 Hello!
 Sorry to send another email about the same subject. But my mail server 
 crashed so i couldn't see the answers.
 
 I am calling my spamassassin service in SMTP time with some ACL rules in 
 my exim4 configuration file. I start the SA service, start exim4 service 

I had the same problem while working on our
new/future exim+amavis+spamassassin MTA.  As far as I can tell, it stems from
'too many SMTP connections in parallel', and in my case, after a few tests,
it forced me to move from 'spamassassin by ACL'
to 'scanning by pipe / in queue-runners'.

After you stop/restart the server, all MTAs that were waiting for
your server to come up will 'crowd in' to deliver, so the
number of parallel incoming connections will be at its maximum.

If you do spam checks 'by ACL' (in the SMTP dialog), you'll
need a spamd connection for each SMTP connection in parallel, which
will nearly always either crash the server by overload
or start letting spam through (or deferring connections) because of
timeouts (too many connections to one spamd).

Therefore we changed our spam check to the other method,
'by pipe', only checking for spam 'in the queue'.

This way, we can tell exim to accept the mail, put it
into the queue ('queue_only'), and then set the number of
parallel queue-runners exactly to the maximum capacity of
the spam filter.  This visibly slows down delivery as a whole,
but it should never force the server into thrashing.
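The effect of capping the queue-runners can be modelled in a few lines of Python. This is a toy model, not exim configuration: `SPAMD_CAPACITY`, `scan()` and `run_queue()` are hypothetical names, and a thread pool stands in for exim's queue-runners. However large the burst that gets queued, no more than the chosen number of scans run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical capacity: how many scans the spam filter can run at once
# without the machine starting to swap.
SPAMD_CAPACITY = 4

_lock = threading.Lock()
_active = 0
peak_active = 0

def scan(message_id: int) -> str:
    """Stand-in for piping one queued message through spamc/spamd."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # pretend scanning takes a while
    with _lock:
        _active -= 1
    return f"msg{message_id}: scanned"

def run_queue(queue: list[int], runners: int = SPAMD_CAPACITY) -> list[str]:
    """Drain the queue with at most `runners` concurrent scans,
    analogous to limiting the number of parallel queue-runners."""
    with ThreadPoolExecutor(max_workers=runners) as pool:
        return list(pool.map(scan, queue))

if __name__ == "__main__":
    # A burst of 50 messages is accepted into the queue at once,
    # but never scanned more than SPAMD_CAPACITY at a time.
    results = run_queue(list(range(50)))
    print(len(results), "scanned, peak concurrency:", peak_active)
```

The price is latency: the burst drains at the filter's pace instead of all at once, which is the visible slowdown mentioned above.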

The ideal setup, though, would adapt to the load: scanning
by ACL until it gets too crowded, then switching to 'scan later' in
the queue.  But I do not yet understand exim THAT well.
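The adaptive idea can at least be written down as a decision rule. This is a hypothetical sketch in Python, not something exim is claimed to offer; `ACL_LIMIT` is an assumed number of spamd children reserved for ACL scanning.

```python
# Hypothetical policy, not an exim feature: scan during the SMTP dialog
# while spamd still has free slots, otherwise queue the mail for later.
ACL_LIMIT = 8  # assumed number of spamd children available for ACL scans

def choose_scan_mode(active_acl_scans: int, acl_limit: int = ACL_LIMIT) -> str:
    if active_acl_scans < acl_limit:
        return "scan-in-acl"
    return "queue-for-later"

def plan_burst(n_connections: int, acl_limit: int = ACL_LIMIT) -> tuple[int, int]:
    """Split a burst of simultaneous connections between the two modes,
    assuming none of the ACL scans finish while the burst arrives."""
    modes = [choose_scan_mode(i, acl_limit) for i in range(n_connections)]
    return modes.count("scan-in-acl"), modes.count("queue-for-later")

print(plan_burst(50))  # (8, 42)
```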

Stucki (postmaster of math/inf/mi.fu-berlin.de)


Re: Gain an extra 25%! (was Purging the Spamassassin Database)

2006-01-16 Thread Chr. v. Stuckrad
On Mon, Jan 16, 2006 at 04:09:37PM +0100, M.S. Lucas wrote:
 Could this be made a default with the small size of the id columns and a 
 note in the installation file for the big users?
 There are more users of SA with less then 65k users then with more.

Does it mean '65k is the largest (numerical) user number', like UNIX UIDs,
or really '65k different users in the database of setups and tokens'?

The latter will really be relatively rare.
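The difference between the two readings matters if the ids are not simply numbered 1, 2, 3 and so on. The Python sketch below assumes the 'small' id column is a 16-bit unsigned integer (the thread does not name the type) and shows that three users with UNIX-style UIDs can overflow it while 60000 densely numbered users fit.

```python
# Assumption: the "small" id column is a 16-bit unsigned integer
# (such as an unsigned SMALLINT); the thread does not say which type.
MAX_SMALL_ID = 2**16 - 1  # 65535

def ids_fit(user_ids: list[int]) -> bool:
    """True if every id value fits the column, however few users there are."""
    return all(0 <= uid <= MAX_SMALL_ID for uid in user_ids)

few_users_big_uids = [1000, 30000, 70000]   # three users, UNIX-style UIDs
many_users_dense = list(range(1, 60001))    # 60000 users numbered 1..60000

print(ids_fit(few_users_big_uids))  # False: 70000 > 65535
print(ids_fit(many_users_dense))    # True
```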

Stucki


Re: spamcop.net tactics

2005-11-22 Thread Chr. v. Stuckrad
On Tue, Nov 22, 2005 at 09:24:28AM -0800, Linda Walsh wrote:
 That doesn't mean it's a moral, an ethical or respectable reason:
 Spite is reason enough for most people these days. 
 
 Michele Neylon:: Blacknight.ie wrote:
 
 if your IPs end up in there it's usually for a
 reason.

Before we get into 'arguments' or even 'flamewars':

We (@{math,inf,mi}.fu-berlin.de) were hit by the same problem;
we also could not find *anything* visible which could have
put us on their list, and so we had to resort to 'circumventing'
the assumed problem.

Apparently 'spamcop' counts not only 'real spam' (explicitly
sent to spam traps) but also 'any bounce stranding in
their spam trap' as coming from a 'spammer or open relay'.

So simply because your users use 'vacation', or because viruses/worms
send themselves from faked spam-trap addresses and bounce
at your site, you can be blacklisted for 24 hours (each time?).

We reduced 'bounces' by patching 'qmail' to check the user
at 'RCPT' time in the SMTP delivery, by making all lists
reply to local owner addresses instead of bouncing,
and by checking that all auto-answering services never answer
bounces, bulk mails, spam and the like.
Having thereby reduced the 'chance' of hitting the
spam traps again, we have 'survived' so far without being
blocked again (at least not for longer than
the lifetime of mails sent to us).
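The "never answer bounces, bulk mails and spam" check for auto-answering services could look roughly like this Python sketch using the standard `email` package. The header tests are common conventions (an empty `Return-Path: <>` for bounces, `Auto-Submitted` from RFC 3834, `Precedence: bulk/list/junk`, `List-Id`, SpamAssassin's `X-Spam-Flag: YES`); they are an assumption, not the exact checks we used, and `may_auto_reply` is a hypothetical name.

```python
from email import message_from_string
from email.message import Message

def may_auto_reply(msg: Message) -> bool:
    """Return True only if an automatic answer (vacation etc.) seems safe."""
    # Bounces have an empty envelope sender (Return-Path: <>); a missing
    # Return-Path is treated as unsafe too, to stay conservative.
    if msg.get("Return-Path", "").strip() in ("<>", ""):
        return False
    # RFC 3834: anything other than "no" means the message was itself automatic.
    if msg.get("Auto-Submitted", "no").strip().lower() != "no":
        return False
    # De-facto markers for bulk and list mail.
    if msg.get("Precedence", "").strip().lower() in ("bulk", "list", "junk"):
        return False
    if msg.get("List-Id") is not None:
        return False
    # SpamAssassin's default flag header for mail it classified as spam.
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        return False
    return True

normal = message_from_string("Return-Path: <a@example.org>\nSubject: hi\n\nbody\n")
bounce = message_from_string("Return-Path: <>\nSubject: failure notice\n\nbody\n")
print(may_auto_reply(normal), may_auto_reply(bounce))  # True False
```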

Stucki(postmaster)


Re: SA 3.1 X-headers prepended instead of appended

2005-10-22 Thread Chr. v. Stuckrad
On Fri, Oct 21, 2005 at 05:19:40PM -0400, Daryl C. W. O'Shea wrote:
 No but here is what the headers look like:
 
 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on
  domain.com
 X-Spam-Status: No, score=-2.4 required=5.2 tests=BAYES_00=-2.599,
  DNS_FROM_AHBL_RHSBL=0.231,HTML_MESSAGE=0.001,SPF_HELO_PASS=-0.001,
  SPF_PASS=-0.001 autolearn=no version=3.1.0
 From mailnull  Thu Oct 20 23:42:08 2005
 Return-Path: [EMAIL PROTECTED]
 Received: from ccm08.roving.com (ccm08.roving.com [63.251.135.109])
 etcetcetc.
 
 Something in your mail flow is broken.  You've got what appears to be an 
 mbox line:
 
 From mailnull  Thu Oct 20 23:42:08 2005

Suppose I pipe a mail directly from my MUA (mutt, pine, ???)
through 'spamc' and back into my mailbox.  Will I get the same
(wrong, because it destroys my 'From ...' line) result?

I ask because I often did that 'piping' via 'procmail' from one
mailbox to another, and I cannot test it myself, having no 3.1.0
yet :-)
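Until that can be tested, one way to be sure the mbox separator survives is to keep it out of the filter entirely. A minimal Python sketch, assuming the message text starts with its 'From ' line; `filter_keeping_from_line` and `fake_filter` are hypothetical, and the fake filter only stands in for spamc.

```python
from typing import Callable

def filter_keeping_from_line(raw: str, spam_filter: Callable[[str], str]) -> str:
    """Pass a message through spam_filter without exposing its mbox 'From ' line.

    spam_filter stands in for whatever does the scanning; it receives and
    returns the RFC 2822 message text only.
    """
    first, newline, rest = raw.partition("\n")
    if first.startswith("From ") and newline:
        # Keep the envelope separator ourselves and re-attach it afterwards.
        return first + "\n" + spam_filter(rest)
    return spam_filter(raw)

def fake_filter(message: str) -> str:
    # Hypothetical stand-in for spamc: just prepend a status header.
    return "X-Spam-Status: No\n" + message

mbox_entry = "From mailnull  Thu Oct 20 23:42:08 2005\nSubject: test\n\nbody\n"
print(filter_keeping_from_line(mbox_entry, fake_filter))
```

With a real spamc one could pass something like `lambda m: subprocess.run(["spamc"], input=m, capture_output=True, text=True).stdout` as the filter; whether 3.1.0 actually needs this protection is exactly the open question above.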

Stucki


Re: missed by AV programs

2005-09-19 Thread Chr. v. Stuckrad
On Mon, Sep 19, 2005 at 03:55:12PM -0400, Rob McEwen (PowerView Systems) wrote:
 RE: missed by great AV programs
 
 (keeping in mind that these I'm mentioned may catch up by the time you read 
 this)
 

Right: since you wrote this, NAI (McAfee) first
sent an extra alert letter, then released an early
new DAT file (number 4585) to catch the new variants!

Stucki


Re: Spam with Re[2]: or Re[4]:

2005-09-15 Thread Chr. v. Stuckrad
On Thu, Sep 15, 2005 at 03:42:42PM -0400, Ronald I. Nutter wrote:
 # Check for bad Re: tag
 header BAD_RECOLON_TAG Subject =~ /\bRe:\b/i
 
 stopping email with something past the Re:.  Is my concern valid and how
 do I allow the email to get through that has something after Re: ?

I assume you want to catch mails whose Subject is 'Re:'
with nothing after it?
Then you need '$' (end of line)
instead of the second '\b' (word boundary), giving:

header BAD_RECOLON_TAG Subject =~ /\bRe:$/i

This will be DANGEROUS if mail programs
automatically add 'Re:' to empty subjects!
Then you may get false positives.
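A quick way to see what the corrected rule hits is to run its regex in Python, whose `$` (like perl's) matches at the end of the string or just before a final newline. The sample subjects are made up. Note that only the end is anchored, so 'Fwd: Re:' hits as well; adding '^' after the first slash would restrict it to a subject that is exactly 'Re:'.

```python
import re

# /\bRe:$/i from the rule above, in Python syntax.
BARE_RE = re.compile(r"\bRe:$", re.IGNORECASE)

for subject in ["Re:", "RE:", "Re: lunch?", "Fwd: Re:"]:
    print(repr(subject), bool(BARE_RE.search(subject)))
```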

Oh, by the way, what are the double quotes for?
I think they would be searched for literally, so the pattern
would not work as intended?

In an exim4 filter (exim uses PCRE patterns, much like perl's)
I just wrote and tested a pattern against the 'Re...' spams;
rewritten for spamassassin it is:

header BAD_RECOLON_TAG Subject =~ /^re:?\s*\[\d+\]:?\s*$/i

Which is:
 ^   the start of the Subject:
 re  the characters (case-insensitive, because of /i)
 :?  the colon (possibly)
 \s* whitespace (possibly)
 \[  the left bracket (the typical case)
 \d+ one or more digits (I saw random numbers from 2 to 111)
 \]  the closing bracket (all my spams had it)
 :?  another colon (I really saw both Re:[1] and Re[2]:)
 \s* possibly more whitespace up to
 $   the end of the Subject:

If anything except more whitespace follows the tag,
this pattern fails, so a subject like 'Re: [2] something'
does not hit the rule.
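The same pattern can be checked outside exim and spamassassin with Python's `re`, which handles these constructs the same way perl and PCRE do. The sample subjects are invented, modelled on the variants described above.

```python
import re

# The Re[n] pattern from the rule above.
RE_N_TAG = re.compile(r"^re:?\s*\[\d+\]:?\s*$", re.IGNORECASE)

samples = ["Re[2]:", "Re:[1]", "RE [111]", "Re: [2] something", "Re: meeting"]
for subject in samples:
    # The first three match; anything after the tag makes the rule miss.
    print(repr(subject), bool(RE_N_TAG.search(subject)))
```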

Stucki


Re: SpamCop listing internal hotmail servers?

2005-09-07 Thread Chr. v. Stuckrad
On Wed, Sep 07, 2005 at 06:37:54PM -0400, Greg Allen wrote:
 As a result, she got our server blacklisted several times and affected about
 400 users. I went round and round with her telling her to knock it off.

You don't even need a user to actively report to spamcop.
A normal user's simple 'vacation' program may be enough!
Spamcop sends out 'relay probes' and 'bounce probes'.
And I was told that if *anything* is bouncing back to their
test server (instead of being stopped in the SMTP dialog),
the host is assumed to send spam bounces and goes into
the rbl list for at least 20 hours.  (We had to patch
our qmail to get out of this after being rbl-listed for a week.)

So I'd say spamcop is 'harmful' instead of 'useful'.

Stucki


Re: OT: sa-learn, interfaced with Cyrus mailboxes

2005-08-21 Thread Chr. v. Stuckrad
On Sun, Aug 21, 2005 at 01:59:00AM -0400, Forrest Aldrich wrote:
 I just switched over to Cyrus IMAP - and it didn't occur to me I'd need 
...
 I wonder whom else is using Cyrus IMAP here, and how you may be handling 
...

I'm in the middle of moving from 'qmail'+'UW-Imap' to 'exim'+'cyrus'
(testing configurations and waiting for our project of
generating an account database to be used for mail addressing).

I'll let spamassassin add a header; later an exim router
will redirect the 'tagged' mails to the address
username+spam in cyrus, and the '+spam' tells cyrus
to put the mail into the extra mailbox username/spam.
(You can also sort with a 'sieve' script in cyrus.)
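The routing decision can be sketched in Python, although in reality it would be an exim router, not Python code. The sketch assumes the header spamassassin adds is its default `X-Spam-Flag: YES`; the function name and example addresses are made up.

```python
from email import message_from_string

def cyrus_recipient(local_part: str, domain: str, raw_message: str) -> str:
    """Hypothetical router step: pick the delivery address for one message.

    Assumes SpamAssassin marked spam with its default 'X-Spam-Flag: YES'.
    """
    msg = message_from_string(raw_message)
    if msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        # '+spam' asks cyrus to file the mail in the user's spam mailbox.
        return f"{local_part}+spam@{domain}"
    return f"{local_part}@{domain}"

tagged = "X-Spam-Flag: YES\nSubject: win\n\nbody\n"
clean = "Subject: hello\n\nbody\n"
print(cyrus_recipient("alice", "example.org", tagged))  # alice+spam@example.org
print(cyrus_recipient("alice", "example.org", clean))   # alice@example.org
```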

'spamassassin' can be run either in ACLs (which works if you
limit the number of concurrent SMTP connections) or
via exim queue-runners filtering the mail later
(then you need to limit the number of queue-runners instead).

But sorry, 'bayes learning' is on the agenda for
'later', because we're not yet sure whether we'll
keep per-user data (in an SQL database?) or stay
with site-wide data as now.

Yours   Stucki  (postmaster at mi.fu-berlin.de)


Re: How to use Multilog ?

2005-08-15 Thread Chr. v. Stuckrad
On Mon, Aug 15, 2005 at 09:09:20AM -0400, Matt Kettler wrote:
 Perhaps you want something like:
 
 spamd -s stdout | multilog {insert multilog options here}

This should be exactly what you want.
BUT in the manual I only see 'stderr' documented,
as '... -s stderr'.  If 'stdout' does not work,
you might need to run

 /bin/sh -c 'exec spamd -s stderr ... ... ... 2>&1'
instead of
 'spamd -s stdout ... ... ...'
This way you'll get stderr redirected to stdout by the shell,
and multilog gets the output.
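The `2>&1` redirection can be checked without spamd at hand. In this Python sketch a small child process that writes to stderr stands in for `spamd -s stderr`, and `stderr=subprocess.STDOUT` plays the role of the shell's `2>&1`, so everything arrives on the one stream multilog would read. The helper name is hypothetical.

```python
import subprocess
import sys

# A stand-in for "spamd -s stderr": a child process that logs to stderr.
CHILD = "import sys; sys.stderr.write('spamd-like log line\\n')"

def run_with_stderr_on_stdout(argv: list[str]) -> str:
    """Run argv with stderr merged into stdout, like the shell's 2>&1."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the equivalent of 2>&1
        text=True,
        check=True,
    )
    return result.stdout

print(run_with_stderr_on_stdout([sys.executable, "-c", CHILD]), end="")
```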

Multilog (normally started by Bernstein's daemontools
via supervise) processes its standard input!

See: http://cr.yp.to/daemontools/multilog.html

Stucki


Re: Very long scan times - Finding the culprit rule

2005-08-15 Thread Chr. v. Stuckrad
On Mon, Aug 15, 2005 at 06:51:48AM -0700, jdow wrote:
 As soon as you touch swap space you're dead. It's not unusual to see times
 for processes increase by 10 or even 100 times. (Although about 10 is most
 common.)

This has already happened to us twice.  It seems to hit 'just by chance'.

I assume it is a 'bunch of too many large mails' hitting
'complicated rules' (especially rules with 'variably long'
patterns like '.{1,30}'), and so bloating up *all* spamd
children in parallel.  Normally only one or two are bloated,
and they 'die soon' and are replaced by normally sized ones,
but very rarely *all* of them bloat, and the server goes down.

Stucki


Re: Very long scan times - Finding the culprit rule

2005-08-15 Thread Chr. v. Stuckrad
On Mon, Aug 15, 2005 at 07:27:33AM -0700, Loren Wilton wrote:
 You can stop the first two from being problems by running a manual expire
 from a cron job every so often and disabling the auto-expire runs.  You
 should have a limit of 250K or so on the mail size to try to keep the third
 from being a problem.

Did that, it works (mostly, see below)...

 Usually (at least in my experience) the way a rule is written doesn't affect
 the spamd memory size.

Sorry, this is definitely WRONG!

If you write (as I once did) a rule containing spurious
'arbitrarily long ..* constructs', the regex automaton goes crazy,
and a 250k mail may need more than 250 MByte of memory per child,
instead of the nearly 80 MByte we currently see.

Simply 'shortening' the possible evaluation of the expression,
by replacing '..*' with .{1,N} (with 'N' a 'reasonably short' number),
shrank the problem to a manageable size!

Since then I have never again used .+ or .*, but ALWAYS limit the length.
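To keep to that rule across a whole rule file, a crude checker can rewrite unbounded dots automatically. This is a Python sketch with stated simplifications (it does not parse character classes and treats any backslash before the dot as an escape); `bound_dot_quantifiers` is a hypothetical helper. Note that '..*' becomes '..{0,N}', which is equivalent to .{1,N+1}, close to the .{1,N} suggested above.

```python
import re

# An unescaped '.' directly followed by '*' or '+': an unbounded "any character" run.
# Simplification: character classes are not parsed, so '[.*]' would also be
# rewritten, and a dot after an escaped backslash ('\\.*') is missed.
UNBOUNDED_DOT = re.compile(r"(?<!\\)\.([*+])")

def bound_dot_quantifiers(pattern: str, max_len: int = 30) -> str:
    """Rewrite '.*' as '.{0,N}' and '.+' as '.{1,N}' in a rule pattern."""
    def repl(m: re.Match) -> str:
        low = 0 if m.group(1) == "*" else 1
        return f".{{{low},{max_len}}}"
    return UNBOUNDED_DOT.sub(repl, pattern)

print(bound_dot_quantifiers(r"click.*here"))        # click.{0,30}here
print(bound_dot_quantifiers(r"a..*b", max_len=20))  # a..{0,20}b
print(bound_dot_quantifiers(r"v\.+i"))              # v\.+i (escaped dot, unchanged)
```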

Stucki