porn portal spammers v2

2009-09-25 Thread Guillaume Gelle

Dear all,

As usual, spammers have adapted: instead of receiving
profiles|groups|personnal.yahoo.com links, I'm now being hit with
www.google.com/reader links.
(e.g.: <A
href=http://www.google.com/reader/item/tag:google.com,2005:reader/item/69a282969886af5e>Haste
to come</A></FONT></DIV>)

I took the first rule submitted by SQL student (which worked great) and adapted
it to this kind of Google link:

uri      LOC_GOOGLE /^http:\/\/www.google[.,]com\/(reader)/i
score    LOC_GOOGLE 0 2.2 0 2.2
describe LOC_GOOGLE Contains google.com/reader uri

Comments are welcome, this is the first rule I share with the SA community here 
:)

cheers,
Guillaume




Re: Understanding SpamAssassin

2009-09-25 Thread LuKreme

On Sep 24, 2009, at 7:44 PM, poifgh wrote:
For 101st mail, if the regex MEDICINE is unable to match the obfuscated
text, then the mail would have a low score, but bayesian learner would say,
seeing the words surrounding obfuscated text, that this mail is spam.


Essentially this is how it works. Bayes looks for tokens in the
messages and categorizes them as spam or ham depending on one of two
things: the overall score or a specific command-line flag. If the score
is high enough, the message is learned as spam, which means all its
tokens are classified as spam. If the score is low enough, the message
is learned as ham and its tokens are likewise classified as ham.
Tokens that appear in both classes cancel out. New messages are then
examined for tokens, and the final bayes 'score' is weighted by how
many tokens of each type there are and (this is the clever bit) by how
strong an indicator of spamishness/hamishness each one is.


The reason manual training is useful is that there is a wide range
of scores between auto-learn ham and auto-learn spam, and messages in
that range are never learned automatically.
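
To make the gap between the two auto-learn thresholds concrete, here is a minimal Python sketch. The threshold values are the stock defaults for bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam as far as I recall them, so treat them as assumptions. The function name is hypothetical, and the real auto-learner applies extra conditions that are not modelled here.

```python
# Assumed stock defaults; a given site may well have changed them.
AUTOLEARN_HAM_BELOW = 0.1    # bayes_auto_learn_threshold_nonspam
AUTOLEARN_SPAM_FROM = 12.0   # bayes_auto_learn_threshold_spam


def autolearn_decision(score, ham_below=AUTOLEARN_HAM_BELOW,
                       spam_from=AUTOLEARN_SPAM_FROM):
    """Return 'ham', 'spam', or None when the score falls in the untrained gap."""
    if score < ham_below:
        return "ham"
    if score >= spam_from:
        return "spam"
    return None  # only manual training (sa-learn) reaches these messages


for s in (-1.5, 4.8, 15.2):
    print(s, autolearn_decision(s))
```

Everything that returns None here is mail that Bayes only ever sees if someone trains it by hand.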


A bayes_50 is a neutral score, and this is generally seen as a 0  
weight score. However, in my experience quite a lot of emails with a  
bayes_50 are actually spam. Ham messages tend to score out lower,  
assuming your data is sufficiently large.


score BAYES_99 5.0
score BAYES_95 4.5
score BAYES_80 2
score BAYES_60 1.00
score BAYES_50 0.25
score BAYES_40 -0.50
score BAYES_20 -2.50
score BAYES_05 -3.50
score BAYES_00 -5.00

So yes, for me Bayes_99 is a poison pill, and 95 is close enough. I  
have very little hitting _80 or _60 or _40, so these scores are  
basically WAGs.
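
For readers unfamiliar with the BAYES_nn names, the sketch below maps a Bayes probability onto a rule name and the custom score listed above. The bucket boundaries are the ones I believe the stock rules use (for example BAYES_50 covering 0.40 to 0.60), so treat them as an assumption and check your own 23_bayes.cf. The function name is hypothetical.

```python
# (exclusive upper bound of the probability range, rule name, score from the post)
BAYES_BUCKETS = [
    (0.01, "BAYES_00", -5.00),
    (0.05, "BAYES_05", -3.50),
    (0.20, "BAYES_20", -2.50),
    (0.40, "BAYES_40", -0.50),
    (0.60, "BAYES_50", 0.25),
    (0.80, "BAYES_60", 1.00),
    (0.95, "BAYES_80", 2.00),
    (0.99, "BAYES_95", 4.50),
    (1.00, "BAYES_99", 5.00),
]


def bayes_rule(probability):
    """Return (rule name, score) for a Bayes probability between 0 and 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name, score in BAYES_BUCKETS:
        if probability < upper:
            return name, score
    # probability == 1.0 belongs to the top bucket
    return BAYES_BUCKETS[-1][1], BAYES_BUCKETS[-1][2]


print(bayes_rule(0.5))
print(bayes_rule(0.995))
```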


TOP SPAM RULES FIRED
RANK  RULE NAME       %OFMAIL %OFSPAM  %OFHAM
   1  BAYES_99          57.12   92.66    1.84
   2  HTML_MESSAGE      78.17   79.89   75.51
   3  URIBL_BLACK       43.66   70.76    1.49
   4  RCVD_IN_JMF_BL    36.20   57.45    3.14
   5  SPF_PASS          37.14   50.73   15.99
   6  URIBL_JP_SURBL    28.99   47.56    0.10
   7  URIBL_OB_SURBL    21.01   34.44    0.13
   8  DKIM_SIGNED       31.58   31.10   32.33

TOP HAM RULES FIRED
RANK  RULE NAME       %OFMAIL %OFSPAM  %OFHAM
   1  AWL               45.92   19.29   87.37
   2  HTML_MESSAGE      78.17   79.89   75.51
   3  BAYES_00          21.30    0.08   54.31
   4  RCVD_IN_JMF_W     16.63    0.78   41.29
   5  DKIM_SIGNED       31.58   31.10   32.33
   6  DKIM_VERIFIED     25.13   23.44   27.77
   7  BAYES_50          11.88    1.94   27.36
   8  SPF_PASS          37.14   50.73   15.99

Now, this is misleading here because this is looking at the spamd
log, and when it gets right down to searching, a large number of
BAYES_50 messages will end up being classified as spam.


Other surprises are that DKIM is pretty useless and SPF_PASS is  
actually a slight spam indicator.


--
if you ever get that chimp of your back, if you ever find the thing
you lack, ah but you know you're only having a laugh. Oh, oh
here we go again -- until the end.



Re: Two more SA/MySQL questions.

2009-09-25 Thread Benny Pedersen

On fre 25 sep 2009 00:49:36 CEST, LuKreme wrote
Where that bayes_user let me store the email address for the  
MySQL/postfixadmin users individually.


id maps to bayes_vars, where you find the username for that id

that way more than one email user can share one user id in bayes

So, if I have us...@example.com and us...@example.org their bayes  
would be saved and checked versus only their own data.


Make sense?


in that case you need to make a shared user id, not just a sitewide id; it
has nothing to do with how postfixadmin sees and manages things for you


--
xpoint



Re: Understanding SpamAssassin

2009-09-25 Thread Benny Pedersen

On fre 25 sep 2009 09:58:41 CEST, LuKreme wrote
Other surprises are that DKIM is pretty useless and SPF_PASS is  
actually a slight spam indicator.


you miss the point: no USER_IN_* rule is hitting

so without some whitelist_from_* entries, dkim and spf will not be helpful

if they were, you would have given spammers a free ride. is that what you wanted?

--
xpoint



Re: Two more SA/MySQL questions.

2009-09-25 Thread Benny Pedersen

On tor 24 sep 2009 23:06:56 CEST, Jari Fredriksson wrote

Bayes tables do not have user id or user name,so I guess they are  
meant for global: no per user bayes no.



CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`,`token`),
  KEY `bayes_token_idx1` (`token`),
  KEY `bayes_token_idx2` (`id`,`atime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `bayes_vars` (
  `id` int(11) NOT NULL auto_increment,
  `username` varchar(200) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `token_count` int(11) NOT NULL default '0',
  `last_expire` int(11) NOT NULL default '0',
  `last_atime_delta` int(11) NOT NULL default '0',
  `last_expire_reduce` int(11) NOT NULL default '0',
  `oldest_token_age` int(11) NOT NULL default '2147483647',
  `newest_token_age` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `bayes_vars_idx1` (`username`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;


matching id across both tables gives the username from bayes_vars

but yes, in some places there is only system-wide bayes; that is the case in amavisd :(

but SA itself has per-user bayes, awl and userprefs
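
The id-to-username mapping can be tried out with a small, self-contained Python sketch using an in-memory SQLite database. The two tables are cut down to the columns that matter for the join, and the usernames and tokens are made up for illustration. A real installation would run the equivalent query against MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified versions of the two tables above; only the join columns are kept.
conn.executescript("""
CREATE TABLE bayes_vars (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
CREATE TABLE bayes_token (
    id INTEGER, token TEXT, spam_count INTEGER, ham_count INTEGER,
    PRIMARY KEY (id, token)
);
INSERT INTO bayes_vars VALUES (1, 'alice@example.com');
INSERT INTO bayes_vars VALUES (2, 'bob@example.org');
INSERT INTO bayes_token VALUES (1, 'tok_a', 3, 0);
INSERT INTO bayes_token VALUES (1, 'tok_b', 0, 5);
INSERT INTO bayes_token VALUES (2, 'tok_a', 1, 1);
""")


def tokens_per_user(connection):
    """Count stored tokens per username by joining on the shared id column."""
    rows = connection.execute("""
        SELECT v.username, COUNT(*)
        FROM bayes_vars AS v
        JOIN bayes_token AS t ON t.id = v.id
        GROUP BY v.username
        ORDER BY v.username
    """)
    return rows.fetchall()


print(tokens_per_user(conn))
```

The same token ('tok_a') is stored once per id, which is what keeps one user's training separate from another's.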

--
xpoint



Re: Two more SA/MySQL questions.

2009-09-25 Thread Benny Pedersen

On tor 24 sep 2009 22:57:13 CEST, LuKreme wrote
Is there a write-up/how-to anyone's put together about setting up  
bayes with MySQL?


please read some docs first


Is it possible to migrate existing bayes to MySQL,


this is well explained in the docs: see --backup and --restore in
sa-learn --help


it would just be nice if it could also do awl


or do you simply start over? Does using MySQL bayes allow you to
fake per-user bayes with MySQL-based users?


see --username in sa-learn --help

just dump as a regular user (NOT root) and restore into sql with
--username; that should do it


--
xpoint



Re: Understanding SpamAssassin

2009-09-25 Thread Mark Martinec
LuKreme wrote:
 Other surprises are that DKIM is pretty useless and SPF_PASS is
 actually a slight spam indicator.

Benny Pedersen wrote:
 so without some whitelist_from_* dkim and spf will not be helpfull

Indeed. Score points should be kept close to zero for rules
DKIM_SIGNED, DKIM_VALID and DKIM_VALID_AU (or DKIM_VERIFIED in pre-3.3).

The value of DKIM verification does not come from score points of these
informational rules directly, but from derived rules: from DKIM-based
whitelisting and from fraud protection (DKIM_ADSP_* rules with their
associated 'adsp_override' in 3.3.0, or hand written rules in pre-3.3).

  Mark


Re: porn portal spammers v2

2009-09-25 Thread McDonald, Dan
On Fri, 2009-09-25 at 09:30 +0200, Guillaume Gelle wrote:
 Dear all,
 
 As usual, spammers improved and instead of receiving profiles|groups|
 personnal.yahoo.com links, now, I'm being hit with
 www.google.com/reader links.
 (ie : <A
 href=http://www.google.com/reader/item/tag:google.com,2005:reader/item/69a282969886af5e>Haste
  to come</A></FONT></DIV>)
 
 I took the firts rule submitted by SQL student (which worked great)
 and updated to this kind of google links :
 
 uri  LOC_GOOGLE /^http:\/\/www.google[.,]com\/(reader)/i

Why the parentheses?  You only have one option, so parentheses are just
additional logic.  You've also used the wrong sort of parentheses - (?:)
should be used to avoid enabling backtracking, since backtracking causes
significant performance impact...
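
A quick way to see the difference between the two kinds of parentheses is to try them in a regex engine. The sketch below uses Python's re module rather than Perl, so it only illustrates the grouping behaviour, not SpamAssassin's actual matching or its performance. The variable names are made up for the example.

```python
import re

url = ("http://www.google.com/reader/item/"
       "tag:google.com,2005:reader/item/69a282969886af5e")

# The rule as posted: (reader) is a capturing group.
capturing = re.compile(r"^http://www.google[.,]com/(reader)", re.I)
# Same match, but (?:reader) groups without saving the matched text.
non_capturing = re.compile(r"^http://www.google[.,]com/(?:reader)", re.I)
# With a single alternative, no group is needed at all.
plain = re.compile(r"^http://www.google[.,]com/reader", re.I)

for name, rx in (("capturing", capturing),
                 ("non-capturing", non_capturing),
                 ("plain", plain)):
    m = rx.match(url)
    print(name, bool(m), m.groups())
```

All three patterns match the same URLs. Only the first one records a captured group, which is bookkeeping the rule never uses.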

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
www.austinenergy.com




3.3.0 and sa-compile

2009-09-25 Thread to...@starbridge.org

Hi,
I'm running SA 3.3.0 (3.3.0-alpha3-r808953) and I have a problem with
compiled rules.

sa-compile runs without errors, and SA seems to work fine when restarted,
but some body rules are no longer detected.

Example of a simple body rule (for testing):

body     TONIO_SPAM_TEST /toniospam/i
describe TONIO_SPAM_TEST Mentions Generic toniospamtest
score    TONIO_SPAM_TEST 5

If I comment out
loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
in v320.pre, the rule works again.

I've tested with SA 3.2.5 and it works fine with Rule2XSBody active.
I've tried deleting the compiled rules and compiling again: same result.

Some info on my environment:
debian testing
xsubpp version 2.200401 (from debian perl package)
re2c version 0.13.5-1

Thanks for your help
Regards,
Tonio






Re: Two more SA/MySQL questions.

2009-09-25 Thread LuKreme

On 25-Sep-2009, at 03:14, Benny Pedersen wrote:


On fre 25 sep 2009 00:49:36 CEST, LuKreme wrote
Where that bayes_user let me store the email address for the MySQL/ 
postfixadmin users individually.


id is a map to bayes_vars where you find username for the id


But that ID would simply be the vpopmail user, not the individual  
email addresses that are in the MySQL map, right?



that way more then one email user can share one id user in bayes


Which is exactly what I want to avoid.

So, if I have us...@example.com and us...@example.org their bayes  
would be saved and checked versus only their own data.


Make sense?


in that case you need to make shared user id, not just sitewide id,  
it have nothing to do with how postfixadmin see and manage things  
for you


How does a shared UID separate the individual accounts in the MySQL  
map into individual IDs in the MySQL bayes?





--
Love seeketh not itself to please Nor for itself hath any care But
for another gives its ease And builds a heaven in Hell's
despair



Re: Understanding SpamAssassin

2009-09-25 Thread LuKreme

On 25-Sep-2009, at 03:56, Mark Martinec wrote:
LuKreme wrote:

Other surprises are that DKIM is pretty useless and SPF_PASS is
actually a slight spam indicator.


Benny Pedersen wrote:

so without some whitelist_from_* dkim and spf will not be helpfull


Indeed. Score points should be kept close to zero for rules
DKIM_SIGNED, DKIM_VALID and DKIM_VALID_AU (or DKIM_VERIFIED in  
pre-3.3).


As they are, and I never said anything different. I don't know where
Benny got the idea I was giving spammers a 'free ride.'


I meant to say pretty useless on its own.




--
I think it's the duty of the comedian to find out where the line is
drawn and cross it deliberately.



Re: 3.3.0 and sa-compile

2009-09-25 Thread Benny Pedersen

On fre 25 sep 2009 13:38:19 CEST, to...@starbridge.org wrote


I've tested with SA 3.2.5 and it's working fine with Rule2XSBody
active. I've tried to delete compiled rules and compile again: same
result.


did you forget to run sa-compile under 3.3?

--
xpoint



Re: 3.3.0 and sa-compile

2009-09-25 Thread to...@starbridge.org

Benny Pedersen a écrit :
 On fre 25 sep 2009 13:38:19 CEST, to...@starbridge.org wrote

 I've tested with SA 3.2.5 and it's working fine with Rule2XSBody
 active. I've tried to delete compiled rules and compile again:
 same result.

 forget to sa-compile in 3.3 ?

sa-compile has been run correctly with no errors (even in debug)



Re: Two more SA/MySQL questions.

2009-09-25 Thread Benny Pedersen

On fre 25 sep 2009 13:50:45 CEST, LuKreme wrote


How does a shared UID separate the individual accounts in the MySQL
map into individual IDs in the MySQL bayes?


so far it can only be done with unix users, e.g. have 2 unix users share one
id in spamassassin by setting the same username in user_prefs

and this is imho why amavisd can also only do site-wide bayes, not per user

i don't use vpopmail, so it might not be how it works in your setup

a dream of mine is to have amavisd with spamassassin sql-based user prefs

--
xpoint



Re: 3.3.0 and sa-compile

2009-09-25 Thread Matt Kettler
to...@starbridge.org wrote:
 Benny Pedersen a écrit :
  On fre 25 sep 2009 13:38:19 CEST, to...@starbridge.org wrote

  I've tested with SA 3.2.5 and it's working fine with Rule2XSBody
  active. I've tried to delete compiled rules and compile again:
  same result.
  forget to sa-compile in 3.3 ?

 sa-compile has been run correctly with no errors (even in debug)



Re: partial (lazy) scoring? - run a second time?

2009-09-25 Thread ArtemGr
Benny Pedersen me at junc.org writes:
 fuzzyocr stop scanning if spam score is over a limit, why scan
 ocr when spamassassin can do it without ocr ?

Good to know.

 it could maybe be a option to make spamassassin stop scanning in  
 generic if spam score is high ?
 
 there is alot of plugins that basicly just does digest match and
 check remote if found elsewhere, only diff is how digest is done
 and for what
 
 my former own mailhost have changed from spamassassin to dspam, less
 work for him and his users, and definaly lees work for his low budget
 quad xeon intel server with 6Gb ram and alot of disks/diskspace

Mmm. DSPAM seems to be just another Bayesian filter, no?
I'm now using j-chkmail milter for DNSBL and URI DNSBL checks, then CRM114.





Re: Understanding SpamAssassin

2009-09-25 Thread Bowie Bailey
poifgh wrote:
 Bowie Bailey wrote:
   
 For auto-learning, the high and low scoring messages are fed to Bayes. 
 However, for an optimal setup, you should manually train Bayes on as
 much of your (verified) ham and spam as possible.  The more of your mail
 stream Bayes sees, the better the results will be.

 Your description of Bayes is pretty close.  It breaks down the message
 into tokens (words and character sequences) and then keeps track of
 how likely each of those tokens is to appear in either a ham or spam
 message.  When a new message comes in, Bayes breaks it into tokens and
 then scores it depending on which tokens were found in the message.

 

 Suppose we do not have manual Bayesian training. We only do online training
 in which high and low scoring mails are fed to the learner [is the a usual
 thing to do? How many people manually train their bayesian filter?]
 A high scoring spam is then fed to the learner. The spam is high scoring
 since a few rules [regex] matched. Now the bayesian leaner would learn all
 the tokens from this mail. Next time a mail [say M] with similar tokens is
 seen, it would be flagged as spam [using bayes rule]. why would bayesian
 learning be needed for us to say M is spam. Since it contains very much
 similar words like earlier high scoring mails, shouldnt we expect the regex
 rules to work for M as well? - since M is very much similar to those mails
 from which we learnt from ?
   

Look at it this way -- Bayes is learning what your spam looks like and
what your ham looks like.  Most of your spam will be caught by other
rules, but there are times when an email will come in that the main
rules do not catch.  Bayes is frequently able to catch these because it
is looking at the message as a whole rather than looking for particular
words or phrases as the main regex rules do.

Manual training is not strictly required for Bayes, but the more manual
training you do, the higher the accuracy and the more useful it
becomes.  At the least, you should manually train Bayes on all of your
false positives and false negatives.  This can be scripted to happen
automatically based on folders which are expected to contain hand-sorted
spam and ham.
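
As a rough illustration of scripting this, the Python sketch below only builds the sa-learn command lines for two hand-sorted folders. The folder names LearnSpam and LearnHam and the Maildir path are hypothetical, and whether sa-learn needs extra options for your mailbox format is not covered here. The commands would typically be run from cron via subprocess.

```python
from pathlib import PurePosixPath


def sa_learn_commands(maildir_root, spam_folder="LearnSpam", ham_folder="LearnHam"):
    """Build sa-learn invocations for hand-sorted spam and ham folders.

    Folder names are placeholders; use whatever your users sort mail into.
    """
    root = PurePosixPath(maildir_root)
    return [
        ["sa-learn", "--spam", str(root / spam_folder)],
        ["sa-learn", "--ham", str(root / ham_folder)],
    ]


for cmd in sa_learn_commands("/home/alice/Maildir"):
    print(" ".join(cmd))
```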

 Here is how I think bayesian is helpful [which could be be entirely my
 misunderstanding]. Suppose a set of spam mails look like

 Please buy M3d1C1NE X at store Y for cheap. 

 Now spammers have obfuscated word medicine in the mail. Spammers send, say
 a thousand spam each having a different way in which medicine is spelt
 out, but all the other words around it remain nearly the same. Only some of
 the first 100 of these mails would hit [say if there exists] a MEDICINE rule
 [regex]. Those particular mails would have high spam scores and hence the
 bayesian filter would learn that mails containing words Please, buy,
 at, store, for, cheap corresponds to have a high spam probability.  

 For 101st mail, if the regex MEDICINE is unable to match the obfuscated
 text, then the mail would have a low score, but bayesian learner would say,
 seeing the words surrounding obfuscated text, that this mail is spam.

 Does it work this way? Does it work only this way [if not manually trained]? 
   

That is a pretty fair description of how it works regardless of how you
train it.  The advantage of manual training is that you allow it to
learn from the lower scoring spam (and higher scoring ham), which are
the kinds of messages that can most use the extra points from the Bayes
rules.
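
The M3d1C1NE scenario can be reproduced with a toy classifier. The following Python sketch is a plain naive Bayes over whitespace-separated words with add-one smoothing. It is not SpamAssassin's tokenizer or its probability combining, and the training messages are invented, but it shows why the surrounding words carry an obfuscated message over the line.

```python
import math
from collections import Counter


class ToyBayes:
    """Minimal naive Bayes over lowercase words; for illustration only."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokenize(text):
        return set(text.lower().split())

    def learn(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Start from the prior odds, then add each token's log likelihood ratio.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in self.tokenize(text):
            p_spam = (self.spam_tokens[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = ToyBayes()
for spelling in ("medicine", "m3dicine", "med1cine"):
    bayes.learn(f"Please buy {spelling} X at store Y for cheap", is_spam=True)
for ham in ("meeting notes attached for the project review",
            "lunch at noon tomorrow",
            "see you at the store later"):
    bayes.learn(ham, is_spam=False)

# A spelling no regex rule has seen; the surrounding words still give it away.
obfuscated = bayes.spam_probability("Please buy M3d1C1NE X at store Y for cheap")
ordinary = bayes.spam_probability("project meeting tomorrow at noon")
print(round(obfuscated, 4), round(ordinary, 4))
```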

-- 
Bowie


RE: Report in header of SPAM emails

2009-09-25 Thread Luis campo

Hello 

if I restarted the spamd
 

greetings

 


 
 From: ja...@iki.fi
 To: users@spamassassin.apache.org
 Subject: Re: Report in header of SPAM emails
 Date: Thu, 24 Sep 2009 15:58:58 +0300
 
  dears Srs,
  
  
  I have added the option report_safe 1, but the mail
  deposited in the quarantine folder 
  not have any attached and SA report
  
  Do not use the amavis, if not the simscan
  
 
 Did you restart Spamd after the change?
  

Re: [SA] MagicSpam

2009-09-25 Thread Adam Katz
Aaron Wolfe wrote:
 Even given a server that has these things, I'm surprised they 
 have invented technology that can analyze a postfix install to 
 the degree needed for correct installation of their product with 
 no more than a single click.  With tech like that, I can't 
 believe they haven't taken the world by storm.  Maybe they're 
 still working on single click world domination technology.

rich...@buzzhost.co.uk wrote:
 I have to totally agree. Postfix is *so* configurable that a single
  point  click installer is just nonsense. I don't think Postfix 
 'installation' as it is could be any easier than Debian: apt-get 
 install postfix. That's the easy bit. It's the configuration that 
 takes the skill.

I disagree.

This depends on the product's nature.  I believe MailChannels Traffic
Control does exactly that.  The one-click would be an RPM/DEB
package (or an actual GUI installer like BitRock) for Windows-style
sysadmins who need a GUI while other sysadmins would be able to
install with rpm or dpkg (or an included install binary/script) with a
single command.  All Traffic Control does is sit in front of the mail
server and act as a discriminating proxy.

Having not read any of MagicSpam's documentation, I can only assume
that their product acts somewhat similarly, directly intercepting
incoming mail as if it were the server, then doing some kind of
hand-off to the real mail server.  For 90+% of the users out there, no
configuration options would be needed, and for a good number of the
rest, a few menus could handle the bits that can't be resolved themselves.

Traffic Control's selective tarpits are enough to stop almost all
incoming spam, and the rest can be handled by a filter-based program
like SpamAssassin.  MagicSpam might do something similar.
Milter-greylist (which is outgrowing its name -- it now supports SPF,
DKIM, SpamAssassin, ...) currently has tarpitting in development.

 Fair play to Linuxmagic if they can offer the support - which is 
 what corporates want. Selling cobbled together open source is 
 nothing new.

Of course, the key to any of this is good support.  I suspect
MagicSpam uses their own (patented) technology too, but that really
has nothing to do with this since it's quite clear that a supported
F/OSS spam-fighting bundle is itself quite profitable.

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam


Re: Two more SA/MySQL questions.

2009-09-25 Thread Kris Deugau

LuKreme wrote:
But that ID would simply be the vpopmail user, not the individual email 
addresses that are in the MySQL map, right?


It maps to whatever username spamc is passed, or the *nix UID
spamassassin/spamc finds automatically.  (I would *hope* you're using
spamd/spamc in an environment where you're using vpopmail...  <g>)


If you don't specify a calling user, all of your spamc calls are to the 
same logical username (looked up via *nix UID in the system password 
file), and SA has no way to know which user you're filtering for.


If your spamc calls are at delivery and *do* refer to the real user 
(email address), you should already have per-user Bayes running by 
default.  (At least, as far as I can see in the docs, and from a mild 
panic here trying to figure out why the Bayes tables were growing wildly 
but didn't seem to be doing anything.)


How does a shared UID separate the individual accounts in the MySQL map 
into individual IDs in the MySQL bayes?


It only relies on *nix UID users if you don't explicitly pass a username 
to spamc.  spamd uses whatever username spamc passes in, and uses the 
listing in bayes_vars to restrict which bayes_tokens entries it looks at.


We call spamc with -u recipient, and run spamd with -x -q to enable 
SQL.  Historically, we haven't used userprefs on this system (and the 
preprocessed white/black entries handle most of that) but with some of 
the legacy systems we've imported I've started to add a few sets of 
userprefs in SQL.
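
A delivery script that calls spamc per recipient might assemble its command along these lines. This is a Python sketch with a hypothetical helper name. The -d, -p and -u options are standard spamc flags, while the host and port defaults shown are assumptions about a local spamd.

```python
def spamc_command(recipient, host="127.0.0.1", port=783):
    """Build a spamc invocation that scans on behalf of one virtual user.

    The recipient address becomes the username spamd looks up in bayes_vars.
    """
    if not recipient or recipient.startswith("-"):
        raise ValueError("recipient must be a non-empty address")
    return ["spamc", "-d", host, "-p", str(port), "-u", recipient]


print(spamc_command("alice@example.com"))
print(spamc_command("alice@example.org"))
```

Because the full address is passed with -u, the .com and .org users end up with separate rows in bayes_vars instead of sharing one.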


IMO spamd has something of an awkward limitation for some setups where 
you can't have X as on-disk files, and Y as SQL, or you can't do 
certain types of virtual users with some types of userpref/Bayes/AWL/etc 
storage.  I thought of a specific edge case a while ago but I can't 
recall it at the moment.


-kgd


Re: partial (lazy) scoring? - run a second time?

2009-09-25 Thread RW
On Fri, 25 Sep 2009 12:32:35 + (UTC)
ArtemGr artem...@gmail.com wrote:

 Benny Pedersen me at junc.org writes:

  my former own mailhost have changed from spamassassin to dspam, less
  work for him and his users, and definaly lees work for his low
  budget quad xeon intel server with 6Gb ram and alot of
  disks/diskspace
 
 Mmm. DSPAM seems to be just another Bayesian filter, no?

It's not simply a standalone filter like SpamAssassin or Bogofilter,
it's an integrated system with quarantine and a web-based user
interface, which is why it appeals to many admins. You can use it as
standalone filter, but they don't go out of their way to make that
intuitive.


  but i keep away from dspam, it is to unstable for me, but it might
  just be me as always :)

The original developer sold it to a commercial company who did little
with it, and it got into a bit of a mess. Earlier in the year a new
project was started to maintain it, so hopefully it'll improve.



Re: Report in header of SPAM emails

2009-09-25 Thread Jari Fredriksson
 Hello
 
 if I restarted the spamd
 

You need more words when communicating ;)

Does this mean the report still does not show up, even if you restarted SA 
after the setting?

Well. SpamAssassin does not have a Quarantine. Something in your system has, 
and that is what is calling the SpamAssassin.

How do you call SpamAssassin?

1. Do you have spamd up and running?
2. Is it called by spamc?
3. Do you have amavisd-new?
4. Do you have simscan?
5. How does the mail get processed by SpamAssassin in your system?
6. Which software puts the spam into Quarantine folder? SpamAssassin it is not.


Use message size in a rule?

2009-09-25 Thread Rich Graves
For HTML content, we can check the length with

 eval:html_eval('length', ' 384')

but I don't see anything similar for body or rawbody. For my purposes, the 
Content-Length from the spamc connection would do, but it doesn't seem to be 
exposed.

I see at least two plugins looking at length:

 ImageInfo.pm:  my $textlen = length(join('',@$body));
 TextCat.pm:  my $len = length($body);

but it seems a waste to make multiple in-memory copies of a large message just to 
see how big it is.

The bigger picture: I'm working on some ISP/.edu phishing rules inspired by the 
old 419 rules... lots of words and short phrases indicating an attempt to get 
our account information (either through email or free web form sites), and a 
meta rule that fires only if there are several hits. Due to the risk of false 
positives on long messages, I'd only like to apply the rules to messages with 
short bodies. 
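
The logic described here (several phrase hits, short bodies only) can be prototyped outside SpamAssassin before writing the rules. The Python sketch below uses invented phrases and arbitrary thresholds. It is not an SA plugin and does not answer the question of how to get the body length inside a rule.

```python
import re

# Invented example phrases; a real rule set would be built from observed phish.
PHISH_PHRASES = [
    r"verify your account",
    r"confirm your password",
    r"webmail (?:quota|upgrade)",
    r"account will be (?:suspended|deactivated)",
    r"help ?desk",
]
PHISH_RES = [re.compile(p, re.I) for p in PHISH_PHRASES]


def looks_like_phish(body, min_hits=3, max_body_len=2000):
    """Fire only when a short body hits several distinct phrases."""
    if len(body) > max_body_len:
        return False  # long messages are skipped to limit false positives
    hits = sum(1 for rx in PHISH_RES if rx.search(body))
    return hits >= min_hits


short_phish = ("Dear user, the Helpdesk requires you to verify your account. "
               "Your account will be suspended unless you confirm your password.")
long_message = short_phish + " filler" * 500
print(looks_like_phish(short_phish), looks_like_phish(long_message))
```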
-- 
Rich Graves http://claimid.com/rcgraves
Carleton.edu Sr UNIX and Security Admin
CMC135: 507-222-7079 Cell: 952-292-6529


Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread jmunjr

Hi I am new here. Thanks for the site and all the contributions you have all
made.

Read some of the posts but can't seem to figure this out.

The majority of email coming into my server (Centos5.2, postfix, dovecot)
gets scanned and get the X-Spam-Status inserted just fine by spamassassin
3.2.4 but several get through with no spam header inserted.

Any reasons why?

What do I need to do to diagnose this?  What logs, etc?  What to look for? 
How to fix?  I'm a novice at this but I am able to get around in Linux ok. 
If possible please provide details on any significant tasks/changes.

Thanks for your help.
-- 
View this message in context: 
http://www.nabble.com/Some-messages-not-being-checked-by-spamassassin---most-are-but-a-few-get-through-tp25615857p25615857.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: [sa] Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread Charles Gregory

On Fri, 25 Sep 2009, jmunjr wrote:

The majority of email coming into my server (Centos5.2, postfix, dovecot)
gets scanned and get the X-Spam-Status inserted just fine by spamassassin
3.2.4 but several get through with no spam header inserted.


The lack of a header, by definition, means that spamassassin did not run.

So this would mean that your MTA or LDA (wherever you run SA) is
making a decision not to run SA against the message.
The most likely reason is a test that decides not to run SA
when the message is larger than a certain byte count.
But this is not an SA config issue; it would be in the script/program
that executes spamassassin or spamc. First guess: /etc/procmailrc.

- Charles



Re: [sa] Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread jmunjr

Thanks Charles.  Rats, I meant to say that the message sizes for these are as
small or smaller than other messages getting checked.  In fact I don't
believe procmail even has a filesize limit(something I need to change).

Any other thoughts ?

Thanks again



Charles Gregory wrote:
 
 On Fri, 25 Sep 2009, jmunjr wrote:
 The majority of email coming into my server (Centos5.2, postfix, dovecot)
 gets scanned and get the X-Spam-Status inserted just fine by
 spamassassin
 3.2.4 but several get through with no spam header inserted.
 
 The lack of header, by definition, means that spamassassin did not run.
 
 So this would mean that your MTA or LDA (wherever you run SA) is
 making a decision not to run SA against the message.
 The most likely reason is a test that decides not to run SA
 when the message is larger than a certain byte count.
 But this is not an SA config issue, it would be in the script/program
 that executes spamassassin or spamc first guess /etc/procmailrc.
 
 - Charles
 
 
 




RE: porn portal spammers v2

2009-09-25 Thread Guillaume Gelle


That's right, I should have removed the parentheses; they serve no purpose here. 
They were there in case something comes up later and I need to add some | alternatives after reader, etc.

 

I don't know what you mean by (?:) and backtracking though; I'll double-check the 
wiki page about syntax ;)

 

Thanks,

Guillaume

 

 

 
 Subject: Re: porn portal spammers v2
 Date: Fri, 25 Sep 2009 06:22:03 -0500
 From: dan.mcdon...@austinenergy.com
 To: users@spamassassin.apache.org
 
 On Fri, 2009-09-25 at 09:30 +0200, Guillaume Gelle wrote:
  Dear all,
  
  As usual, spammers improved and instead of receiving profiles|groups|
  personnal.yahoo.com links, now, I'm being hit with
  www.google.com/reader links.
  (ie : <A
  href=http://www.google.com/reader/item/tag:google.com,2005:reader/item/69a282969886af5e>Haste
   to come</A></FONT></DIV>)
  
  I took the firts rule submitted by SQL student (which worked great)
  and updated to this kind of google links :
  
  uri LOC_GOOGLE /^http:\/\/www.google[.,]com\/(reader)/i
 
 Why the parentheses? You only have one option, so parentheses are just
 additional logic. You've also used the wrong sort of parentheses - (?:)
 should be used to avoid enabling backtracking, since backtracking causes
 significant performance impact...
 
 -- 
 Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
 www.austinenergy.com


Re: Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread John Hardin

On Fri, 25 Sep 2009, jmunjr wrote:

The majority of email coming into my server (Centos5.2, postfix, 
dovecot) gets scanned and get the X-Spam-Status inserted just fine by 
spamassassin 3.2.4 but several get through with no spam header inserted.


Any reasons why?


Message size. How big are the messages that don't get scanned? spamc will 
not send messages larger than a configurable maximum size to spamd.


System overload. How heavily loaded is the system doing the scanning - 
especially, is it swapping?


Possibly DNS timeouts. Do you have a caching nameserver set up locally, 
and is SA configured to use that nameserver?


System tasks, like automatic restart of spamd after running sa_update. If 
a message it receiver during the brief window spamd is down, it may be 
passed unscanned.


What command-line arguments are being used with spamc?

Check your MTA log (e.g. /var/log/maillog) for the time around when the 
message was received, and for the message's Message-ID. That may yield 
some clues about what happened.
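
To collect the Message-IDs worth searching the log for, a small script can flag delivered messages that lack the X-Spam-Status header. The Python sketch below works on raw message text. The 500 KB limit is my assumption for spamc's default maximum size (see the -s option in the spamc man page), and the sample messages are made up.

```python
from email import message_from_string

SPAMC_ASSUMED_MAX_SIZE = 500 * 1024  # assumed default; verify with `man spamc`


def scan_report(raw_message, max_size=SPAMC_ASSUMED_MAX_SIZE):
    """Summarize whether a delivered message shows signs of having been scanned."""
    msg = message_from_string(raw_message)
    size = len(raw_message.encode("utf-8"))
    return {
        "message_id": msg.get("Message-ID"),
        "scanned": msg.get("X-Spam-Status") is not None,
        "size": size,
        "over_size_limit": size > max_size,
    }


scanned = "Message-ID: <1@example.com>\nX-Spam-Status: No, score=0.1\n\nhello\n"
missed = "Message-ID: <2@example.com>\nSubject: hi\n\nhello\n"
for raw in (scanned, missed):
    print(scan_report(raw))
```

Any message reported as unscanned but under the size limit points away from the size check and toward the other causes listed above.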


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Vista: Windows ME for the XP generation.
---
 Approximately 8856840 firearms legally purchased in the U.S. this year


Re: [sa] Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread Charles Gregory

On Fri, 25 Sep 2009, jmunjr wrote:

Thanks Charles.  Rats, I meant to say that the message sizes for these are as
small or smaller than other messages getting checked.  In fact I don't
believe procmail even has a filesize limit(something I need to change).
Any other thoughts ?


If procmail invokes SA, look for rules in procmailrc that may be delivering 
the mail before it gets to the rule that runs SA. Whitelist rules, perhaps?


- C


RE: porn portal spammers v2

2009-09-25 Thread John Hardin

On Fri, 25 Sep 2009, Guillaume Gelle wrote:

Don't know what you mean by (?:) and backtracking tho, I'll double check 
the wiki page about syntax ;)


Try this:

 uri URI_GOOG_READER m;^https?://(?:www\.)?google[\.,]com/reader/;i
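
The practical differences between the suggested pattern and the original can be checked against a few sample URLs. The sketch below transcribes both patterns into Python's re module, which behaves the same as Perl for these constructs as far as I know. The sample URLs other than the first are invented for the comparison.

```python
import re

# Original rule: unescaped dot after "www", http only, no trailing slash.
original = re.compile(r"^http:\/\/www.google[.,]com\/(reader)", re.I)
# Suggested rule: optional https and www., escaped dots, trailing slash required.
suggested = re.compile(r"^https?://(?:www\.)?google[\.,]com/reader/", re.I)

samples = [
    "http://www.google.com/reader/item/tag:google.com,2005:reader/item/x",
    "https://google.com/reader/item/x",
    "http://wwwXgoogle.com/reader/item/x",
    "http://www.google.com/readers-digest",
]

results = {
    url: (bool(original.match(url)), bool(suggested.match(url)))
    for url in samples
}
for url, (old, new) in results.items():
    print(f"original={old!s:5} suggested={new!s:5} {url}")
```

The suggested rule picks up the https and no-www variants while dropping the two accidental matches of the original.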


On Fri, 2009-09-25 at 09:30 +0200, Guillaume Gelle wrote:

now, I'm being hit with www.google.com/reader links. (ie : <A 
href=http://www.google.com/reader/item/tag:google.com,2005:reader/item/69a282969886af5e>Haste 
to come</A></FONT></DIV>)


I took the firts rule submitted by SQL student (which worked great)
and updated to this kind of google links :

uri LOC_GOOGLE /^http:\/\/www.google[.,]com\/(reader)/i




Re: Report in header of SPAM emails

2009-09-25 Thread Martin Gregorie
On Fri, 2009-09-25 at 19:08 +0300, Jari Fredriksson wrote:

 Well. SpamAssassin does not have a Quarantine. Something in your
 system has, and that is what is calling the SpamAssassin.
 
I already told him that simscan is doing the quarantining. That's
definite.

I'm also 98% certain that simscan quarantines the original message and
discards the version returned by spamc. This is why report_safe has
no effect.

Somebody might care to read the simscan code and confirm my analysis.
simscan is a single, fairly straightforward C source file. However, the
OP can easily prove this by inspecting a quarantined message or posting
it to pastebin: if I'm right, the quarantined message won't contain any SA
headers added by his host.


Martin




Re: Some messages not being checked by spamassassin - most are but a few get through

2009-09-25 Thread John Hardin

On Fri, 25 Sep 2009, John Hardin wrote:

System tasks, like automatic restart of spamd after running sa_update. If a 
message it receiver during the brief window spamd is down, it may be passed 
unscanned.


Ugh.

... if a message IS RECEIVED ...
