Re: Spamassassin not tagging some emails

2009-10-23 Thread MySQL Student
Hi,

 SpamAssassin DOES NOT bypass scanning, if the internal or trusted
 networks contain the server in it.

Hmm.. thanks for correcting me.

How would you, then, go about preventing SA from scanning the
localhost or a specific domain without whitelisting that domain or
range?

Thanks,
Alex


Re: Email / Inbox Speed Problems

2009-10-23 Thread MySQL Student
Hi,

I really hate to respond to this because it's so off-topic (how long
did it take you to write that email, anyway?), but you're so
missing the point that I just can't let it go, and it's slow on a late
Friday night.

 Yet, you open up a new Mac and what's inside?  A PC motherboard and
 processor, that's what there is!!!  You can even boot OSX on a PC

That's not the point. Haven't you ever bought a bottled water, or
spoken with someone that has, because it tastes better? It's all in
the marketing. Apple caters to people that just don't care that it's a
PC inside.

 Yet, Apple's response to the Open Source community is APSL 2.0 which
 is incompatible with GPL.  And do you think that anyone in a Mac store

That's a different issue. There's no business model for corporations
like Adobe building open source apps for the PC, let alone the Mac
where the userbase is even smaller.

 Your amazed WE have Mac customers?!?  At least WE try to EDUCATE them
 so they aren't stuck with Apple sticking it to their wallets.  I'm
 amazed that ANY Mac-specific retailer, much less APPLE, has ANY Mac
 customers.

You had mentioned someone jammed a screwdriver into the computer and
broke it, and you really think they care about going to Fry's to buy a
replacement hard disk? They just don't care. They want it to just
work. Who cares that the mouse is $30? They buy them for the
convenience, the looks, the infamous support for multimedia, and the
ease-of-use. They buy them because it's a single point of contact.
They buy them because someone can make the choice for them, and they
can get on with doing things other than worry about the details of the
computer and just start using it.

Best,
Alex


Re: Elena wants an iron cast oven

2009-10-22 Thread MySQL Student
Hi,

 What's the business model of this scam? I can't believe they really want
 millions of iron cast ovens from all around the world. Maybe I should
 answer and ask directly ;D

 Long time since I've last seen one of these...

 My impression was, they want money of course. The victim falling for it

Yes, follow the money. It's always about the money. The oven ploy is
just weird enough to attract your attention in hopes of garnering a
response.

Regards,
Alex


Re: Elena wants an iron cast oven

2009-10-22 Thread MySQL Student
Hi,

 http://englishrussia.com/?p=2137

 plenty of abandoned scrap metal already in Russia.

Maybe they could blow it up like the brain surgeons did to that dead
whale that was littering the beach in Oregon?

# The Infamous Exploding Whale
http://www.youtube.com/watch?v=8Vmnq5dBF7Y

Alex


Re: Spamassassin not tagging some emails

2009-10-22 Thread MySQL Student
Hi,

On the message that should have been scanned:

 The emails that has not been tagged at all:

[...]
 From: Angus - 3idea angus.d...@3idea.com
 To: supp...@3idea.com

Are you forwarding this spam from your internal account to this other
internal supp...@3idea.com account? It also looked like there was no
external mail server involved.

If so, I would think that SA trusts your internal network, and
therefore is just passing the message through without even evaluating
it. If you want your internal mail to also be scanned, remove your
mail server from trusted_networks and internal_networks.

I think that should fix it.

Regards,
Alex


hostkarma/uribl_black disparity

2009-10-22 Thread MySQL Student
Hi,

Over the past few days I have been investigating more closely email
that wasn't tagged that I thought should have been, and vice-versa,
using various factors, such as URIBL_BLACK and JMF_W. I'm very
surprised that obviously legitimate hosts are on the URIBL_BLACK list,
like receiveeweek.com.

Even more interesting is a bunch of FNs that contain both URIBL_BLACK
and JMF_W. I'm not sure which is correct in many cases, because they
are not always so cut-and-dried. For example, there was a Citi Bank
email (whitelisted) that happened to use an image server
(csnimages.com) that is in URIBL_BLACK.

While I don't think that particular email should have been tagged as
spam, it's only an example, and I hoped someone would be interested
enough to check out a list I created with these types of disparities
I've had over the last day or so.

It's too long to include here, so I've created a pastebin for it:

http://pastebin.com/m4a1561b5

I realize this type of thing could happen for many reasons, not the
least of which is an otherwise-legitimate host that has been
compromised and is now used to send spam. However, many on my list are
quite persistent, like blr-events.com and eturbonews.com, and I have
no idea whether they are legitimate or bogus.

Whatever the case, there are definitely mistakes, and I'd like to help
correct them.

Ideas appreciated. I'd be glad to gather more info if necessary.

Thanks
Alex


Re: Is there a WANTS_MY_INFO rule?

2009-10-17 Thread MySQL Student
Hi,

 In order to confirm you Web-Mail identity, you are to provide the
 following data;

 First Name:
 Last Name:
 Username/ID:
 Password:
 Date of Birth:

Try John Hardin's fillform:

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/?sortby=date

Regards,
Alex


Downloading sandbox rules

2009-10-17 Thread MySQL Student
Hi,

I'd like to download a few of the rules from the SVN sandbox for
testing without using svn for this. It used to be possible by clicking
Download but in the last week or so the site was updated and that
option is no longer available. Do I have to use svn now for this?

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/

Thanks,
Alex


Re: Downloading sandbox rules

2009-10-17 Thread MySQL Student
Hi,

Sorry, just after I sent this I saw the message from yesterday about using svn.

Thanks,
Alex

On Sat, Oct 17, 2009 at 1:24 PM, MySQL Student mysqlstud...@gmail.com wrote:
 Hi,

 I'd like to download a few of the rules from the SVN sandbox for
 testing without using svn for this. It used to be possible by clicking
 Download but in the last week or so the site was updated and that
 option is no longer available. Do I have to use svn now for this?

 http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/

 Thanks,
 Alex



Re: Constant Contact

2009-10-17 Thread MySQL Student
Hi,

 rawbody  __CCM_UNSUB 
 /https?:..visitor\.constantcontact.com\/[^]{60,200}SafeUnsubscribe/

 Ouch!  Rawbody, that hurts.

Do you mean that it's much more resource-intensive than a regular
body check? When is it necessary (or possible) to use it over the
URIDetail substitute you mentioned?

For example, I have to use rawbody here because I'm searching within
HTML tags:

rawbody    DDN_SPAM_3   /\/.{5}\-.{4}\-.{3}\/.{5}\-.{4}\-.{3}\-1\.jpg
border=0\\\/a\\br\/
describe   DDN_SPAM_3   New DDN Spam
score      DDN_SPAM_3   2.201

However, I suspect it's pretty resource-intensive, and I have several
of them, along with dozens of rules like:

rawbody   __SARE_HTML_INV_TAG  /\w\!\w{18,60}\w/i

Is there a way to easily measure the overhead of a particular rule?
I'd love to find out which rules are consuming the most resources.
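
As a rough way to compare rules outside of SA, here is a minimal Python sketch that times each regex against a sample body. It is only an approximation: Python's re engine is not Perl's, the sample text is made up, and HYPOTHETICAL_LONG_TOKEN is an invented rule for comparison, not one from this thread.

```python
import re
import time

# Patterns translated to Python syntax. __SARE_HTML_INV_TAG is copied as it
# appears above; HYPOTHETICAL_LONG_TOKEN is an invented rule for comparison.
RULES = {
    "__SARE_HTML_INV_TAG": re.compile(r"\w!\w{18,60}\w", re.IGNORECASE),
    "HYPOTHETICAL_LONG_TOKEN": re.compile(r"\b\w{40,}\b"),
}


def time_rules(rules, text, repeat=50):
    """Return {rule_name: total seconds spent searching text `repeat` times}."""
    timings = {}
    for name, pattern in rules.items():
        start = time.perf_counter()
        for _ in range(repeat):
            pattern.search(text)
        timings[name] = time.perf_counter() - start
    return timings


# Made-up sample body: mostly short words, so neither rule should match.
sample_body = "<html><body>" + "some ordinary words here " * 2000 + "</body></html>"
timings = time_rules(RULES, sample_body)

# Slowest rule first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {seconds:.4f}s")
```

Running the same loop over a real mail sample would give a relative ranking at best; the absolute numbers say little about what SA itself spends on a rule.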

Certainly as the number of rules has increased, the constant load on
the server has increased. Does everyone systematically run sa-compile
on their rules?

Thanks,
Alex


Re: Constant Contact

2009-10-16 Thread MySQL Student
Hi,

 Does anybody here know anything about the legitimacy of Constant
 Contact http://www.constantcontact.com/anti_spam.jsp ?

 Sometimes abused, but too legit to outright block based on sending IP, imo.

In addition to constantcontact, can I add the following to the list of
hosts I'd like people's input on as to whether it's spam:

- blueskycommunications.com
- pm0.net
- topica.com

I believe topica.com is very similar to constantcontact in that they
send bulk mail for small businesses, and don't necessarily care what
they send. The emails typically contain something like "You may be
eligible for a cash advance" and a URL like
macho-man-fitness.c.topica.com that is just a redirect to something
like cashadvancenow.com.

It's only on URIBL's grey list.

Thanks,
Alex


Re: Constant Contact

2009-10-16 Thread MySQL Student
Hi,

 How is Constant Contact better than (say) GNU mailman for that purpose? I
 don't understand the concept of sending internal mail via an external third
 party...

In addition to what's already been mentioned, CC also provides a nice
template that people can drop their message into and click Send.
This is very appealing to the local bagel shop or restaurant that
wants to advertise their specials to their favorite customers without
even having an Internet connection of their own.

I don't doubt that if you solicited to these types of businesses with
your mailman product and the ability to add their logo to the top of
an HTML email, they'd choose your service just the same.

Best,
Alex


Re: sneaky pharma spam shooting past standard rules

2009-10-15 Thread MySQL Student
Hi,

 With this:

      Received: from public30108.xdsl.centertel.pl (HELO
 marcin-8963fd6f) (79.163.117.156)

 my postfix setup would have simply dropped it on the floor at the
 HELO/EHLO. If it doesn't HELO with an FQDN and a proper rDNS, we don't
 talk to it.

Kurt, can you explain how you're doing it with postfix?

Thanks,
Alex


Re: sneaky pharma spam shooting past standard rules

2009-10-15 Thread MySQL Student
Hi,

 smtpd_helo_restrictions = permit_mynetworks,
        reject_invalid_helo_hostname,
        reject_non_fqdn_helo_hostname,
        permit

I'm currently using reject_non_fqdn_sender and
reject_non_fqdn_recipient. I wanted to be sure I should use the two
helo restrictions you've listed above in addition to the ones I'm
already using, correct?
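
To make the non-FQDN HELO idea concrete, here is a small Python sketch of the kind of test involved. It is a rough approximation for illustration, not Postfix's actual implementation: looks_like_fqdn is a made-up helper, and it ignores address literals such as [192.0.2.1].

```python
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_fqdn(helo_name):
    """Rough test: at least two valid labels separated by dots."""
    labels = helo_name.rstrip(".").split(".")
    return len(labels) >= 2 and all(LABEL.match(label) for label in labels)


# Both names are taken from Received: headers quoted on this list.
for name in ("marcin-8963fd6f", "mail2.qmul.ac.uk"):
    verdict = "accept" if looks_like_fqdn(name) else "reject (not an FQDN)"
    print(f"HELO {name}: {verdict}")
```

The bare hostname from the pharma spam fails the test because it has no dot at all, which is the case the helo restriction is meant to catch.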

Hopefully not too far off-topic now, but this is the total list of
restrictions I'm currently using:

smtpd_recipient_restrictions = permit_mynetworks,
        reject_non_fqdn_sender,
        reject_non_fqdn_recipient,
        reject_unknown_sender_domain,
        reject_unknown_recipient_domain,
        check_client_access hash:/etc/postfix/client_access,
        reject_unauth_destination,
        check_recipient_access pcre:/etc/postfix/relay_recips_access,
        reject_unauth_pipelining,
        reject_invalid_hostname

Thanks,
Alex


Re: Hostkarma whitelist needs something..

2009-10-14 Thread MySQL Student
Hi,

  http://www.impsec.org/jhardin/antispam/

This should be:

http://www.impsec.org/~jhardin/antispam/

(note the missing tilde :-)

Regards,
Alex


Mismarked Ham

2009-10-14 Thread MySQL Student
Hi,

I thought I would look through the quarantine for BAYES_00 to see if
there were any mis-marked messages or if bayes was not firing
correctly, and I have found a few, although not how I expected it
would be.

Instead of finding BAYES_00 in spam, I've found it in ham that was
pushed over the threshold to spam because of other rules. Here are the
headers from one such instance:

http://pastebin.com/m6c3cd5e3

exxample.com is my obfuscation. It was an HTML email with two small
GIF attachments that were a basic background image and two links to
youtube videos of a religious Muslim ceremony in Arabic with English
subtitles. All indications are that bayes is correct and it's ham.

Which rule(s) is then incorrect? What is the right solution here? Is
the only option to whitelist the user?

Thanks,
Alex


Re: Mismarked Ham

2009-10-14 Thread MySQL Student
Hi,

 What makes you think any of the rules are incorrect? A score of 6.1 is not
 100% (or even 99%, IIRC) spam.

Incorrect in that at least one of the rules fired when it should not
have, causing the valid email to be marked as spam.

 there's a couple of things here.

 First, for some reason you have DKIM_SIGNED but not DKIM_VERIFIED, which
 seems odd as this looks like a legit gmail message with a legit DKIM
 signature. So there's one thing to check.

Why is that? How do I go about figuring that out?

 I'm not sure which of those scored what. Then there is the fact that your
 custom rule  L_UNVERIFIED_GMAIL hit. If that's the same rule I see in the
 list archives, that scored 2.5 and pushed this email firmly into being
 tagged as spam.

Yes, that looks like it. It was posted by Dan McDonald on August 25th
to the list. It's a meta:

meta     L_UNVERIFIED_GMAIL  !DKIM_VERIFIED && __L_FROM_GMAIL && !__L_VIA_ML
priority L_UNVERIFIED_GMAIL  500
score    L_UNVERIFIED_GMAIL  2.5

I've set it to 0.5 for now. Ideas on tracking down the DKIM_VERIFIED
issue would be appreciated.
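
For anyone following along, the logic of that meta can be sketched in a few lines of Python. This assumes the three sub-rules are ANDed together (the operators do not survive in every archived copy) and that __L_FROM_GMAIL hit on this message; l_unverified_gmail is just an illustrative helper.

```python
def l_unverified_gmail(hits):
    """Evaluate the meta against the set of rule names that hit a message."""
    return (
        "DKIM_VERIFIED" not in hits
        and "__L_FROM_GMAIL" in hits
        and "__L_VIA_ML" not in hits
    )


# Signed but not verified, from gmail, not via a mailing list: the meta fires.
signed_only = {"DKIM_SIGNED", "__L_FROM_GMAIL", "FREEMAIL_FROM"}
# Same message once DKIM verification succeeds: the meta stays quiet.
verified = signed_only | {"DKIM_VERIFIED"}

print(l_unverified_gmail(signed_only))  # True
print(l_unverified_gmail(verified))     # False
```

So the meta is only as reliable as DKIM_VERIFIED itself: if verification is not running or is failing locally, every direct gmail message picks up the score.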

 Maybe adjust that score, or adjust the assumptions that caused that rule to
 be added to your config?

 This IS a gmail message, right? So your unverified-gmail custom rule is in
 error.

Yes, that's correct. I think you've identified the root of the problem.

Thanks so much.
Best regards,
Alex


Re: Mismarked Ham

2009-10-14 Thread MySQL Student
Hi,

 I'm not sure which of those scored what. [...]

 Seconded. I do see quite a few custom rules. How much did they score?

My apologies; I hadn't realized so much of it was non-standard. It's
obviously hard to help without knowing what the rules do if you
haven't seen them. I've re-run the message through SA. It looks like
the bayes score has changed, making the total score 8.2. I've also
reduced L_UNVERIFIED_GMAIL from 2.5 to 0.5.

X-Spam-Report:
*  2.0 RELAYCOUNTRY_HIGH Relayed by a country thats a bad spam source
*  0.0 RELAYCOUNTRY_US Relayed through United States
*  1.0 EXTRA_MPART_TYPE Header has extraneous
Content-type:...type= entry
*  0.5 FREEMAIL_FROM Sender email is freemail
(learnlivelove[at]gmail.com)
* -0.0 SPF_PASS SPF: sender matches SPF record
* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
*  0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
*  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*  [score: 0.5000]
*  1.1 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 T_TVD_FW_GRAPHIC_ID1 BODY: T_TVD_FW_GRAPHIC_ID1
*  1.4 SARE_GIF_ATTACH FULL: Email has a inline gif
*  1.6 PART_CID_STOCK Has a spammy image attachment (by Content-ID)
*  0.5 L_UNVERIFIED_GMAIL L_UNVERIFIED_GMAIL
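
As a quick check on where the points come from, here is a minimal Python sketch that parses the report above and adds up the per-rule scores. The parsing regex is my own assumption about the report layout, not anything SA provides.

```python
import re

REPORT = """\
*  2.0 RELAYCOUNTRY_HIGH Relayed by a country thats a bad spam source
*  0.0 RELAYCOUNTRY_US Relayed through United States
*  1.0 EXTRA_MPART_TYPE Header has extraneous
Content-type:...type= entry
*  0.5 FREEMAIL_FROM Sender email is freemail
(learnlivelove[at]gmail.com)
* -0.0 SPF_PASS SPF: sender matches SPF record
* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
*  0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
*  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*  [score: 0.5000]
*  1.1 TVD_FW_GRAPHIC_NAME_LONG BODY: TVD_FW_GRAPHIC_NAME_LONG
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 T_TVD_FW_GRAPHIC_ID1 BODY: T_TVD_FW_GRAPHIC_ID1
*  1.4 SARE_GIF_ATTACH FULL: Email has a inline gif
*  1.6 PART_CID_STOCK Has a spammy image attachment (by Content-ID)
*  0.5 L_UNVERIFIED_GMAIL L_UNVERIFIED_GMAIL
"""

# A rule line is "*", a signed decimal score, then the rule name.
RULE_LINE = re.compile(r"^\*\s+(-?\d+\.\d+)\s+([A-Z][A-Z0-9_]+)", re.MULTILINE)

scores = {name: float(value) for value, name in RULE_LINE.findall(REPORT)}
total = sum(scores.values())

print(f"{len(scores)} rules, displayed scores sum to {total:.1f}")
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:4]:
    print(f"{value:4.1f}  {name}")
```

The displayed scores add up to 8.1 rather than 8.2, presumably because the report rounds each rule to one decimal place (SARE_GIF_ATTACH is really 1.42, for example).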

Should SARE_GIF_ATTACH be such a high value by default?

full SARE_GIF_ATTACH   /name=\?[0-9a-z._\-]{3,18}\.gif\?/i
describe SARE_GIF_ATTACH   Email has a inline gif
score    SARE_GIF_ATTACH   1.42

I think this one might also be too aggressive by default?

meta PART_CID_STOCK (__ANY_IMAGE_ATTACH && __PART_STOCK_CID && !__PART_STOCK_CL && !__PART_STOCK_CD_F)
describe PART_CID_STOCK  Has a spammy image attachment (by Content-ID)

 Even more strange, there is a T_ prefixed rule, which of course is not
 stock. And generally used for NON-published rules still in evaluation.
 How did that one end up in there? What does it score?

That originated in updates_spamassassin_org/72_active.cf, so it's part
of the channel updates:

mimeheader T_TVD_FW_GRAPHIC_ID1 Content-Id =~
/[0-9a-f]{12}(?:\$[0-9a-f]{8}){2}\@/

Thanks,
Alex


Re: .cn Oddity

2009-10-11 Thread MySQL Student
Hi,

 We use some rules if we talk open about it and say hey this spammer is
 stupid look here, then it will take less then 12 hours and that gap is
 closed and we loose a valuable trick.

 yes its the way it is, spammers can also read maillists and adapt there
 spamming rules to get bypassed

It sounds like social engineering needs to be part of the attack
rules/strategy that we employ on these spammers :-)

Regards,
Alex


Re: Valid mail from blacklisted dynamic IPs

2009-10-10 Thread MySQL Student
Hi,

 I also don't understand how SPF_SOFTFAIL could happen when there
 wasn't any SPF record to test to begin with.

 http://www.openspf.org/
 i have no spf either
 http://old.openspf.org/wizard.html?mydomain=junc.orgsubmit=Go! :)

But it's sent from cron, so the host is localhost.

I definitely have to read more to learn why SPF would fail without an
SPF record. Maybe that's the whole point.

 what is the sender domain ?, why do users need to be sending to a
 pop_before_smtp ?

They are mostly on laptops or home connections with dynamic IPs. Roadwarriors.

 remember that ip could as very well be one single user ? (NAT and friend)

 have there isp forbid them to not being allowed to send mail ?

No, they haven't, and perhaps the best suggestion is to just have
them use their own ISP's mail server in the first place.

Thanks so much. Great suggestions.
Best,
Alex


Re: Valid mail from blacklisted dynamic IPs

2009-10-10 Thread MySQL Student
Hi,

 I have a set of users that are authorized to use the mail server via
 pop-before-smtp, but SA catches the mail they send through the system
 as spam because they are on blacklisted Verizon or Comcast IPs:

 why are they not using smtp authentication?

I think you're referring to SASL? Some time ago we had used it, but
the implementation was so buggy and was such a security nightmare that
we removed it, not thinking it would become so intrinsic to email on
the Internet in the future.

Kind of like the security fears people had about bind-4 back then.

Thanks,
Alex


Re: SA needs a new paradigm for rule structure

2009-10-09 Thread MySQL Student
Hi,

 What we need are rules that combine a lot of simple rules into concepts
 and then combine those rules into rules that score - and score big. As
 an example, [...]

 Yes, SA definitely needs that and sorely lacks this ultimate feature!

Can I respectfully add to this that John Hardin has already done what
I think you're describing in his lotsa_money and advance_fee rules:

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/

Regards,
Alex


Valid mail from .cn

2009-10-09 Thread MySQL Student
Hi,

Some portion of our users are from China. I hoped someone could help
me troubleshoot the best way to permit a user from .cn to forward mail
without improperly being tagged as spam, yet still block the majority
of spam from .cn.

Here's the SA report:

X-Spam-Report:
*  0.1 RELAYCOUNTRY_CN Relayed through China
*  2.0 RELAYCOUNTRY_HIGH Relayed by a country thats a bad spam source
*  1.0 EXTRA_MPART_TYPE Header has extraneous
Content-type:...type= entry
* -0.0 SPF_PASS SPF: sender matches SPF record
* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
*  0.0 LOC_URI_CN URI: Contains CN URI
*  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*  [score: 0.5000]
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 T_TVD_FW_GRAPHIC_ID1 BODY: T_TVD_FW_GRAPHIC_ID1
*  1.8 MIME_BASE64_TEXT RAW: Message text disguised using
base64 encoding
*  1.5 MY_CID_AND_ARIAL2 SARE CID and Arial2
*  1.6 PART_CID_STOCK Has a spammy image attachment (by Content-ID)
*  1.5 MY_CID_AND_STYLE SARE cid and style
*  1.6 MY_CID_ARIAL_STYLE SARE cid arial2 style

Bayes could probably use a bit of work, but is there something that I
should be investigating based on this to improve the accuracy, or
should I just whitelist_from_rcvd the user since it's a minority of
valid accounts from China?

Even if I remove the RELAYCOUNTRY_HIGH meta, it's still over the 5.0 threshold.

Thanks,
Alex


Fwd: SA needs a new paradigm for rule structure

2009-10-09 Thread MySQL Student
Hi,

I sent this message more than an hour ago, and it looks like it's yet
to hit the list. Resending.

Thanks,
Alex

-- Forwarded message --
From: MySQL Student mysqlstud...@gmail.com
Date: Fri, Oct 9, 2009 at 2:34 PM
Subject: Re: SA needs a new paradigm for rule structure
To: SA Mailing list users@spamassassin.apache.org


Hi,

   What we need are rules that combine a lot of simple rules into concepts
   and then combine those rules into rules that score - and score big. As
   an example, [...]
 
  Yes, SA definitely needs that and sorely lacks this ultimate feature!

 Can I respectfully add [...]

 Whoa, dude! You just left the heavy sarcasm in, and snipped everything
 from the quote that clarifies this statement and identifies it as
 sarcasm.

Yes, I'm really sorry about that. I didn't think that it would not be
interpreted as sarcasm with the way I quoted it, but looking at it
now, I see that it might.

Best,
Alex


Re: Valid mail from .cn

2009-10-09 Thread MySQL Student
Hi,

 Could you ask them to provide ham samples for the automated masschecks?
  We currently have none in the corpus so we cannot test the safety of rules
 against Chinese language mail.

Yes, I know how important that is. I recall you mentioning that a few
days ago. I think it would be quite difficult for me, though.

I'll evaluate how much mail there really is over the coming work-week,
and see if there's something I can do.

Best,
Alex


Re: Subject Rewrite Based on Score

2009-10-08 Thread MySQL Student
Hi,

  I actually would be doing that but the filter does not know how to
  handle int(), so I would have to build a filter for all possible number
  combinations, but if I could just get SA to do the basic math for me and
  write a header or subject I can filter off of that.

We do something similar here using a procmail/formail script that
calls a perl script to match on X-Spam-Status and then prepend the
bayes score to the subject. We then use a few procmail rules to filter
the mail based on the bayes score for analysis.
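
The idea is roughly the following, shown here as a Python sketch rather than our actual perl script. It uses the BAYES_nn rule name from the tests= list as a stand-in for the bayes score, and the sample header layout is an assumption based on the amavisd-style X-Spam-Status lines we see here.

```python
import re
from email import message_from_string


def tag_subject_with_bayes(raw_message):
    """Prepend the BAYES_nn bucket from X-Spam-Status to the Subject header."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = re.search(r"\bBAYES_(\d+)\b", status)
    if match is None:
        return raw_message  # nothing to tag; leave the message untouched
    new_subject = f"[BAYES_{match.group(1)}] {msg.get('Subject', '')}"
    if "Subject" in msg:
        msg.replace_header("Subject", new_subject)  # keeps header position
    else:
        msg["Subject"] = new_subject
    return msg.as_string()


# Made-up sample message with a folded X-Spam-Status header.
sample = (
    "From: someone@example.com\n"
    "Subject: Quarterly report\n"
    "X-Spam-Status: Yes, hits=5.4 tag1=-300.0 tag2=5.0 kill=5.0\n"
    " use_bayes=1 tests=BAYES_50, BOTNET, RCVD_IN_PBL\n"
    "\n"
    "body text\n"
)
tagged = message_from_string(tag_subject_with_bayes(sample))
print(tagged["Subject"])  # [BAYES_50] Quarterly report
```

A procmail recipe can then match on the bracketed tag in the subject to sort the copies into folders.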

Regards,
Alex


Re: Subject Rewrite Based on Score

2009-10-08 Thread MySQL Student
Hi,

 That sounds overly complicated and like a lot of wasted cycles. Calling
 a Perl script for each message? What you just described sounds a hell of
 lot like this light-weight SA configuration:

Yes, I should have mentioned that it is a copy of the mail that users
receive, visible only to a single account. It also only runs once
every four hours as the mail is pulled from the spool.

Regards,
Alex


Re: Subject Rewrite Based on Score

2009-10-08 Thread MySQL Student
Hi,

 It still is spawning a Perl process per message. You can do away with
 that processing hog, if you use the add_header rule I mentioned before
 and have SA do it instead.

You may be right. I'll have to investigate doing this for this
specific user only. Thanks for the info.

Thanks,
Alex


Valid mail from blacklisted dynamic IPs

2009-10-08 Thread MySQL Student
Hi,

I have a set of users that are authorized to use the mail server via
pop-before-smtp, but SA catches the mail they send through the system
as spam because they are on blacklisted Verizon or Comcast IPs:

X-Spam-Status: Yes, hits=5.4 tag1=-300.0 tag2=5.0 kill=5.0
 use_bayes=1 tests=BAYES_50, BOTNET, FH_HOST_EQ_VERIZON_P, RCVD_IN_PBL,
 RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL

I also don't understand how SPF_SOFTFAIL could happen when there
wasn't any SPF record to test to begin with.

One of the Comcast users:

X-Spam-Status: Yes, hits=6.4 tag1=-300.0 tag2=5.0 kill=5.0
 use_bayes=1 tests=BAYES_50, BOTNET, DYN_RDNS_SHORT_HELO_HTML, HTML_MESSAGE,
 RCVD_IN_PBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, RELAYCOUNTRY_US, SPF_SOFTFAIL,
 SUBJ_ALL_CAPS

We are working on better Bayes training, but that problem aside, what
is the right way to address this? A rule that whitelists their
specific IPs?

Another mail that I'm dealing with is one sent by Marriott that hit
SARE_HTML_URI_REFID, DCC_CHECK, and AE_DETAILS_WITH_MONEY, as well as being
whitelisted by JMF/HOSTKARMA. I don't know how it hit DCC when there
are details in there specific to the user, including account numbers,
user names, etc. How should I go about allowing this type of mail
without disrupting its ability to block mail that should be blocked
with these rules? I'm sure I can add a rule subtracting points if it
hits these and comes from Marriott, but I thought there might be
something that could address the more general problem rather than this
specific one from Marriott. Perhaps I'm making it too hard.

Thanks,
Alex


Re: Valid mail from blacklisted dynamic IPs

2009-10-08 Thread MySQL Student
Hi,

 Does your pop-before-smtp method cause your MTA to indicate they've been
 authed in the Received: header?

I don't believe so. There doesn't appear to be anything additional in
the header relating to pop-b4-smtp. I'm using postfix. Perhaps
off-topic, but ideas on how to do this, if you think it would be the
right approach?

 I also don't understand how SPF_SOFTFAIL could happen when there
 wasn't any SPF record to test to begin with.

 Are you sure? What was the envelope from domain for the message? (keep
 in mind, this checks the envelope from, not the from header..)

No, I'm not sure. I just don't see anything relating to SPF in the
message at all.

 Some of DCC's signatures are fuzzy, thus will match similar messages
 with minor differences. This is done to avoid spammers bypassing by

Yes, understood. The fuz1 and fuz2 max settings are 99,
which I assume is the max possible, set by the previous admin.

 As for dealing with it:
    whitelist Marriott at the SA level (as you suggest)
    whitelist Marriott at the dcc level
    remove or severely cut back the score of AE_DETAILS_WITH_MONEY, if
 you ever actually expect to get important email about traveling to the UAE.

I've whitelisted the Marriott address. I also actually removed the
rule entirely, and am just relying on John's excellent lotsa and
fillform rules.

Thanks very much.
Best,
Alex


Re: OT bad news

2009-10-06 Thread MySQL Student
Hi,

 It's a shame that, living in Denver, I will be *just* out of range of
 hearing the screams as the mailspools fill with viruses, malware, and
 massive payloads of Spanish Prisoner spams.

Aw, c'mon now. Yes, I agree SA is a better solution, but Microsoft
didn't get to be a multi-billion-dollar company solely because of its
marketing. Certainly a competent admin following some SANS guides can
secure an Exchange box to sufficiently avoid it getting hacked, and a
properly-installed version of Symantec will keep most spam away.

It /is/ possible, I suppose :-)

I'd bet that if he kept the FreeBSD box in place and just told his
boss he upgraded to Exchange, they'd never even know :-)

Regards,
Alex


Re: Uppercase E-mail in Latin America

2009-10-06 Thread MySQL Student
Hi,

 doesnt it appear to everyone else that this has the (slim to none) makings
 of a new urban legend?

I have to admit that when Warren posted this, I went to snopes to
check, and there was nothing there :-)

Regards,
Alex


Re: SpamAssassin Ruleset Generation

2009-10-06 Thread MySQL Student
Hi,

 Other than the sought rules, all the rules are manually generated? Is there
 any statistics on how frequently are new rules/regex adopted by
 spamassasssin? Who are the people who write them? Any details related to

Information on Justin Mason's SOUGHT rules is here:

http://taint.org/2007/08/15/004348a.html

Use sa-update to update your SA rules once or twice per day with the
new stuff. His ongoing development work is here:

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jm/?sortby=date

HTH,
Alex


Re: .cn Oddity

2009-10-02 Thread MySQL Student
Hi All,

Regarding the .cn oddity, I added these to my rules, and of about 79k
messages today so far, I have the following:

uri LOC_URI_CN  m;^https?://[^/?]+\.cn\b;
uri T_CN_8_URL  /[\/.]+\w{8}\.cn(?:$|\/|\?)/i

LOC_URI_CN: 2926
T_CN_8_URL: 1634
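
To show what each of the two rules catches, here is a small Python sketch with the patterns translated to Python's re syntax. The sample URLs are made up, and Python's \w is Unicode-aware by default, so edge cases may differ from Perl.

```python
import re

# LOC_URI_CN: any http(s) URL whose host part contains ".cn" at a word boundary.
LOC_URI_CN = re.compile(r"^https?://[^/?]+\.cn\b")
# T_CN_8_URL: an eight-character label directly in front of ".cn".
T_CN_8_URL = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

SAMPLE_URLS = [
    "http://abcdefgh.cn/",         # eight-character label directly under .cn
    "http://www.example.cn/page",  # .cn host, but the label is seven characters
    "http://www.example.com/",     # not a .cn host
]

results = {
    url: (bool(LOC_URI_CN.search(url)), bool(T_CN_8_URL.search(url)))
    for url in SAMPLE_URLS
}

for url, (loc_hit, cn8_hit) in results.items():
    print(f"{url:30s} LOC_URI_CN={loc_hit!s:5s} T_CN_8_URL={cn8_hit}")
```

Only the first URL trips both rules; the second trips LOC_URI_CN alone, which fits the counts above where T_CN_8_URL is the narrower of the two.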

HTH,
Alex


Re: Hostkarma white list

2009-09-29 Thread MySQL Student
Hi,

 For those of you getting spam from IPs/Hostnames on my hostkarma
 white list, if you could email me a list of false hits (IP or host name) I
 could probably clean out the bad entries in the white list pretty quick.

I'm not sure this is the best approach. I have a procmail recipe that
filters specifically the JMF_W and I go through it every day before
training the folder as ham. I'd say around a quarter of the messages
are spam.

How many entries on the whitelist? How were they added? I'd almost
rather start from scratch (or from a more proven list) with a
percentage known to be valid and build from there.

At the least, wouldn't it be best to move the default score closer to
zero on your wiki page for the time being?

Maybe another method for submitting FPs rather than emailing them to
you could be created?

Wouldn't the veracity of the list be better assured if you built the
list from a pile of known ham?

Mail originating from priorityoneemail.com [69.10.237.52] would be one
prime suspect for removal consideration.

On a somewhat related topic, how do people classify topica.com? That
is one that for sure sends junk, but it looks like people may actually
request it, heh.

Thanks,
Alex


Re: Hostkarma Blacklist Climbing the Charts

2009-09-28 Thread MySQL Student
Hi,

 header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
 describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
 tflags RCVD_IN_JMF_W net nice
 score RCVD_IN_JMF_W -5

Hopefully my comment isn't out of place with the current discussion of
JMF/Hostkarma. I think this is a really bad default score; it should
be reduced to -0.5 or perhaps not used at all.

I have a money/fraud email that hit RCVD_IN_JMF_W that passed through
these servers:

Received: from 41.220.75.3
Received: from webmail.stu.qmul.ac.uk (138.37.100.37) by mercury.stu.qmul.ac.uk
Received: from qmwmail2.stu.qmul.ac.uk ([138.37.100.210]
Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])

It also hit these other rules:

X-Spam-Status: No, hits=1.3 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=AE_GBP, BAYES_50, LOTS_OF_MONEY, LOTTERY_PH_004470,
LOTTO_RELATED, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W,
RELAYCOUNTRY_UK, SPF_FAIL, SPF_HELO_FAIL

Unless I'm really missing something, which server has JMF/Hostkarma
whitelisted that shouldn't be?

This happens time after time.
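
The arithmetic behind my complaint, as a small Python sketch. It assumes the quoted default of -5 for RCVD_IN_JMF_W was what actually applied to this message; rescore is just an illustrative helper.

```python
def rescore(total, old_rule_score, new_rule_score):
    """Return the message total if one rule's score were changed."""
    return round(total - old_rule_score + new_rule_score, 1)


REQUIRED = 5.0
reported_hits = 1.3   # hits= value from the X-Spam-Status header above
jmf_w_default = -5.0  # assumption: the quoted RCVD_IN_JMF_W score is in use

without_jmf_w = rescore(reported_hits, jmf_w_default, 0.0)
with_small_bonus = rescore(reported_hits, jmf_w_default, -0.5)

print(f"without JMF_W:      {without_jmf_w:.1f} (spam: {without_jmf_w >= REQUIRED})")
print(f"with JMF_W at -0.5: {with_small_bonus:.1f} (spam: {with_small_bonus >= REQUIRED})")
```

Under that assumption the message would have scored 6.3 without the whitelist hit, and 5.8 with a -0.5 score, so it would have been tagged either way.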

Thanks,
Alex

 header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
 describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
 tflags RCVD_IN_JMF_BL net
 score RCVD_IN_JMF_BL 3.0

 header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
 describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
 tflags RCVD_IN_JMF_BR net
 score RCVD_IN_JMF_BR 1.0
 ===8---

 You pick the names and then the world can use them. The JMF names are out
 there today.

 {^_^}    Joanne



Re: New money/fraud spam

2009-09-27 Thread MySQL Student
Okay, my bad, please ignore. Damn google auto-complete.

Alex

On Sun, Sep 27, 2009 at 6:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
 Hi John,

 Another batch of money spam attached. Everything is the same as the last time.

 Thanks,
 Alex



Sought regex problem

2009-09-27 Thread MySQL Student
Hi,

I posted bug 6198 a few weeks ago, and there have been no comments or
fixes on it in two weeks, and I'm unsure what to do next. It's either
not a bug and I'm doing something wrong or it's not significant enough
to bother with the focus on v3.3.

Thought someone might have some ideas here? I'm using perl-5.6. Anyone
else using perl-5.6 with the sought rules?

[13204] dbg: config: read file /var/lib/spamassassin/3.002005/sought_rules_yerp_org/20_sought.cf
[13204] warn: config: invalid regexp for rule __SEEK_D52BRW: / Don\'t want to
lose your potential of a lover\? Lucky you are, in 21th century all bed-related
male problems can be solved by the powerful remedy, the all-mighty blue caplet\!
This solution will give you the right support for 50\(\!\) hours\. Rock-like and
ready to go\. more\x{bb}/: / Don\'t want to lose your potential of a lover\?
Lucky you are, in 21th century all bed-related male problems can be solved by /:
Can't use \x{} without 'use utf8' declaration

Maybe it's a perl module that's incompatible?

Ideas greatly appreciated.
Thanks,
Alex


Re: Sought regex problem

2009-09-27 Thread MySQL Student
Hi,

 [13204] dbg: config: read
 file /var/lib/spamassassin/3.002005/sought_rules_yerp_
 org/20_sought.cf [13204] warn: config: invalid regexp for rule
 __SEEK_D52BRW:

  grep doesn't find   __SEEK_D52BRW in my copy of the rules.

This was from the sa-update when I submitted the bug report.

Thanks to all for the feedback and the update to the bugzilla. I'm in
the process of upgrading perl, but there are still a few applications
that depend on it.

Mark suggested in the bugzilla update that I change SpamAssassin to
add 'use utf8' into code generated from rules when it sees it is being
run with a pre-5.8 version of perl. How do I do this for the time
being?

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-22 Thread MySQL Student
Hi,

 Try using a local SA setup for stripping the headers. By local, I mean
 don't use your main production SA - run a separate copy with its own
 (cut down) configuration and all data base accesses and UBL calls etc
 turned off.

Much better idea, thanks. Thanks for the script, too.

Best,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

 Thank you all for your help. The mbox split suggestion is a good
 one. I'll follow that route and post my experience later.

 formail -s is the way to go.

I thought about that as a component of procmail. Sounds great.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
 but this will invalidate DKIM headers if those headers are signed; is
 SpamAssassin aware of this problem? (in general)

Are you saying there is a bug?

 mutt -f mbox

 in mutt, save to another folder if misclassified

Yes, I use pine for that, but would like to eliminate as many of the
FNs as possible, particularly ones that I can't determine visually.

Thanks,
Dave


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

 IIRC you previously mentioned using Pine. Just in case you're not aware
 the default format for Pine/Alpine is MBX, an extended version of
 MBOX. You can tell the difference because MBX mailboxes start with a
 dummy email that's hidden by the software.

It seems that if you save messages into a separate folder it does not
add the DUMMY information at the top. I believe this is why the system
was set up to use mbox and not mbx. Does this sound correct?

 I'd be very wary about allowing any tool to modify an MBX file unless
 you know it's safe. Where locking is an issue, Mark Crispin recommends
 that they only be accessed via the c-client library.

This isn't the actual spool file, but a copy in the home directory.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-21 Thread MySQL Student
Hi,

It's certainly not a fast operation, but using the following will
split an mbox into individual messages:

export FILENO=0
mkdir msgs
formail -s sh -c 'cat - > msgs/$FILENO' < mbox-name.mbox

I also created a loop that would strip all the SA headers from the messages:

for file in *; do echo "Processing: $file"; spamassassin -d < "$file" > "$file.txt"; done

This worked for a few hundred of the messages, but then started to
fail on my production system with:

[22135] warn: bayes: cannot open bayes databases
/home/user/.spamassassin/bayes_* R/W: lock failed: File exists

How can I tell when another process is using the database and when it
is free for my script to use?

Is there a faster way to run spamassassin just to strip the SA headers?

Maybe there is a faster way, like passing the messages through the
running amavisd instead of having to restart spamassassin each time to
re-process each message?
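
One possible way to avoid starting spamassassin once per message is to drop the markup headers in a single pass. The sketch below uses Python's standard mailbox module; the function name split_and_strip and the .eml file naming are made up for illustration. It is an assumption that removing the X-Spam-* headers is enough for this purpose: unlike spamassassin -d, it does not undo a rewritten Subject or a report_safe wrapper. Since it never calls SpamAssassin, it does not touch the Bayes database at all.

```python
import mailbox
import os


def split_and_strip(mbox_path, out_dir):
    """Write each message of an mbox to its own file, minus X-Spam-* headers.

    Hypothetical helper: only headers are removed, bodies are left untouched.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path)
    try:
        for index, msg in enumerate(box):
            # Collect names first, because deleting while iterating is unsafe.
            for name in list(msg.keys()):
                if name.lower().startswith("x-spam-"):
                    del msg[name]  # removes every occurrence of that header
            path = os.path.join(out_dir, f"{index:05d}.eml")
            with open(path, "wb") as handle:
                handle.write(msg.as_bytes())
            written.append(path)
    finally:
        box.close()
    return written
```

The resulting files could then be fed to spamassassin or sa-learn one at a time, or in a directory, whenever the Bayes database is free.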

Thanks,
Alex


Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

I have an mbox with about 100 messages in it from a few days ago.
The mbox is a combination of spam and ham. What is the best way to run
SA through these messages again, so I can catch the ones that have
URLs in them that weren't on the blacklist at the time they were
received?

Must I break them all apart to do this, or can SA somehow parse the
whole mbox? If not, what program do you suggest I use to accomplish
this?

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 Do you just want to re-scan the whole mbox and see what rules hit now
 for research reasons?

That's a good start, but I'd like to see if I can break out the ham to
train bayes.

 There's no way to (directly) get SA to modify email that's already in an
 mbox file. The mass-check and sa-learn tools can read them, but nothing
 in SA can write to that. However, there might be a utility out there to
 do this (although I'm not aware of any)..

Yeah, that's kind of what I thought. Maybe a program that can split
each message back into an individual file? Would procmail even help
here? Or even a simple shell script that looks for '^From ', redirects
it to a file, runs spamassassin -d on it, then re-runs SA on each
file? I could then concatenate each of them back together and pass it
through sa-learn.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox infile > outfile.

My apologies if it wasn't clear, but these messages have already been
marked by SA. Some are ham, and the rest are FNs that I'd like to
re-run through SA, in hopes of it now properly detecting them as spam.

Thank you all for your help. The mbox split suggestion is a good
one. I'll follow that route and post my experience later.

Thanks again,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox infile > outfile.

 My apologies if it wasn't clear, but these messages have already been

Wait, my mistake. I read that too fast. Does that work, and rewrite
the X-Spam-Status header?

Guess I could find out for myself, but it just contradicts my
experience and info I've learned previously.

Thanks again,
Alex


URIBL_BLACK vs RCVD_IN_JMF_W

2009-09-18 Thread MySQL Student
Hi,

I have been going through about 15MB of email generated from a
procmail recipe searching for RCVD_IN_JMF_W, and you would not believe
how many also match URIBL_BLACK or URIBL_GREY. Call me naive, but are
there really that many providers that are unaware their clients are
sending spam? (okay, rhetorical question :-)

IOW, I guess this email is more of an informational note to those who
may not be aware, and perhaps for others to comment on whether they
even use it?

The winner for me was a Bank of America scam with the following two relays:

Received: from User (channelf.5460.net [61.137.93.80])
Received: from ortiz.unizar.es (ortiz.unizar.es [155.210.1.52])

No b-of-a relays, of course. This message also hit RAZOR2_CHECK and SPF_FAIL.

There's also a money scam that passed through nasa.gov, hit
RCVD_IN_JMF_W, and a few fraud rules:

Received: from ALTPHYEMBEVSP30.RES.AD.JPL ([128.149.137.84]) by
Received: from mail.jpl.nasa.gov (altvirehtstap02.jpl.nasa.gov [128.149.137.73])
Received: from mail.jpl.nasa.gov (sentrion2.jpl.nasa.gov [128.149.139.106])

X-Spam-Status: No, hits=1.1 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=AE_ADVICE_WITH_MONEY, AE_FRAUD_ADVICE, BAYES_50, LOTS_OF_MONEY,
 MILLION_USD, MONEY_TO_NO_R, RCVD_IN_DNSWL_MED, RCVD_IN_JMF_W, RELAYCOUNTRY_US

I have RCVD_IN_JMF_W set to 0.5 points. It was also listed in
RCVD_IN_DNSWL_MED? Running it a bit later, it scored as spam with the
RAZOR rules:

X-Spam-Report:
*  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
* -0.5 RCVD_IN_JMF_W RBL: Sender listed in JMF-WHITE
*  [128.149.139.106 listed in hostkarma.junkemailfilter.com]
* -4.0 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/,
*  medium trust
*  [128.149.139.106 listed in list.dnswl.org]
*  0.0 RELAYCOUNTRY_US Relayed through United States
*  1.0 AE_FRAUD_ADVICE BODY: Someone offering free advice
*  1.8 MILLION_USD BODY: Talks about millions of dollars
*  2.1 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
*  above 50%
*  [cf:  56]
*  0.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*  [cf:  56]
*  0.0 LOTS_OF_MONEY Huge... sums of money
*  2.0 AE_ADVICE_WITH_MONEY Has advice and mentions much money
*  1.0 MONEY_TO_NO_R Lots of money and bare, missing or undisclosed To
*  0.2 MONEY_INHERIT Lots of money from a dead guy
X-Spam-Relay-Country: US US US
X-Spam-Status: Yes, score=5.4 required=5.0 tests=AE_ADVICE_WITH_MONEY,
AE_FRAUD_ADVICE,LOTS_OF_MONEY,MILLION_USD,MONEY_INHERIT,MONEY_TO_NO_R,
RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK,
RCVD_IN_DNSWL_MED,RCVD_IN_JMF_W,RELAYCOUNTRY_US shortcircuit=no
autolearn=disabled version=3.2.5

Thanks,
Alex


Re: Problems with high spam

2009-09-18 Thread MySQL Student
Hi,

 also, if using amavisd, putting its temp dir in RAM speeds up scanning and is
 considered safe; the MTA still has the mail on disk as a backup :)

How about mounting /var with noatime? Does anyone do that? Do you
think it helps? What Linux filesystem is best suited for this? ext4?

Thanks,
Alex


Re: URL rule creation question

2009-09-12 Thread MySQL Student
 \s is the proper way to represent whitespace.

 lol, yes, I know that; I was actually trying to match 's' and the
 slash is the start of the pattern match.

 I wasn't referring to the beginning of the RE.

Yeah, I realized that just after I sent this, if anyone cares :-)

Thanks again,
Alex


Re: URL rule creation question

2009-09-11 Thread MySQL Student
Hi,

 The 'doubleheadedrover' domain currently shows up in Razor(E8),
 uribl_black, surbl_jp, and invaluement.

 But it wasn't in all of those when he first started posting about it.

Yes, that's correct. Thanks for your help. That's already caught a
few. I have another that I thought you could help with.

I'd like to create a rule that matches a specific letter and up to 5
spaces after it, repeated ten times. I'm thinking something like this:

/s\ {5}o\ {5}n\ {5}i\ {5}c\ {5}\ m\ {5}e\ {5}d\ {5}i\ {5}a/i

I'm still learning regex's, so hopefully this isn't too far off. The
opportunities for rules are coming faster than my ability to learn.
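
For what it's worth, {5} matches exactly five spaces, while "up to five" would be written {1,5} (or {0,5} if no space at all should also count). A minimal sketch in Python, whose re syntax is close enough to Perl's for this pattern; the helper name spaced_word_pattern and the default of at least one space are assumptions for illustration, not anything from the SA rule sets:

```python
import re


def spaced_word_pattern(phrase, min_gap=1, max_gap=5):
    """Build a regex matching the letters of `phrase` separated by spaces.

    Each letter must be followed by between min_gap and max_gap spaces
    before the next letter. Whitespace in `phrase` itself is ignored.
    """
    letters = [re.escape(ch) for ch in phrase if not ch.isspace()]
    gap = " {%d,%d}" % (min_gap, max_gap)
    return re.compile(gap.join(letters), re.IGNORECASE)


pattern = spaced_word_pattern("sonic media")
print(pattern.pattern)
```

The printed pattern could be pasted between the slashes of a body rule; whether one to five spaces is the right range for the spam in question would need checking against real samples.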

Thanks,
Alex


Re: JMF whitelist and RAZOR conflict

2009-09-11 Thread MySQL Student
Hi,

 I have several emails that are tagged with RCVD_IN_JMF_W,
 SPF_SOFTFAIL, and RAZOR2_CHECK such as this one:
 http://pastebin.com/m4a4d990e

 why accept SPF_SOFTFAIL ?

 cant this be solved ?

I don't understand. I'm still learning how the SPF rules work.
Shouldn't I be adding points for an SPF_FAIL? This indicates a spoof
attempt, no?

 are you receiving forwarded emails from spf domains ?

If I understand correctly, no. I have no relationship with any
external source and their SPF records.

 if so add the forward ip to trusted_networks (so spf will be disabled from
 this hosts)

Do you mean to avoid the processing overhead? IOW, don't bother
checking SPF records for trusted domains?

 Is the criteria for being listed on the JMF_W simply that it
 contains a domain that is whitelisted, despite whether it
 contains another URL that is blacklisted?

 this is spamassassin working, if there is a blacklisted domain add it to
 your uribl_skip_domain list

Ah, you mean if the domain is erroneously on the blacklist, right?

 Would I be advised to make the JMF_W score very low, or create a
 meta that doesn't really whitelist it unless it isn't also blacklisted?

 this is ip and not domains

On a somewhat related note, how does BOTNET differ from RDNS_NONE?
What is the logic behind the BOTNET rule? Is there some known list
that it's checking, or is it just likely to be a dynamic IP or
compromised host if it doesn't have a reverse DNS entry?

Thanks so much for the clarification, and confirmation about Gevalia/Kraft.

Thanks,
Alex


URL rule creation question

2009-09-10 Thread MySQL Student
Hi all,

I've seen this pattern in spam quite a bit lately:

href=http://doubleheaderover.com/jazert/html/?39.6d.3d.31.66.67.6b.79.77.63.77.63.65.6e.74.69.6e.6e.69
.61.6c.5f.68.31.33.33.2e.6f.39.39.41.4d.2e.30.30.45.33.39.2e.30.32.30.61.64.6b.37.61.76.61.67.63.31.66.
62.2e.6a.61.7a.65.72.74.2e.68.74.6d.6c3az8fO

Would it be reasonable to create a rule that looks for this two-char
then dot pattern, or is it reasonable that it might appear in a
legitimate email too frequently? If possible, how would you create a
rule to capture this?
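
As a rough sketch of what such a rule might look for: a query string that starts with a long run of two-hex-digit groups, each followed by a dot. The threshold of ten groups and the name HEX_DOT_QUERY are assumptions picked for illustration; the expression is tried in Python here only because it is easy to test, and the same pattern should carry over to a uri rule.

```python
import re

# '?' followed by at least ten "two hex digits + dot" groups.
# The minimum of ten is an arbitrary guess intended to skip version
# numbers and similar short dotted strings.
HEX_DOT_QUERY = re.compile(r"\?(?:[0-9a-f]{2}\.){10,}", re.IGNORECASE)


def looks_hex_dotted(url):
    """Return True if the URL's query starts with a long hex-dot run."""
    return HEX_DOT_QUERY.search(url) is not None


sample = ("http://doubleheaderover.com/jazert/html/"
          "?39.6d.3d.31.66.67.6b.79.77.63.77.63.65.6e.74.69.6e.6e.69")
print(looks_hex_dotted(sample))
```

How often legitimate mail contains such URLs is exactly the open question, so a low score and some time watching the hits would seem prudent before trusting it.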

Thanks,
Alex


JMF whitelist and RAZOR conflict

2009-09-10 Thread MySQL Student
Hi,

I have several emails that are tagged with RCVD_IN_JMF_W,
SPF_SOFTFAIL, and RAZOR2_CHECK such as this one:

http://pastebin.com/m4a4d990e

Is the criteria for being listed on the JMF_W simply that it contains
a domain that is whitelisted, despite whether it contains another URL
that is blacklisted?

Would I be advised to make the JMF_W score very low, or create a meta
that doesn't really whitelist it unless it isn't also blacklisted?

meta META_NOT_JMF_RAZOR    (RCVD_IN_JMF_W && !RAZOR2_CHECK)

It also appears to spoof the kraftfoods.com mail server, correct? Is
there a possible rule to be created here?

Thanks,
Alex


Re: JMF whitelist and RAZOR conflict

2009-09-10 Thread MySQL Student
Hi,

 http://pastebin.com/m4a4d990e

 Is the criteria for being listed on the JMF_W simply that it contains
 a domain that is whitelisted, despite whether it contains another URL
 that is blacklisted?

 I'm not sure what you are saying here, it's not as if the people
 running the whitelist could lookup the IP address on razor.

I'm saying that it appears odd that it would be listed on both RAZOR
and JMF_W, unless the JMF_W found the kraftfoods.com URL and the RAZOR
rules found the bogus
http://ADSENSETREASUREONLINE.yolasite.com URL. Unless the yolasite.com
is a legitimate kraftfoods site?

 meta META_NOT_JMF_RAZOR    (RCVD_IN_JMF_W && !RAZOR2_CHECK)

 Why RAZOR2_CHECK? Why not other positive scoring rules? The trouble is
 that the whitelist rule is then pointless. Set its score at a value
 that's commensurate with its effectiveness on your email.

Does my question now make sense? I was looking at it from more of a
validation point of view for JMF_W, because of the apparent conflict
with RAZOR.

 It also appears to spoof the kraftfoods.com mail server, correct? Is
 there a possible rule to be created here?

 No, it was almost certainly sent through kraftfoods.com. It's based on
 an IP address recorded by your trusted network.

Maybe I should have used a better example. Can I ask you to look at this one?

http://pastebin.com/m7d61b26f

This uses IP 66.132.135.108 as its URL (xybersleuth.com), and unless
that's not a spammer's site, then there's something wrong. This email
includes JMF_W and RAZOR2_CF_RANGE_51_100 and URIBL_BLACK in the same
message, although it has a very low bayes score. Which is correct?

Thanks,
Alex


Shortcircuit info

2009-08-31 Thread MySQL Student
Hi all,

I'm trying to understand how shortcircuit works to ease some of the
load on the servers. First, does anyone have any recommended metas that
they use in their environment that might help?

Can I add shortcircuit to an existing rule, or does the rule have to
be designed to be used with shortcircuit? In other words, I have a
meta that combines spamcop with spamhaus:

meta         META_HAUS_COP  (RCVD_IN_BL_SPAMCOP_NET && RCVD_IN_XBL)
describe     META_HAUS_COP  Contains SPAMHAUS XBL and SPAMCOP
score        META_HAUS_COP  0 4.0 0 4.0
shortcircuit META_HAUS_COP  spam

In order for it to be actually shortcircuited, however, I have to make
the score 100, correct?

Thanks,
Alex


Re: Porn-portal spammers

2009-08-29 Thread MySQL Student
Hi,

 I am getting rather tired from messages spamming porn-portals. They typically
 originate from hotmail.com, and advertise a porn-portal based on
 google.com/groups, google.com/reader, groups.yahoo.com, pipes.yahoo.com,
 spaces.live.com, docs.google.com, sites.google.com and livejournal.com.

This was posted by Martin a week or so ago in response to a similar
question by me:

This should catch your set and more:

uri      LOC_YAHOO /^http:.{1,40}\.yahoo[.,]com/i
score    LOC_YAHOO 0 1.5 0 1.5
describe LOC_YAHOO Contains *.yahoo.com uri

Or, if you want to be more specific, try this:

uri      LOC_YAHOO /^http:\/\/(groups|profile|personals)\.yahoo[.,]com/i
score    LOC_YAHOO 0 1.5 0 1.5
describe LOC_YAHOO Contains yahoo.com groups/profile/personals uri

Does this help?

Best regards,
Alex


Re: 3.3.0 alpha 2 on production mail servers / clusers ???

2009-08-29 Thread MySQL Student
Hi,

 On Saturday August 29 2009 19:47:32 R-Elists wrote:
 have many, or any of you folks on the list migrated your production servers
 to the 3.3.0 alpha 2 or later release?

 We are certainly one of them (actually running CVS head,
 which is pretty close to alpha2). About 1000 users here.

Do we have an idea of a timeline for the next release and/or
production release currently?

How about dependencies? Will perl-5.8 work okay? What modules will
need to be updated? How about for use with amavis? Will I need to
upgrade that?

A list of the top five best new features would also be great! *salivates* :-)

I'm trying to anticipate what I can do ahead of time to get it into
place as soon as possible.

Thanks,
Alex


Google/Yahoo Spam

2009-08-27 Thread MySQL Student
Hi all,

I'm seeing an increase in Google Reader and yahoo
groups/personals/profile spam. Here's an example of the Google Reader
spam:

http://pastebin.com/m1021fc5f

Any ideas on how to catch this one? For the Yahoo spam (with links to
yahoo sites ending in '/1'), I've created these:

uri        LOC_YAHOO1 m{http://groups\.yahoo\.com\/}i
score      LOC_YAHOO1 0 1.5 0 1.5
describe   LOC_YAHOO1 Contains groups.yahoo.com uri

uri        LOC_YAHOO2 m{http://profile\.yahoo\.com\/}i
score      LOC_YAHOO2 0 1.5 0 1.5
describe   LOC_YAHOO2 Raw body contains profile.yahoo

uri        LOC_YAHOO3 m{http://personals\.yahoo\.com\/}i
score      LOC_YAHOO3 0 1.5 0 1.5
describe   LOC_YAHOO3 Raw body contains personals.yahoo

They're somewhat pared down because I'm not very good at pattern
matching, so thought someone could improve on this?

Thanks,
Alex


Converting spam to email message

2009-08-27 Thread MySQL Student
Hi all,

I thought I understood, but I'm still having trouble converting a
message in the quarantine back into a normal email message that I can
forward on to a recipient. Does anyone know how to do this?

Thanks so much.
Best regards,
Alex


Re: Converting spam to email message

2009-08-27 Thread MySQL Student
Hi,

 I thought I understood, but I'm still having trouble converting a
 message in the quarantine back into a normal email message that I can
 forward on to a recipient. Does anyone know how to do this?

 Maybe I missed something, but SpamAssassin doesn't have a quarantine.

 http://wiki.apache.org/spamassassin/SpamQuarantine

Yes, my apologies. I guess it would then be amavisd-new that's
managing the quarantine.

I didn't realize that amavisd manipulated the mail in that way.
Hopefully someone can still help.

Thanks,
Alex


Training spam as ham and forwarding

2009-08-26 Thread MySQL Student
Hi SA users,

I have a few messages found in the quarantine that I need to train as
ham because they were marked as spam incorrectly. To do this, I added
the following to the top of the file so it becomes a normal email:

 From DUMMY-LINE Thu Jan  1 00:00:00 1970

Is this correct? (without the leading spaces)

I can now accurately access and index it using pine, whereas before it
didn't acknowledge it as a normal email. I'd also now like to forward
it to the intended recipient as an attachment, but the recipient isn't
able to read it as a normal email, but instead as plain text. How can
I accomplish this?
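
One approach that might work is to wrap the released message as a message/rfc822 attachment, which most mail clients open as a real email and not as plain text. A minimal sketch with Python's standard email package follows; the function name wrap_as_attachment, the addresses, and the cover text are placeholders, and it assumes the quarantined file has already been reduced to a plain RFC 822 message (no mbox "From " line, no quarantine wrapper).

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def wrap_as_attachment(raw_bytes, sender, recipient):
    """Return a new message carrying the original as message/rfc822."""
    original = message_from_bytes(raw_bytes, policy=policy.default)
    outer = EmailMessage()
    outer["From"] = sender
    outer["To"] = recipient
    outer["Subject"] = "Fwd: " + (original["Subject"] or "(no subject)")
    outer.set_content("The released message is attached.")
    # Passing a message object makes the part message/rfc822.
    outer.add_attachment(original)
    return outer
```

The returned object can be serialized with as_bytes() and handed to sendmail or smtplib; how the quarantine file is cleaned up beforehand depends on the amavisd-new setup.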

Are there mail tools, like procmail or formail, I believe, that were
designed to automate this?

Does anyone request ham from their users to be trained by bayes, or
is autolearning typically the only way (or only real effective way) to do this?

Also, on another note, how can I have all email destined for a
particular user sent to them, including spam? This is what all_spam_to
is for, correct?

Thanks,
Alex


Re: lottery message scored hammy by bayes

2009-08-25 Thread MySQL Student
Hi,

 If you're using autolearning, what are your learning thresholds?

What do you recommend for thresholds? I'm considering using
autolearning, but very concerned about corrupting the database. I
think I would use something like +15 for spam.

There are FNs on occasion in the 2.x range with low bayes numbers (or
BAYES_50) that I wouldn't want to be tagged as ham. Should that be a
concern?

Even mail that has been whitelisted could also contain spam, so would
a ham threshold of like -100 work, or present the same problem?

Thanks,
Alex


Re: spam mail with flagged style images

2009-08-21 Thread MySQL Student
Hi,

 mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/
 mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/
 mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/

 All scored the same. Can be written as a single rule.

I've spent some time and tried to refine my rules based on your
advice, guenther. Can I ask you to check them over again and see if
this is any better, or at least more inclusive?

mimeheader LOC_CDIS_INLINE  Content-Disposition =~ /inline/
score  LOC_CDIS_INLINE  0.1
describe   LOC_CDIS_INLINE  Content-Disposition: inline

mimeheader LOC_CTYP_IMG  ((Content-Type =~ /image\/png/) ||
(Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) ||
(Content-Type =~ /^application\/octet-stream.\.rtf/))
score  LOC_CTYP_IMG 0.1
describe   LOC_CTYP_IMG  Content-Type: PNG-JPG-JPEG-RTF

meta   LOC_IMGSPAM  (LOC_CDIS_INLINE && LOC_CTYP_IMG)
score  LOC_IMGSPAM  0.1
describe   LOC_IMGSPAM  Probably inline image

meta   LOC_BOTNET_IMG   ((BOTNET && LOC_IMGSPAM) || (BAYES_99 && LOC_IMGSPAM))
score  LOC_BOTNET_IMG   1.5
describe   LOC_BOTNET_IMG   Probably inline image spam

 Generally, no.  A spam advertising body part enhancers also has
 correctly spelled words. Training them doesn't poison Bayes either.
 And there usually are still useful tokens around.

That's great, thanks!

Thanks,
Alex


Re: spam mail with flagged style images

2009-08-21 Thread MySQL Student
Hi,

 mimeheader LOC_CTYP_IMG  ((Content-Type =~ /image\/png/) ||
 (Content-Type =~ /image\/jpg/) || (Content-Type =~ /image\/jpeg/) ||

I thought this passed through my --lint, but I only caught it the
second time. I was looking around for the (new) right way to do it,
and found this in 80_additional.cf:

mimeheader __ANY_IMAGE_ATTACH   Content-Type =~ /image\/(?:gif|jpeg|png)/

Now I know. Does the rest look like it will work as expected?
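
A quick way to convince yourself that one alternation covers the three image types from the earlier rules is to try it outside SA. The sketch below uses Python's re module as a stand-in for Perl; the pattern image/(?:png|jpe?g) is my own variation on the 80_additional.cf line, not something taken from the rule files.

```python
import re

# One expression instead of three separate mimeheader rules:
# matches image/png, image/jpg and image/jpeg.
IMAGE_TYPE = re.compile(r"image/(?:png|jpe?g)", re.IGNORECASE)


def is_inline_image_type(content_type):
    """Return True if a Content-Type value names a PNG or JPEG image."""
    return IMAGE_TYPE.search(content_type) is not None


for value in ("image/png", "image/jpg", 'image/jpeg; name="a.jpeg"', "text/html"):
    print(value, is_inline_image_type(value))
```

In a mimeheader rule the slash would still need escaping inside /.../ delimiters, as in the 80_additional.cf example.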

Thanks,
Alex


Re: spam mail with flagged style images

2009-08-20 Thread MySQL Student
Hi,

 Text added to e-mail is a bogus one, never repeated, same as the old styled
 spam mail with attached images. The OCR doesn't detect nothing, I understand
 because of flagged effect. Also, image file name changes, if it have.

A few of these have slipped through on my systems, but for the most
part, these rules have worked here:

mimeheader AS_090505_CDIS_INLINE  Content-Disposition =~ /inline/
score  AS_090505_CDIS_INLINE  0.5
describe   AS_090505_CDIS_INLINE  Rule by AS: Content-Disposition: inline

mimeheader AS_090508_CTYP_PNG Content-Type =~ /image\/png/
score  AS_090508_CTYP_PNG 0.5
describe   AS_090508_CTYP_PNG Rule by AS: Content-Type: PNG

mimeheader AS_090508_CTYP_JPG Content-Type =~ /image\/jpg/
score  AS_090508_CTYP_JPG 0.5
describe   AS_090508_CTYP_JPG Rule by AS: Content-Type: JPG

mimeheader AS_090508_CTYP_JPEG Content-Type =~ /image\/jpeg/
score  AS_090508_CTYP_JPEG 0.5
describe   AS_090508_CTYP_JPEG Rule by AS: Content-Type: JPEG

meta   AS_090508_PNGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_PNG)
score  AS_090508_PNGSPAM  0.5
describe   AS_090508_PNGSPAM  Rule by AS: Probably an Inline PNG spam

meta   AS_090508_JPGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPG)
score  AS_090508_JPGSPAM  0.5
describe   AS_090508_JPGSPAM  Rule by AS: Probably an Inline JPEG spam

meta   AS_090508_JPEGSPAM  (AS_090505_CDIS_INLINE && AS_090508_CTYP_JPEG)
score  AS_090508_JPEGSPAM  0.5
describe   AS_090508_JPEGSPAM  Rule by AS: Probably an Inline JPEG spam

meta   LOCAL_BOTNET_JPG (BOTNET && AS_090508_JPGSPAM)
score  LOCAL_BOTNET_JPG 1.5
describe   LOCAL_BOTNET_JPG Rule by AS: Probably an Inline JPEG spam

meta   LOCAL_BOTNET_JPEG (BOTNET && AS_090508_JPEGSPAM)
score  LOCAL_BOTNET_JPEG 1.5
describe   LOCAL_BOTNET_JPEG Rule by AS: Probably an Inline JPEG spam

The LOCAL_* are mine, adapted to others I found some time ago. I'd be
interested in people's input on these. Can they be simplified? Do you
agree with the scoring?

How about bayes poisoning? The messages also all have random text,
mostly spelled correctly, but nonsensical. If they are trained, could
it adversely affect my bayes db?

Thanks,
Alex


Junkmailfilter rules

2009-08-20 Thread MySQL Student
Hi,

I've been using the junkmailfilter rules for a few days now, and it's
doing quite well. It occurred to me that I might be able to use the
RCVD_IN_JMF_W rule to filter whitelisted domain mail, and use that to
train bayes ham.

Would this work? There of course would be mail from
constantcontact.com, mailing list mail, newsletters, etc, that all
contain a lot of HTML and other components that could equally be seen
in spam.

How do people typically train bayes ham? I can't rely on my users not
to mix up spam and ham, surely corrupting the database.

I did find this in one of the emails, passed through delivery.net:

X-Spam-Status: No, hits=4.9 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, BOTNET, DKIM_SIGNED, DKIM_VERIFIED, HTML_MESSAGE,
 RAZOR2_CF_RANGE_51_100, RAZOR2_CF_RANGE_E4_51_100, RAZOR2_CHECK,
 RCVD_IN_JMF_W, RELAYCOUNTRY_US, SPF_HELO_PASS, SPF_PASS

It was a citibank credit card email. How could it be in RAZOR and also
whitelisted, and BOTNET? Certainly there were no domains in there that
it was relayed through that were part of a botnet.

Ideas greatly appreciated.
Thanks,
Alex


Re: sa-update: stuck at 795855?

2009-08-19 Thread MySQL Student
Hi,

 The problem is that the spammers test with the SA rulesets as soon
 as they are released, which is why the rulesets become ineffective.

I'm not sure I agree with that. If this were the case, I would have a
lot less spam with scores of 50 or more, which obviously aren't even
trying to do something as easy as pass it through SA first.

Also, couldn't we then draw conclusions from this that, since vendors
like Symantec have rules which never are seen by spammers, that their
rules are better?

Incidentally, are there technologies that vendors like Symantec,
Proofpoint, Cisco, Google, etc, use that we don't have or don't have
access to?

Thanks,
Alex


Re: Assistence needed with spamassasin under RedHat 5.2

2009-08-19 Thread MySQL Student
Hi,

 spamassasin.  I have a test message which is genuine.  Running this through
 spamassasin with -t (test) mode as described below gives the output below:

 Running : spamassassin -t /tmp/rose2 gives at the bottom the following
 (edited for privacy) report.

Try adding some debugging output, and first look for something obviously wrong:

# spamassassin -D -t /tmp/rose2 2>&1 | less

Go line-by-line looking for something that stands out as obviously wrong.

Consider obfuscating your message, replacing your domain with
example.com, for instance, and uploading it to pastebin.com. Then
post a link here so we can all view the message for further ideas.

Regards,
Alex


Re: gpgkey failures with sa-update

2009-08-19 Thread MySQL Student
Hi,

 list.  No errors reported then, and I've now forgotten the url. www.yerp.org
 now gets me a webmail login screen, so obviously that wasn't it.  Toss that
 url to me and I'll replay it again.

You should be able to search through your browser history, no?

With Firefox v3.5, you can also just type yerp in the location bar,
and it will do a more aggressive search through your previous URLs for
anything containing those letters.

Regards,
Alex


Re: Counting RAZOR2 hits

2009-08-17 Thread MySQL Student
Hi,

 You can also set your min_cf in your razor config files, which will
 affect when the RAZOR2_CHECK rule fires. This does work in SpamAssassin,
 as I have over-ridden the min_cf on my own system, and have done so for
 years.

Thanks to everyone for their great ideas thus far. I'm looking forward
to working through it to learn more.

I'm seeing a lot of FNs that include various RAZOR rules, but still
don't have enough points to be tipped. Are there meta rules that
people have created and can share that might help?

How about combining it with BOTNET? The ones that have BAYES_99 and
most of the SURBLS and RAZOR* are all properly tagged already, but
many only have BAYES_50.

Some have only RAZOR2_CHECK and contain an inline image.

X-Spam-Status: No, hits=4.1 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, HTML_MESSAGE, RAZOR2_CF_RANGE_51_100,
 RAZOR2_CF_RANGE_E8_51_100, RAZOR2_CHECK, RDNS_NONE, RELAYCOUNTRY_US,
 SPF_HELO_PASS, SPF_PASS

score RAZOR2_CHECK 0 0.9 0 0.9
score RAZOR2_CF_RANGE_51_100 0 0.8 0 0.8
score RAZOR2_CF_RANGE_E4_51_100 0 1.8 0 1.8
score RAZOR2_CF_RANGE_E8_51_100 0 1.5 0 1.5

I see now that RAZOR2_RANGE_E8 should also be at least 1.8, which I've
now changed.

Does everyone do their own mass-checks these days? How do you go about
analyzing the FNs to figure out why they aren't caught and adjust the
scores? Of course they need to be looked at individually for
additional patterns, but how are the scores best personalized of the
rules that are triggered?

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-16 Thread MySQL Student
Hi,

 So perhaps instead of adding another RBL, maybe some admins need to
 consider adding in some HELO checking / rejection.

Can you explain a bit more here? What are you checking for, that the
host is valid?

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-15 Thread MySQL Student
Hi,

                            Unknown user 32.00% (32.00%)            87427696
                              Greylisted 24.88% (16.92%)            46225401
                               Throttled 11.03% (5.64%)             15399444
                     Relay access denied 0.01%  (0.00%)                 7034
                   Bogus DNS (Broadcast) 0.01%  (0.00%)                11692
              Bogus DNS (RFC 1918 space) 0.07%  (0.03%)                82135
                         Spoofed Address 0.26%  (0.12%)               319551
                      Unclassified Event 0.77%  (0.35%)               949388
                 Temporary Local Problem 0.01%  (0.00%)                 8165
             Require FQDN sender address 0.04%  (0.02%)                51022
          Require FQDN for HELO hostname 8.97%  (4.02%)             10988455

[...]

Can I ask how you produced those stats? They look very helpful.

Thanks,
Alex


Re: Barracuda RBL in first place

2009-08-15 Thread MySQL Student
Hi,

 What log script do you good people use to generate the list above ? Is it
 a home brew or one we can download so we can compare our own hits ?

 http://www.rulesemporium.com/programs/sa-stats.txt

Any chance someone knows where there is a compatible one that parses
amavisd instead of spamd? I've tried, but guess I don't know enough
perl to get it right.

Any chance someone has a bit of time to hack on it on this lazy
Saturday afternoon? :-)

Thanks,
Alex


Counting RAZOR2 hits

2009-08-15 Thread MySQL Student
Hi,

I thought grep -c RAZOR2_CHECK through my mail logs would give me a
good approximation of the number of times RAZOR2 was consulted, but
that doesn't seem to be the case. There are some mails that don't have
it listed in the tests= section.
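
Part of the mismatch may be that grep -c counts matching lines, and a folded X-Spam-Status header can put the rule name on a different line than "tests=". A small sketch that counts per message instead is below. It assumes the headers have already been pulled out of the logs or mailboxes as strings (continuation lines included), and the name count_rule_hits is made up. Note that it can only count hits: going by the report text ("Listed in Razor2"), the rule name shows up when the message was found, not each time Razor was queried.

```python
import re
from collections import Counter

# Rule names are upper-case letters, digits and underscores, so the
# capture stops at the first lower-case token (e.g. "shortcircuit=no").
TESTS_RE = re.compile(r"tests=([A-Z0-9_,\s]*)")


def count_rule_hits(status_headers):
    """Count how many X-Spam-Status headers list each rule name."""
    hits = Counter()
    for header in status_headers:
        match = TESTS_RE.search(header)
        if not match:
            continue
        names = set(re.split(r"[,\s]+", match.group(1).strip()))
        names.discard("")
        hits.update(names)  # each rule counted once per message
    return hits
```

Comparing hits["RAZOR2_CHECK"] with the total number of headers gives the share of messages Razor actually listed.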

I've also tried the razor-* commands, and they don't appear to be able
to help here either. What am I missing?

Does RAZOR2_CHECK mean that it was found in the RAZOR2 db, or that it
merely consulted the db?

Thanks,
Alex


Elusive spam

2009-08-12 Thread MySQL Student
Hi,

I'm having trouble catching a particular type of spam, and hoped
someone had some time to take a look:

http://pastebin.com/d57336542

It doesn't match RAZOR2, or any of the URI lists, and it's only
BAYES_50. I have a pretty well-established BAYES db, so I'm surprised
it's only BAYES_50. What can I do to block spam like this in the
future?

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

 Maybe this will sound dumb, but wouldn't it be perfectly
 safe to blacklist example.com? After all, that isn't a
 domain you're ever going to get mail from.

 I could be wrong, but I'm guessing the example.com is the OP's munging.

Yes, that's correct. My apologies.

Best,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

 Are we to make guesses on what else might be munged?
 Is just example.com munged or the 172.0.0.1 also munged?

Just the domain was munged. Thanks for the info. I should have been
able to figure that out.

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

 it hits spamhaus, and spamcop, what more do you want ?

 meta haus_cop (spamhaus && spamcop)
 score haus_cop 5

X-Spam-Status: No, hits=4.8 tagged_above=-300.0 required=5.0 use_bayes=1
 tests=BAYES_50, DATE_IN_PAST_03_06, RCVD_IN_BL_SPAMCOP_NET,
 RCVD_IN_SORBS_WEB, RCVD_IN_XBL, RELAYCOUNTRY_US, URI_HEX

50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2
50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2
70_relay_country.cf:score   RELAYCOUNTRY_US 0.1
50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2
50_scores.cf:score BAYES_50 0 0 0.001 0.001
50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368
50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044

Something doesn't seem right. Am I adding them wrong? It sure seems to
equal more than 5.0. Is it possible the rules are being scored
differently in another location?
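
To double-check my arithmetic, here is a small Python sketch of the sum. It rests on my understanding of the SA docs: a score line with four values gives score sets 0 to 3 in order (no Bayes/no net, no Bayes/net, Bayes/no net, Bayes and net), and a single value applies to every set. With use_bayes=1 and the network rules firing, I'm assuming set 3 is the one in effect.

```python
# Score lines as quoted above; four values are score sets 0-3,
# a single value applies to every set.
SCORES = {
    "BAYES_50": (0, 0, 0.001, 0.001),
    "DATE_IN_PAST_03_06": (2.299, 1.394, 1.306, 0.044),
    "RCVD_IN_BL_SPAMCOP_NET": (0, 2.188, 0, 1.960),
    "RCVD_IN_SORBS_WEB": (0, 1.117, 0, 0.619),
    "RCVD_IN_XBL": (0, 2.896, 0, 3.033),
    "RELAYCOUNTRY_US": (0.1,),
    "URI_HEX": (1.777, 1.316, 1.395, 0.368),
}

BAYES_AND_NET = 3  # assumption: Bayes enabled and network tests enabled


def rule_score(values, score_set):
    """Pick the value for one score set; a lone value covers all sets."""
    return values[0] if len(values) == 1 else values[score_set]


def total_score(tests, score_set, scores=SCORES):
    """Sum the scores of the listed tests for the chosen score set."""
    return round(sum(rule_score(scores[t], score_set) for t in tests), 3)


print(total_score(SCORES, BAYES_AND_NET))  # 6.125 with the values above
```

With the stock values that is 6.125, not the 4.8 in the header, so at least one of these rules must be getting a different score from somewhere else.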

The meta rule is a good one. I'll create that now.

Thanks,
Alex


Re: Elusive spam

2009-08-12 Thread MySQL Student
Hi,

 50_scores.cf:score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960 # n=0 n=2
 50_scores.cf:score RCVD_IN_XBL 0 2.896 0 3.033 # n=0 n=2
 70_relay_country.cf:score           RELAYCOUNTRY_US 0.1
 50_scores.cf:score RCVD_IN_SORBS_WEB 0 1.117 0 0.619 # n=0 n=2
 50_scores.cf:score BAYES_50 0 0 0.001 0.001
 50_scores.cf:score URI_HEX 1.777 1.316 1.395 0.368
 50_scores.cf:score DATE_IN_PAST_03_06 2.299 1.394 1.306 0.044

 Something doesn't seem right. Am I adding them wrong? It sure seems to
 equal more than 5.0. Is it possible the rules are being scored
 differently in another location?

It does look like the XBL scores may have been modified in another
config file by a previous admin, ugh. Thanks, now I know.

Thanks,
Alex


Post trips pastebin spam filter

2009-08-12 Thread MySQL Student
Hi,

I have another spam message that is very elusive, and thought someone
might be able to take a look. I tried to post it to pastebin, and its
spam filter apparently catches it, and prevents me from posting. It's
definitely in the header.

Is there something else I can do to post it, or does someone know how
their spam filter works? I tried even obfuscating the spam URLs, but
it still catches it.

The spam scores BAYES_99, but it is also DKIM signed and verified and
passes SPF, and despite having "Congratulations!", "Wal-Mart", and
several URLs in the body, it's not caught.

Thanks,
Alex


Scores, razor, and other questions

2009-08-07 Thread MySQL Student
Hi,

After another day of hacking, I have a handful of general questions
that I hoped you could help me to answer.

- How can I find the score of a particular rule, without having to use
grep? I'm concerned that I might find it at some score, only for it to
be redefined somewhere else that I didn't catch. Something I can do
from the command-line?

- How do I find out what servers razor is using? What is the current
license now that it's hosted on sf, or are the query servers not also
running there? It doesn't list any restrictions on the web site.

- The large majority of the spam that I receive these days is a result
of a URL not being listed in one of the SBLs. I'm using SURBL, URIBL,
and spamcop. For example, I caught guadelumbouis.com several hours
ago, and it's still not listed in any of the SBLs. Am I doing
something wrong or am I missing an SBL? Has anyone else's spam with
URLs increased a lot lately?
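
On the first question, the best I've come up with so far is a rough Python sketch that reads the config files in load order and reports the last score line for a rule. The load order is my assumption (default or sa-update rules first, then the site config dir, files sorted by name within each directory, later definitions winning), and the function names are my own, so treat it as a starting point only.

```python
import re
from pathlib import Path

# "score NAME v1 [v2 v3 v4]  # optional comment"
SCORE_RE = re.compile(r"^\s*score\s+(\S+)\s+(.+?)\s*(?:#.*)?$")


def effective_score(rule, sources):
    """Return (origin, values) from the last 'score' line naming rule.

    sources is an iterable of (origin, text) pairs in load order.
    Returns None if the rule is never scored explicitly.
    """
    found = None
    for origin, text in sources:
        for line in text.splitlines():
            match = SCORE_RE.match(line)
            if match and match.group(1) == rule:
                found = (origin, match.group(2).split())
    return found


def load_cf_files(directories):
    """Yield (path, text) for every *.cf file, directories in the given order."""
    for directory in directories:
        for path in sorted(Path(directory).glob("*.cf")):
            yield str(path), path.read_text(errors="replace")
```

For example, effective_score("RCVD_IN_XBL", load_cf_files(["/usr/share/spamassassin", "/etc/mail/spamassassin"])) would show which file has the final say; both paths are examples and will differ per install.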

Thanks,
Alex


RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

I'm trying to configure RelayCountry. I have it installed, and SA recognizes it:

# spamassassin --lint -D 2>&1 | grep -i country
[4278] dbg: diag: module installed: IP::Country::Fast, version 604.001
[4278] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC
[4278] dbg: plugin:
Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements
'extract_metadata', priority 0
[4278] dbg: plugin:
Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9648) implements
'parsed_metadata', priority 0

I've loaded the plugin, and add_header according to the wiki page:

add_header all Relay-Country _RELAYCOUNTRY_
loadplugin Mail::SpamAssassin::Plugin::RelayCountry

I can create rules for each country I'd like to identify, and that
successfully adds it to the header:

header   RELAYCOUNTRY_RU X-Relay-Countries =~ /RU/
describe RELAYCOUNTRY_RU Relayed through Russian Federation
score    RELAYCOUNTRY_RU 2.0

I was hoping to also have the X-Spam-Countries header added, but that
doesn't seem to work. I'm using v3.2.5, so it has the
RelayCountries.pm patch to add that support. What am I missing?

Somewhat of a basic question, but once I do manage to get that header
working, I know I can parse that and make decisions based on it. Are
there any pre-written perl routines or utilities that can make that
information useful?

Also, I believe I read it adds bayes metadata to the email. Is that
just through the additional headers or is it supposed to add something
else?

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

 I don't know if it makes a difference, but I call it Relay-Countries to
 match the name of the pseudo-header used in the tests

 add_header all Relay-Countries          _RELAYCOUNTRY_

It doesn't appear to make a difference. I must be doing something else
wrong. Using spamassassin --lint -D 2>&1 | less shows the
X-Relay-Countries header, but it's null:

# spamassassin --lint -D 2>&1 | egrep -i 'relay|country|countries'

[23760] dbg: diag: module installed: IP::Country::Fast, version 604.001
[23760] dbg: config: read file /etc/mail/spamassassin/70_relay_country.cf
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC
[23760] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayEval from @INC
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: Botnet: adding (\b|\d)relay(\b|\d) to botnet_serverwords
[23760] dbg: metadata: X-Spam-Relays-Trusted:
[23760] dbg: metadata: X-Spam-Relays-Untrusted:
[23760] dbg: metadata: X-Spam-Relays-Internal:
[23760] dbg: metadata: X-Spam-Relays-External:
[23760] dbg: plugin:
Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements
'extract_metadata', priority 0
[23760] dbg: metadata: X-Relay-Countries:
[23760] dbg: plugin:
Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x8fb9698) implements
'parsed_metadata', priority 0
[23760] dbg: rules: ran eval rule NO_RELAYS == got hit (1)
[23760] dbg: Botnet: no trusted relays
[23760] dbg: check:
tests=MISSING_DATE,MISSING_HEADERS,MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,RELAYCOUNTRY_LOW

I've added your rules in 70_relay_country.cf, and they trigger in the
tests=, but the header isn't added.

I've added the add_header in init.pre, above the loadplugin line as
well as adding it in local.cf when it didn't work in init.pre.

I've also checked email that has actually been tagged by these rules,
and not just from a -D run, and it's not there either.

Thanks again,
Alex


Anti-Phishing and Spear-Phishing Version 2

2009-08-06 Thread MySQL Student
Hi,

Has anyone tried the phishing rules generated by Julian Field and
developed by Google? It looks really neat:

http://www.jules.fm/Logbook/files/anti-phishing-v2.html

It's basically a list of 3.5k email addresses found in email thought
to be spam. Looks to be developed by Google, so it's safe?

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

 [23760] dbg: metadata: X-Relay-Countries:

 The --lint test is *NOT* valid for this. --lint is *ONLY* to verify your
 config files are parseable.

Yes, thanks, I should have known that, and I think I did. I mentioned
in the previous post that I tried it with a real message, and even
viewed a number already in quarantine, and the same result.

I found this message on nabble:

http://www.nabble.com/Question-about-RelayCountry-td18309349.html#a18339974

Same problem, back in '08, with no resolution. I even downgraded to the
IP::Country::Fast released in Jan '09, and it made no difference.

Could this be a problem with one of the modules, or is this most
likely a configuration issue?

What I don't understand is that it knows which country it's relayed
through, because it prints the rules in the tests= section:

X-Spam-Status: Yes, hits=21.8 tag1=-300.0 tag2=4.9 kill=4.9
 use_bayes=1 tests=BAYES_50, BODY_ENHANCEMENT, BOTNET,
FH_HELO_EQ_D_D_D_D, RDNS_NONE,  RELAYCOUNTRY_UK, SARE_ADULT2,
SARE_RECV_IP_FROMIP3, URIBL_AB_SURBL, URIBL_BLACK, []

Curiously, why doesn't it print them each in a column with
description, instead of all together?

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

 This is also why the plugin works and you do get the per-country rule
 hits, but don't get the SA Relay-Countries header.

Yes, you are correct. Thanks for the lead and the explanation. Here's
a thread that talks about how to add the header for amavisd:

http://www.mail-archive.com/amavis-u...@lists.sourceforge.net/msg12416.html

I'm not sure it's really necessary after all, though, because the
rules work without it, and it still doesn't print the header in
quarantined mail.

 char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
 main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
 (c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

How did you get line noise from your modem to look so much like perl code? :-)

Thanks,
Alex


Re: RelayCountry Config

2009-08-06 Thread MySQL Student
Hi,

 I find ordinary header and meta rules are all I need:

 http://pastebin.com/f5e5232d1

Among those rules you have:

meta RELAYCOUNTRY_MED   ! RELAYCOUNTRY_HIGH && (
__RELAYCOUNTRY_AF || __RELAYCOUNTRY_AS || __RELAYCOUNTRY_EU_S ||
__RELAYCOUNTRY_OC_S || __RELAYCOUNTRY_AM_S )

It's probably hard to read, but doesn't this exclude the US?
__RELAYCOUNTRY_AM_S covers all the Americas except US and CA. If I
understand correctly, this says NOT RELAYCOUNTRY_HIGH and all
countries except US and CA, which means that RELAYCOUNTRY_MED would
trigger on all US and CA relays.

Thanks,
Alex


Upgrading bayes DB

2009-08-04 Thread MySQL Student
Hi,

I'm still working on my bayes training project, but also trying to
upgrade the bayes DB due to upgrading perl and all the associated
modules. I started with this output from sa-learn --dump magic

0.000          0          3          0  non-token data: bayes db version
0.000          0       1786          0  non-token data: nspam
0.000          0       3698          0  non-token data: nham
0.000          0     198349          0  non-token data: ntokens
0.000          0  929232460          0  non-token data: oldest atime
0.000          0 1249369370          0  non-token data: newest atime
0.000          0 1249369387          0  non-token data: last journal sync atime
0.000          0 1249342872          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

After the upgrade (sa-learn --sync -D), it zeroed the nham and nspam.
How could this happen? What could I have
done wrong? This is after the upgrade:

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0 1249438016          0  non-token data: oldest atime
0.000          0 1249438016          0  non-token data: newest atime
0.000          0 1249438016          0  non-token data: last journal sync atime
0.000          0 1249438016          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

It seemed to indicate that it was upgrading from db version 0 to db
version 2, then db version 3, although the first sa-learn output shows
that it was already version 3.

Thanks,
Alex


Bayes training

2009-08-03 Thread MySQL Student
Hi,

We have accumulated quite a large list of whitelisted users, primarily
because they were previously tagged incorrectly. I've extracted a copy
of all whitelisted mail into a separate mbox.

Certainly there is some spam in there as well, but assuming I only
learn the ham, would it make sense to train bayes using the emails
from this folder? It's all business-related, but I'm concerned that it
may have things in the email that caused it to be tagged in the first
place, like excessive HTML, sent from a host with no reverse DNS, etc.
-- all the reasons for it being whitelisted in the first place.

Looking at the logs before the addresses were added to the whitelist,
I see quite a few that were BAYES_99, probably because they resemble
mailing lists, such as those from networkworld, for example. IOW, I
wouldn't want to whitelist an email from networkworld.com, but one of
the company's partners could send the company an email that had many
of those characteristics.

Someone may also send them a one-line email with a small GIF as an
attachment, such as their corporate logo in their signature. This
would be a valid email, but also very much resembles the
characteristics of a typical spam.

This is all being done to hopefully train bayes to better recognize
corporate email, and hopefully cut down on the number of whitelisted
senders that must be added in the future (or, corporate email that
gets tagged then must be whitelisted).

Ideas greatly appreciated.
Thanks,
Alex


Upgrading perl modules for SA

2009-07-30 Thread MySQL Student
Hi,

I recently upgraded perl from 5.6.0 to perl-5.10.0, along with all the
modules necessary for sa-3.2.5 and amavisd-new (an old version still).
I'm now having a problem that I really don't understand:

Jul 30 14:24:30 bigship amavis[1757]: (01757-175) TROUBLE in
check_mail: decoding2-get-file-types FAILED: 'file' utility
(/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line
4019.

Jul 30 14:24:30 bigship amavis[1757]: (01757-175) PRESERVING EVIDENCE
in /var/amavis/amavis-20090730T142430-01757

The amavisd children are running as a regular user. When I su to that
user and run /usr/bin/file with the files listed above, it
successfully returns the correct type of file. The lines in amavisd
surrounding 4019 are:

$file ne '' or die "Unix utility file(1) not available, but is needed";
for my $part (@$partslist) {
  my($filename) = "$tempdir/parts/$part";
  my($filetype) = '';
  my($proc_fh) = run_command(undef, undef, $file, $filename);
  while( defined($_ = $proc_fh->getline) ) { $filetype .= $_ }
  my($err); $proc_fh->close or $err=$!; my($ret) = retcode($?);
  # <== 4019
  $ret==0 or die "'file' utility ($file) failed, status=$ret ($? $err)";

  chomp($filetype); my($taint) = substr($filetype,0,0);
  # remove file name
  $filetype = $1.$taint  if $filetype=~/^.+?:[\t ](.*)$(?!\n)/s;
  section_time('get-file-type');
  local($_) = $filetype;  my($ty);

  # try to classify some common types and give them short type name
  # _last_ match wins!

Running spamassassin --lint returns no errors or warnings. Amavis
complains that I'm missing a few modules, like SPF, DKIM, and
IO::Socket::SSL, but I don't think they're related, and I guess they
weren't on there before when it was working fine.

Thanks,
Alex


Re: Upgrading perl modules for SA

2009-07-30 Thread MySQL Student
Hi,

 check_mail: decoding2-get-file-types FAILED: 'file' utility
 (/usr/bin/file) failed, status=1 (256 ) at /usr/sbin/amavisd line

 How's this a SA question?

Yes, my apologies. I don't know enough about amavis yet, and thought
it may be related to all the modules I upgraded, and not amavis
itself. I've since reverted my changes back to perl-5.6.0, and I'm going
to subscribe to that list too.

I also upgraded Berkeley DB to db4 and have left db3, db2, and db1 on
the system too. However, now I'm having a problem with bayes:

[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_toks
[10496] dbg: bayes: tie-ing to DB file R/O /home/sscan/.spamassassin/bayes_seen
[10496] dbg: bayes: found bayes db version 0
[10496] warn: bayes: bayes db version 0 is not able to be used,
aborting! at /usr/lib/perl5/site_perl/5.6.0/Mail/SpamAssassin/BayesStore/DBM.pm
line 196.

I guess I don't understand the logic, because around 196 is the
following, which appears to say that if $self->_check_db_version
doesn't equal zero, then fail, but we know it equals version zero from
what is stated above...

  $self->{db_version} = ($self->get_storage_variables())[6];
  dbg("bayes: found bayes db version ".$self->{db_version});

  # If the DB version is one we don't understand, abort!
  if ($self->_check_db_version() != 0) {
    warn("bayes: bayes db version ".$self->{db_version}." is not able to be used, aborting!");
    $self->untie_db();
    return 0;
  }

Thanks,
Alex


Re: Low Scoring Lotto Spam

2009-07-27 Thread MySQL Student
Hi,

        *  3.0 RCVD_IN_UCEPROTECT2 RBL: Received via a relay in
        *      dnsbl-2.uceprotect.net
        *      [81.202.69.68 listed in dnsbl-2.uceprotect.net]
        *  2.0 RCVD_IN_UCEPROTECT3 RBL: Received via a relay in
        *      dnsbl-3.uceprotect.net
        *      [81.202.69.68 listed in dnsbl-3.uceprotect.net]

How successful have you been with the UCEPROTECT lists? Seems like a
nice project. How come more people aren't using it?

IOW, you seemed to be the only one of the four or five people that
posted their output from this lotto spam. Why such a disparity in the
rules that people use?

Thanks,
Alex


Re: whitelist_from questions

2009-07-27 Thread MySQL Student
Hi,

I'm looking at an email that appears to be from one of the users on the
whitelist, but instead was from:

   From probesqt...@segunitb1.freeserve.co.uk  Mon Jul 27 19:49:19 2009

Why can't a comparison be made between the From: info and the actual
sender? Is this because of virtual domains and/or users?
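
To make the comparison I have in mind concrete, here is a minimal Python sketch using only the standard email module. The function name is my own, and the envelope sender is assumed to be available separately (for example from the mbox "From " line or Return-Path). Plenty of legitimate mail, such as mailing lists and bulk senders, has a different envelope sender and From: header, so a mismatch could only ever be a hint.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_mismatch(raw_message, envelope_sender):
    """Compare the From: header address with the envelope sender.

    Returns (mismatch, header_address); the comparison ignores case.
    """
    message = message_from_string(raw_message)
    header_address = parseaddr(message.get("From", ""))[1].lower()
    return header_address != envelope_sender.strip().lower(), header_address
```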

Thanks,
Alex


Re: Lotto/Money email address spam

2009-07-23 Thread MySQL Student
Hi,

 Please don't paste examples to this list.

 Please post them to pastebin (or a similar service) and then include the
 link.
..

Yes, understood. FWIW, I know enough to not post an entire message
with headers to the list -- I'm sure half the time it would be
filtered anyway. This time it was just a snippet, but in the future
I'll post even those online, too.

Thanks,
Alex


Re: Lotto/Money email address spam

2009-07-23 Thread MySQL Student
Hi,

 sa-update lint checks the rules in a sandbox, and does not update the
 local channel, if there are any issues. Moreover, do NOT copy these
 updates to your site config dir -- but keep it in the update dir where
 sa-update puts them [1]. SA knows how to use them instead of the
 install-time default conf.

Okay, great. That is what I have now done. I actually have multiple
mail servers, none of which have direct access to the Internet other
than inbound SMTP, so I have sa-update running on another box, which
creates a tarball, which is then scp'd to the mail servers and
extracted.

For me, this now means the sa-update channels are in
/var/lib/spamassassin/3.0005/ and my local site-config is
/etc/mail/spamassassin, where local.cf and init.pre reside.

I also spent much of the day reading docs. I've worked with Linux now
for many years, and have been involved with SA, just not to the level
that I'm involved now.

 It's a rather bizarre picture I'm sensing here. From your recent posts I
 understand you are running a mail server for a large organization. Yet
 there is this cannonade with rather basic questions...

guenther, I knew you were a smart guy :-)

Yes, there is a bigger picture; hopefully I get some cred for trying
to tackle this on my own (with the help of others more experienced).

Anyway, I'm trying to use sa-update to install the SOUGHT rules, and
linting them shows this:

[17021] warn: config: invalid regexp for rule __SEEK_AY2NNY: /This
place is so exclusive, how did you get an invite\x{e2}\x{80}\x{a6} /:
/This place is so exclusive, how did you get an
invite\x{e2}\x{80}\x{a6} /: Can't use \x{} without 'use utf8'
declaration

I'm using perl-5.6.0; is that the cause?

Thanks again,
Alex


Re: whitelist_from questions

2009-07-23 Thread MySQL Student
Hi,

 Firstly, before you convert all these to whitelist_from_rcvd, perhaps you
 ought to ask yourself whether you really need 1000 entries on your
 whitelist.

I'm surprised you were the first to make that very comment, so thanks.

 Does mail from these addresses actually get miscategorised as
 spam, or would SA get it right without the whitelist?

Mail was being tagged as spam, and the organization became concerned
that others would be tagged, so it seemed anytime there was a
high-profile external business contact that they couldn't risk being
tagged, they had it added to the whitelist.

The list used to be much larger until we spent quite a while (months
and months) going through it with them to prune it.

I don't doubt that if we removed a substantial amount of them that SA
would do what's right, but there doesn't seem to be any scientific way
to do that successfully.

 Secondly, don't forget about whitelist_from_spf. If a domain has an SPF
 record, this is a better solution than whitelist_from_rcvd as it avoids the
 need for *you* to work out which are the outgoing servers.

Is there a way to script that for the 1000 or so entries, to see which
have SPF records?
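
Here is the kind of thing I was picturing, as a rough Python sketch. It assumes the whitelist lives in whitelist_from lines, that the dig utility is installed for the TXT lookups, and that a TXT record starting with v=spf1 is what counts as an SPF record; the function names are my own.

```python
import subprocess


def whitelist_domains(config_lines):
    """Collect the domain part of every whitelist_from address."""
    domains = set()
    for line in config_lines:
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2 or fields[0] != "whitelist_from":
            continue
        for address in fields[1:]:
            # Strip the local part and any leading glob such as "*."
            domain = address.rsplit("@", 1)[-1].lstrip("*.").lower()
            if domain:
                domains.add(domain)
    return sorted(domains)


def has_spf(txt_records):
    """True if any TXT record string looks like an SPF policy."""
    return any(
        record.strip().strip('"').lower().startswith("v=spf1")
        for record in txt_records
    )


def lookup_txt(domain):
    """Fetch TXT records with dig (assumes dig is on the PATH)."""
    result = subprocess.run(
        ["dig", "+short", "TXT", domain],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return result.stdout.splitlines()
```

Looping over whitelist_domains(open("local.cf")) and printing each domain with has_spf(lookup_txt(domain)) would give a first pass; a domain with no record, or a lookup that fails, simply shows up as False.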

 Lastly, if you do use whitelist_from_rcvd, remember that there may be
 multiple outgoing servers for a given domain, and worse they may change over
 time.

Yeah, I thought of that too, so it doesn't sound like that's going to
work well here.

Thanks,
Alex


Eliminating unnecessary rules

2009-07-22 Thread MySQL Student
Hi,

I have created a routine where I can enter a string into a text file
and it gets converted into a set of rules that form a cf file. They
are all of the form LOCAL_RULE_N, where N is a random 6-digit number.
Two points are added if the rule is triggered. There are now about
3800 of these rules, dating back chronologically about a year or so.

I've learned a lot over the past year, and I now think some of these
patterns may be catching valid mail, so I'd like to figure out how
best to prune at least the ones that are no longer triggered or are
triggered but don't cause the email to become spam. IOW, the message
would be spam regardless of whether the rule fired.

What is the best way to do this? An awk script on mail.log over the
past few weeks? How can I wildcard the script with so many rules, and
when they have random numbers at the end?
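
The rough shape of what I'm imagining, in Python rather than awk: match every LOCAL_RULE_ name with one pattern, count how often each fires, and flag the hits where the rule's two points made the difference. It assumes each log line carries hits= and required= alongside the rule names, which is an assumption about the log format, and it looks at each rule on its own even when several fire together.

```python
import re
from collections import Counter

RULE_RE = re.compile(r"\bLOCAL_RULE_\d{6}\b")
HITS_RE = re.compile(r"\bhits=(-?\d+(?:\.\d+)?)")
REQUIRED_RE = re.compile(r"\brequired=(-?\d+(?:\.\d+)?)")
RULE_POINTS = 2.0  # every local rule adds two points


def tally(lines):
    """Return (fired, decisive) counters keyed by rule name.

    A hit is 'decisive' when the message was tagged as spam but would
    have fallen below the threshold without that rule's points.
    """
    fired, decisive = Counter(), Counter()
    for line in lines:
        rules = set(RULE_RE.findall(line))
        if not rules:
            continue
        fired.update(rules)
        hits, required = HITS_RE.search(line), REQUIRED_RE.search(line)
        if not (hits and required):
            continue
        score, threshold = float(hits.group(1)), float(required.group(1))
        if score >= threshold > score - RULE_POINTS:
            decisive.update(rules)
    return fired, decisive


def never_fired(all_rules, fired):
    """Rules defined in the cf file that never appear in the logs."""
    return sorted(set(all_rules) - set(fired))
```

Rules returned by never_fired are the obvious pruning candidates; rules that fire often but are never decisive would be the next ones to look at.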

I'm still surprised how many are hitting for things like "Acai Berry"
or "PO Box 1845 | Ft. Worth | TX", for example.

Thanks for any ideas.
Alex


Re: Spam troubleshooting

2009-07-22 Thread MySQL Student
 How effective are razor/pyzor and SPF/DKIM?

 very effective, razor/pyzor altogether with DCC.

 SPF also helps much, although it should be implemented at SMTP level and
 refuse all messages that cause (hard) fail.

 While DKIM is currently in SA, the only place it currently applies is
 whitelisting, since it has scores of +/-0.001. Different scores were
 mentioned here, but not incorporated into SA scores yet.

 I've always been a bit hesitant
 to use any of those.

 Why?

Because how often do spammers have DNS entries with valid SPF or DKIM
information? How often do spammers use compromised hosts with valid
SPF or DKIM information?

Will they help with emails that only contain a random URL and a line
or two of text, like:

ma...@myhost.com: Get your Nursing Degree here
http://spamsite.com/

Or would that be DCC? Emails like these often get through,
apparently before the URL is listed in spamcop, SURBL, or URIBL_BLACK.

Can I also ask where the best place is to start implementing razor
and/or pyzor in SA 3.2 on Linux with postfix?

Thanks,
Alex

