Re: Multiple test failures

2024-04-04 Thread Loren Wilton
I haven't had a chance yet to read this thread carefully, but spamd when 
run as root in tests will, at least in some cases, set itself to run as 
user "nobody". If you do that in a subdirectory of your non-nobody user's 
HOME, the usual permission configuration will not provide read access to 
nobody and the test will fail.


Would it be worth adding some sort of test for this kind of thing that could 
make a reasonably explicit "don't run install as root" or "incorrect 
directory permissions" or some such, to make it more obvious what is going 
wrong? I seem to remember at least one or two very similar postings here in 
the last year or so, so it is something that does happen and confuses 
people.




Re: Beginner Setting up Spam Assassin

2023-12-30 Thread Loren Wilton
SpamAssassin cannot block or eliminate spam. It does not have the facilities to 
do that. SA can only score potential spam. 

Whatever method you used to glue SA into your mail path needs to parse the 
score SA assigned in the returned mail, and do whatever routing it thinks is 
appropriate. 

We do not know what glue you are using to put SA into your mail path, so it is 
hard to give suggestions on how to set that unknown software up. With more 
details of your setup we may be able to help.

We can suggest rules to assign a score to mail if it comes from a particular 
account. But something other than SA will then have to deal with that score and 
do the routing.


  - Original Message - 
  From: FalconChristopher 
  To: Michael Grant ; users@spamassassin.apache.org 
  Sent: Saturday, December 30, 2023 2:48 AM
  Subject: Re: Beginner Setting up Spam Assassin


  Hi, can I not ask how to set up Spam Assassin in this mailing group it is a 
group for Spam Assassin.




  On 12/30/2023 4:30 AM, Michael Grant wrote:

Can you ban this user in whatever your equivalent of the access file is so 
instead of putting the messages into a spam folder, you reject messages from 
that address at delivery time (SMTP)?





On 30 December 2023 04:08:17 CET, FalconChristopher wrote:
  Anyone know how I can check and setup SpamAssassin so that I can 
  eliminate some spam from coming in from a email account ? 


  On 12/28/2023 2:24 AM, Matus UHLAR - fantomas wrote: 
  > On 27.12.23 16:53, FalconChristopher wrote: 
  >> Hi, I want to setup Spam Assassin so that any email that Spam 
  >> Assassin flags as spam 
  > 
  > this is spamassassin's job 
  > 
  >> gets placed into a folder for a specific SMTP or IMAP email account. 
  > 
  > this is not spamassassin's job. 
  > It's job of mail delivery agent - procmail, maildrop, sieve 
  > 
  >> Then if Spam Assassin flags emails that are not spam I can tell it 
  >> which of those emails to not place into the spam folder for the 
  >> specific email client. Until it gradually learns which emails are 
  >> spam and which are not. 
  > 
  > dovecot (imap/pop3 server) has plugins that support training of 
  > spam/ham, if you move the mail from/to spam folder. 
  > 
  > https://doc.dovecot.org/configuration_manual/spam_reporting/ 
  > 
  >> I've done a little research and I have access with my distribution to 
  >> a mail directory as well as the local.cf file for which 
  >> configurations are for Spam Assassin but I don't know how to setup 
  >> what I mentioned above ? 
  > 



Re: My apologies

2023-08-02 Thread Loren Wilton

I've blocked him on my mail server, as well.


Reindl now and then says something useful, but as you have noticed his 
people skills are somewhere in the negative 200 score level. I don't know 
that I'd block him, but you do need to take anything he says with a few 
horselicks of salt.




Re: Sudden surge in spam appearing to come from my email address

2023-07-16 Thread Loren Wilton
> header __FROM_THOMAS_1 From =~ //i 

You can simplify this. The parenthesized grouping was only necessary when there 
was more than one possible string, in my case .com and .net. Since you only 
have .com you can remove the (?: and ) and make the regex a little more 
efficient:

> header __FROM_THOMAS_1 From =~ //i 


Re: Sudden surge in spam appearing to come from my email address

2023-07-16 Thread Loren Wilton
> Am I correct? Sorry if I'm being dense. I'm just a sysadmin, not a developer, 
> so I'm not super clear on how macros and expansions work in perl.

You have the concepts right. I'd try the rules you posted and see if they seem 
to be producing correct results. You can run a spam thru SA with the -t switch 
and see which rules hit, and hopefully the NOT_FROM_ rule will hit. Send 
yourself a test mail and see that it doesn't hit. If that all works it is time 
to add the rules for the family. If it doesn't work, look at how the From 
header is formatted in the mail you sent to yourself.

Loren


Re: Sudden surge in spam appearing to come from my email address

2023-07-14 Thread Loren Wilton
I am suddenly getting hammered by a BUNCH of spam that appears to be from 
me. It scores low, and even though I keep feeding it to Bayes, it's still 
not hitting the threshold to be marked as spam.


When I check the headers, it's coming from multiple random email servers, 
but many appear to originate from hotmail/outlook.com. So from 
outlook.com, through some unsecured email server, then to my server.


SA can't block this trash by itself, but if something that runs after the SA 
invocation can look at the headers, you might be able to block it. You can 
certainly mark it as spam.

For instance:

#
# Ok, catch 'from me' when it isn't

header __FROM_ME_1 From =~ //i
header __FROM_ME_2 From =~ /\"First Last\" /
header __FROM_ME_3 From =~ /First Last /
meta NOT_FROM_ME __FROM_ME_1 && !(__FROM_ME_2 || __FROM_ME_3)
score NOT_FROM_ME 10
describe NOT_FROM_ME Spammer faking the mail from me!

Mind the backslash on the quotes and at sign. Depending on versions of 
things these are necessary, and don't hurt if they are not necessary.
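
The logic of the meta above can be tried outside SA. This is only a rough Python sketch of the idea: the address and display name are invented placeholders (the real patterns did not survive in this archive), and Python's regex engine stands in for Perl's.

```python
import re

# Hypothetical placeholders: neither the address nor the name comes
# from the original rules.
ADDRESS = re.compile(r"<me@example\.com>", re.I)
QUOTED_NAME = re.compile(r'"First Last" <me@example\.com>')
BARE_NAME = re.compile(r"First Last <me@example\.com>")


def not_from_me(from_header):
    """Mimic: __FROM_ME_1 && !(__FROM_ME_2 || __FROM_ME_3)."""
    has_address = bool(ADDRESS.search(from_header))
    has_known_name = bool(
        QUOTED_NAME.search(from_header) or BARE_NAME.search(from_header)
    )
    # Fires only when my address is used with a display name I never use.
    return has_address and not has_known_name
```

The point of the design is that the address alone is not suspicious; it is the address combined with an unexpected display name that earns the score.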




Re: Best practice for adding headers?

2023-07-09 Thread Loren Wilton

I've patched spamass milter to let any previously added "X-Spam"
headers untouched


It's generally considered bad practice to pass thru X-Spam headers from an 
unknown source.
Like most anything else in an email header, a spammer could inject his own 
headers, probably populated with items designed to generate a negative 
score.




Re: Help with rule

2023-06-05 Thread Loren Wilton
> meta FROM_CLIENT_TEST from FROM_CLIENT_EMAIL && FROM_CLIENT_IP

Is that a typo when you were making this mail, or is it actually how the line 
is coded? There is an extra "from" there.

Even if you fix that, you won't get the results you expect. Both 
FROM_CLIENT_EMAIL and  FROM_CLIENT_IP will score as 1 point each if they hit, 
so your final adjusted score will be +1, not -1.

You can fix that in several ways:

header FROM_CLIENT_EMAIL From =~ /client@client\.com/i

score FROM_CLIENT_EMAIL 0.01

header FROM_CLIENT_IP Received =~ /from 138\.31\.230\.222/

score FROM_CLIENT_IP 0.01



Or



header FROM_CLIENT_EMAIL From =~ /client@client\.com/i

header FROM_CLIENT_IP Received =~ /from 138\.31\.230\.222/

meta FROM_CLIENT_TEST FROM_CLIENT_EMAIL && FROM_CLIENT_IP

score FROM_CLIENT_TEST -3.0



Or the probably best way once you have the tests debugged and you know they 
both hit correctly:

 

header __FROM_CLIENT_EMAIL From =~ /client@client\.com/i

header __FROM_CLIENT_IP Received =~ /from 138\.31\.230\.222/

meta FROM_CLIENT_TEST __FROM_CLIENT_EMAIL && __FROM_CLIENT_IP

score FROM_CLIENT_TEST -1.0



The double underscore on the front of the rule will keep it from contributing a 
score of its own, and it will not show in the list of hit rules. Thus you will 
only see the result of the meta.
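
The arithmetic can be checked with a toy model. This Python sketch is not SA code; it only assumes what is said above: an unscored rule counts 1 point, and a rule whose name starts with a double underscore counts nothing.

```python
DEFAULT_SCORE = 1.0  # what a rule contributes when no score line is given


def total_score(hits, scores):
    """Sum the contribution of every rule that hit (toy model only)."""
    total = 0.0
    for name in hits:
        if name.startswith("__"):
            continue  # sub-rule: no score of its own
        total += scores.get(name, DEFAULT_SCORE)
    return total


# Original layout: both header rules score 1 each, the meta scores -1.
original = total_score(
    ["FROM_CLIENT_EMAIL", "FROM_CLIENT_IP", "FROM_CLIENT_TEST"],
    {"FROM_CLIENT_TEST": -1.0},
)

# Double-underscore layout: only the meta contributes.
fixed = total_score(
    ["__FROM_CLIENT_EMAIL", "__FROM_CLIENT_IP", "FROM_CLIENT_TEST"],
    {"FROM_CLIENT_TEST": -1.0},
)
```

In this model `original` comes out at +1 and `fixed` at -1, which is the difference described above.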



Loren






Re: authres missing when ran from spamass-milter

2023-06-01 Thread Loren Wilton

This is not an area I know anything about, so I may be completely wrong.
That said, I seem to remember a conversation very like this some years back.
If I remember correctly, someone found some switch that could be set to get 
spamass-milter to add the Received header before calling the other milters.
Even if there isn't a switch, maybe it would only take a few lines of code 
change in spamass-milter to put out the Received header earlier.




Re: comparing sender domain against recipient domain

2023-05-12 Thread Loren Wilton

But I was more interested if SA already has something like that?


It does not.


Weren't there a whole set of "FUZZY" rules once? I'm pretty sure that they 
looked for words in the subject and maybe body of the email that had 
exactly this sort of obfuscation. I don't think they were applied to domain 
names, and certainly not matching two fields in different headers. But if 
the code for the fuzzy rules is still around, it possibly could be adapted 
for this use.




How can I detect a text/plain base64 email message with no other text parts?

2023-04-15 Thread Loren Wilton
I get a lot of spams, and a major characteristic is they only have 
text/plain that is base-64 encoded.
Since I live in an area where base-64 encoding is basically never necessary, 
almost all base-64 encoded text parts are major spam signs.


Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: base64

Thanks,

   Loren



Re: BAYES scores

2023-02-28 Thread Loren Wilton

From: "Bill Cole" 

It is my understanding that an automated rescoring job was run quite some 
time ago (before I was on the PMC) to generate the Bayes scores, which 
determined that to be the best supplemental score to give to the greater 
certainty.


I was around in those days. My memory isn't the greatest anymore, but what I 
recall was that they did automatic rescoring, and then manually tweaked a 
few of the values, basically to make them look pretty by rounding off long 
fractions. BAYES_999 may have been scored almost completely manually, I 
can't quite recall.


   Loren



Re: Strange findings debugging bayes results

2023-02-20 Thread Loren Wilton

From: "Reindl Harald" 
in other words a system for morons - morons which will drag mails to spam 
instead click on "unsubscribe"


per-user bayes don't work well, never


Well Harald, you are certainly welcome to your opinion. It would be nicer if 
you had kept it to yourself though.
The system works just fine with the userbase it has. It probably wouldn't 
work for AOL or *.online.




Re: Strange findings debugging bayes results

2023-02-20 Thread Loren Wilton
This is a home system with only a few users. All users have "Spam" and "Ham" 
folders showing up in their email program of choice, and they just drag 
messages they do or don't like into the appropriate folders. There are "Oldham" 
and "Oldspam" mboxes, and the new spam and ham (respectively) get merged into 
these folders after learning, and removed from the current Spam and Ham folders.
  - Original Message - 
  From: Michael Grant 
  To: users@spamassassin.apache.org ; Loren Wilton ; hg user 
  Sent: Monday, February 20, 2023 12:47 PM
  Subject: Re: Strange findings debugging bayes results


  On 20 February 2023 12:28:00 CET, Loren Wilton  wrote:
  >
  > A cron job that will harvest Spam and Ham mboxes and feed them to sa-learn 
once a day, then archive the learned messages. Per-user bayes and learning. 
Mail is hand-moved into the spam and ham learning folders, and for my  personal 
account, I do this rarely, generally only when a message is mis-categorized. 
Although messages being mis-categorized as spam is often the result of a lot of 
quite aggressive local rules I have rather than a Bayes mis-classification.

  When you "harvest" ham from mboxes, what do you consider ham?

  You also, additionally, have a Ham folder for your users then? Interesting. 
Did you manage to train your users to use it easily? Does it grow unbounded or 
are old messages removed from it?  If so, how to know they can be deleted like 
from the Spam folder.

  It's an interesting idea, just wondering about the details.  Getting my users 
to train SpamAssassin has always been impossible for me.

Re: Strange findings debugging bayes results

2023-02-20 Thread Loren Wilton
> Can you please give me some details on your bayes setup? 
> Headers exclusion, bayes_token_sources, how do you "sa-learn" messages...

Standard options on Bayes. No autolearn. A cron job that will harvest Spam and 
Ham mboxes and feed them to sa-learn once a day, then archive the learned 
messages. Per-user bayes and learning. Mail is hand-moved into the spam and ham 
learning folders, and for my  personal account, I do this rarely, generally 
only when a message is mis-categorized. Although messages being mis-categorized 
as spam is often the result of a lot of quite aggressive local rules I have 
rather than a Bayes mis-classification.
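
For anyone wanting to try the same arrangement, here is a rough Python sketch of the daily job. It is not the actual cron script: the function names and folder layout are invented for illustration, it assumes sa-learn is on the PATH, and it does no mbox locking, so it should not run while mail is being delivered into those folders.

```python
import subprocess
from pathlib import Path


def learn_command(kind, mbox):
    """Build the sa-learn command line for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", str(mbox)]


def learn_and_archive(kind, mbox, archive, run=subprocess.run):
    """Learn one mbox, append it to the archive, then empty it."""
    if not mbox.exists() or mbox.stat().st_size == 0:
        return False  # nothing was dragged into the folder today
    run(learn_command(kind, mbox), check=True)
    # Archive only after learning succeeded (check=True raises otherwise).
    with archive.open("ab") as out:
        out.write(mbox.read_bytes())
    mbox.write_bytes(b"")
    return True
```

A per-user job would call `learn_and_archive("spam", ...)` and `learn_and_archive("ham", ...)` once a day as that user, so each user's own Bayes database is updated.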


Re: Strange findings debugging bayes results

2023-02-19 Thread Loren Wilton
> The real question is: has bayes still its use case in 2023 ? Is it still used 
> with important scores or just to flag messages for a review?

It works fine for me here.


Re: BAYES_00 BODY. Negative score?

2023-02-17 Thread Loren Wilton

They receive wildly different BAYES scores.
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.0002]
*  2.2 BAYES_20 BODY: Bayes spam probability is 5 to 20%
*  [score: 0.0881]


This looks like you have per-user Bayes databases, and the message type has 
been trained differently in each.


Also, it looks like there are per-user rules, since BAYES_50 has a normal 
score of 0.2, and there is no reason BAYES_20 (indicating much less spammy) 
should have a score of 2.2.




Re: Seeing big (>1MB) spam

2023-02-14 Thread Loren Wilton

I started seeing some spam today in the 1-1.5 MB range.


It's been over a year now, but for a while I was getting a huge number of 
spams that were either 1143 KB or 3831 KB.
The 3831 KB variant used the same obfuscation payload as the 1143 KB spams, 
they just put it in twice in a row.


   Loren



Re: BAYES_00 BODY. Negative score?

2023-02-13 Thread Loren Wilton
Have some annoying SPAM that consistently shows a negative score on BAYES. 
Is the default scoring or influenced by BAYES in some way?


*-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]


The score is reasonable for guaranteed ham, which is what your Bayes thinks 
this spam email is.

Of course the score isn't reasonable for spam, but Bayes thinks it is ham.

In addition to being cautious of autolearn as Benny described, yes, you need 
to retrain your Bayes, because it is very clearly confused on this point.


   Loren



Re: New rule wanted

2023-02-07 Thread Loren Wilton
I believe 3MB is above the default scan size for SA, so likely it won't even 
look at the file.

Loren
  - Original Message - 
  From: Rupert Gallagher 
  To: users@spamassassin.apache.org 
  Sent: Tuesday, February 07, 2023 2:26 AM
  Subject: Re: New rule wanted


  Note: Both client and server are not Windows. The attached file type is a 
generic "data" on unix. On a Windows client the file runs as executable. A SA 
rule should merely detect that the file type is a generic "data" file.
   Original Message 
  On Feb 7, 2023, 11:15, Rupert Gallagher < r...@protonmail.com> wrote:

I received a spam with score -1. Well written, looks legit commercial, 
asking for a quotation, with details in the attachment, a 3MB file with unknown 
extension ".one".

The file turns out to be a Windows Trojan:


https://www.virustotal.com/gui/file/f4d587f60f2d34add9f77fcbd8c3c0df3ca51cfaecd9de85c45d25647eaac40b

Both SA and ClamAV passed it as legit.

We should have a SA rule that says: "attached file with unknown data type". 



Re: Rule Help - not sure what is wrong with my syntax

2023-01-14 Thread Loren Wilton
> header TO_SPECIFIC_DOMAIN To:addr =~ /\@(test\.com|test\.net)$/

That for efficiency really should use a non-capturing grouping:

header TO_SPECIFIC_DOMAIN To:addr =~ /\@(?:test\.com|test\.net)$/

Note the "?:" after the left parenthesis.
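
The difference is easy to see in any Perl-style regex engine. A small Python illustration (Python's `re` is used here only as a stand-in for Perl; the addresses are made up):

```python
import re

# Same alternation, with and without a capture group.
capturing = re.compile(r"@(test\.com|test\.net)$")
non_capturing = re.compile(r"@(?:test\.com|test\.net)$")

# Both accept and reject exactly the same addresses ...
for pattern in (capturing, non_capturing):
    assert pattern.search("someone@test.net")
    assert not pattern.search("someone@test.org")

# ... but only the first one has to remember what the group matched.
groups_saved = (capturing.groups, non_capturing.groups)
```

Here `groups_saved` is `(1, 0)`: the non-capturing form matches the same mail without storing a capture nobody reads.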

Loren


Re: Rule Help - not sure what is wrong with my syntax

2023-01-11 Thread Loren Wilton
Why not do a simple rule rather than inventing some Perl code?

header TO_SPECIFIC_EMAIL To:addr =~ m'(?:\bus...@example.com|\bus...@example.com|\bus...@example.com)'
describe TO_SPECIFIC_EMAIL Mail to a specific email address
score TO_SPECIFIC_EMAIL -2

header TO_SPECIFIC_DOMAIN To:addr =~ m'(?:\@example1\.com|\@example2\.com|\@example3\.com)'
describe TO_SPECIFIC_DOMAIN Mail to specific email domain
score TO_SPECIFIC_DOMAIN -2

or possibly

header TO_SPECIFIC_DOMAIN To:addr =~ m'\@(?:example1\.com|example2\.com|example3\.com)$'


Loren
  - Original Message - 
  From: Joey J 
  To: users@spamassassin.apache.org 
  Sent: Wednesday, January 11, 2023 3:39 PM
  Subject: Rule Help - not sure what is wrong with my syntax


  Hello All,


  I created this rule to check for email addresses matching a list to get added 
some negative value.
  I also tried it with just domains so it would be more efficient, but I can't 
seem to get them to run.
  Any suggestions?


  header TO_SPECIFIC_EMAIL eval:check_to_specific_email()
  describe TO_SPECIFIC_EMAIL Mail to a specific email address


  score TO_SPECIFIC_EMAIL -2


  sub check_to_specific_email {
  my ($self) = @_;
  my $to = lc($self->get('To:addr'));
  my $list_of_address = 
qr/us...@example.com|us...@example.com|us...@example.com/;
  if ($to =~ $list_of_address) {
  return 1;
  }
  return 0;
  }






  
  This version was to simply check for the domain matches, but can't seem to 
get it to work
  


  header TO_SPECIFIC_DOMAIN eval:check_to_specific_domain()
  describe TO_SPECIFIC_DOMAIN Mail to specific email domain


  score TO_SPECIFIC_DOMAIN -2


  sub check_to_specific_domain {
  my ($self) = @_;
  my $to = lc($self->get('To:addr'));
  if ($to =~ /\@example1\.com$|\@example2\.com$|\@example3\.com$/) {
  return 1;
  }
  return 0;
  }












  -- 

  Thanks!
  Joey



Re: local rule exclude all domains except "my list of approved"

2023-01-05 Thread Loren Wilton

You can simplify your rule code a little if you want:


header __LOCAL_FROM_BE  From =~ /.\.beauty/i
meta LOCAL_BE (__LOCAL_FROM_BE)
score  LOCAL_BE 2
describe LOCAL_BE from beauty domain


   to

header LOCAL_BE  From =~ /.\.beauty/i
score  LOCAL_BE 2
describe LOCAL_BE from beauty domain

The meta isn't really doing anything there, since it only has a single 
clause.
Metas are good when you want to combine the results of several matches with 
boolean logic.


You might also want to add a \b to the rule:

header LOCAL_BE  From =~ /.\.beauty\b/i

Without that the rule will match ".beauty", but also ".beautyrest".

Another thing you might want to consider is using "From:addr" rather than 
just "From". As it is, it will match ".beauty" both in the address and in 
the person's name description. So it would match:


   From: "janice.beautyfull" 

Maybe you want that, in which case a bare "From" is fine.
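
To see the From versus From:addr difference without touching SA, here is a small Python approximation. The address in angle brackets is invented (the original example lost it), and `parseaddr` only roughly imitates what `:addr` extracts.

```python
import re
from email.utils import parseaddr

# Same pattern as the LOCAL_BE rule before the \b was added.
LOCAL_BE = re.compile(r".\.beauty", re.I)

header_value = '"janice.beautyfull" <janice@example.com>'  # hypothetical
address_only = parseaddr(header_value)[1]

hits_whole_header = LOCAL_BE.search(header_value) is not None
hits_address_only = LOCAL_BE.search(address_only) is not None
```

Against the whole header the pattern hits on the display name; against the address alone it does not.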



Re: SA build from cpan fails under certain conditions

2022-12-21 Thread Loren Wilton
If this is on 4.0, perhaps a bug should be opened.
  - Original Message - 
  From: Shawn Iverson 
  To: SA Mailing list 
  Sent: Wednesday, December 21, 2022 10:05 AM
  Subject: SA build from cpan fails under certain conditions


  Hello SA Users,


  Just posting this in case anyone else runs into similar trouble...


  sudo cpan Mail::Spamassassin seems to only build properly on recent flavors 
of rhel under very specific conditions, notably:


  You are not root
  The cpan configuration is set to build specifically using local lib or sudo. 
sudo seems to work if you want to install SA for all users instead of in the 
home directory. The key is that the building/testing is happening in the user's 
home directory.


Re: Whitelist or add negative values for score

2022-12-20 Thread Loren Wilton
Personally I'd look at why BIGNUM_EMAILS_MANY is hitting and see if there is 
something the sender could do to avoid it. I'm pretty sure I've never seen that 
rule hit in any of my spam, so it must be something a bit unique.

Loren


Re: Problems matching the last word in multi-OR Regex

2022-12-15 Thread Loren Wilton
>  body __ANIMALS /cat|mouse|bird|dog/i

There is a possible problem with your rule. It probably isn't related to what 
you are seeing, but could be a problem for you anyway.

There is no word boundary in the regex, so 'cat' will match catamaran, 'mouse' 
will match mousehouse, 'bird' will match birddog, and so will 'dog'.

You can solve this by adding word boundaries:

body __ANIMALS /\b(?:cat|mouse|bird|dog)\b/i

or

body __ANIMALS /\bcat\b|\bmouse\b|\bbird\b|\bdog\b/i
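
A quick way to convince yourself, sketched in Python (its `\b` behaves like Perl's for plain ASCII words such as these; the test strings are made up):

```python
import re

loose = re.compile(r"cat|mouse|bird|dog", re.I)
bounded = re.compile(r"\b(?:cat|mouse|bird|dog)\b", re.I)


def hits(pattern, text):
    """True when the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Substrings of longer words fool the loose pattern only.
loose_false_hits = [w for w in ("catamaran", "mousehouse", "birddog")
                    if hits(loose, w)]
bounded_false_hits = [w for w in ("catamaran", "mousehouse", "birddog")
                      if hits(bounded, w)]
```

The loose pattern hits all three longer words, the bounded one hits none, and both still match a sentence such as "My dog barks".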




Re: spam subject marking

2022-11-15 Thread Loren Wilton
So the alternative is adding a header and move it to the spam folder 
automatically on the basis of the header?


Currently I just want to 'warn' users that the message is possible spam, 
they can decide to move such emails automatically to a spam folder by 
enabling a sieve rule.
What would be an alternative method to keep such functionality without 
altering the subject?


If SA sees the message and classifies it as spam, it normally adds (from an 
example)

X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=8.2 required=5.0 
tests=BAYES_50=0.8,DKIM_SIGNED=0.1,


It should be trivial to look for the "X-Spam-Flag: YES" line. 
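
As an illustration of how little is needed, here is a Python sketch of that check. It is not a sieve rule or part of any delivery agent; the function name and the sample message are made up, and a real setup would do the same test in sieve, procmail or similar.

```python
from email import message_from_string


def is_flagged_spam(raw_message):
    """True when the message carries an 'X-Spam-Flag: YES' header."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


sample = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=8.2 required=5.0\n"
    "\n"
    "body text\n"
)
```

The subject line is left alone; only the header decides where the message goes.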



Re: PDS_DBL_URL_TNB_RUNON

2022-11-13 Thread Loren Wilton
Pretty obviously a spam, I'm surprised that it didn't get a lot of "fake order" 
type of points.

Here is the (or at least one) double URL that it caught:

;" href=3D"ht= 
tps://nam02.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fsojaprote= 
in.rs


How do I check for a jpeg attachment?

2022-10-03 Thread Loren Wilton
I'm getting a bunch of spams from fake gmail accounts that consist of one 
short line of text and a 2 MB jpg file.

The subject and body text are pretty much random beyond that.

How do I check for the following?

--e345f305ea2680cd
Content-Type: image/jpeg; name="MMM.jpg"
Content-Disposition: attachment; filename="MMM.jpg"
Content-Transfer-Encoding: base64
Content-ID: 
X-Attachment-Id: f_l8t6clr50

I want to match on /^Content-Type: image\/jpeg;/ but I can't figure out how 
to do that. rawbody doesn't seem to work.


Thanks 



Re: Mail with image marked as spam

2022-09-25 Thread Loren Wilton
It sure seems to me like people are just using email to share pictures 
(licenses, legal docs, as well as pictures of the kids.)


Are these messages that are being sent by individuals from their phones?
Or is there some program that is sending these?
I can see it being too much work to type a subject if you are just texting a 
snapshot to someone else on another phone, but if it is a program, perhaps 
it could be trained to put a "Here is your photo!" subject in the mail and 
eliminate the problem.




Re: Matching on missing To field?

2022-07-21 Thread Loren Wilton
> The problem I'm having is that my To header rules aren't matching because 
> there is no To header, 
> and I'm otherwise unsure what to match on. The only occurrence of the 
> recipient in the entire email 
> is in that Received header.
> 

> It does match on "ALL", but I think I need to be more specific than that, to 
> avoid matching on "From:" 
> or Return-Path or EnvelopeFrom./

If you want to match on text in Received headers only, then just write a rule 
to check that header type:

header __TO_FRED_JOHNSON    To =~ /\bfred\.johnson@foo\.com\b/
header __RCVD_FRED_JOHNSON  Received =~ /\bfred\.johnson@foo\.com\b/
meta   TO_FRED_JOHNSON      __TO_FRED_JOHNSON || __RCVD_FRED_JOHNSON
meta   NOT_TO_ME            !TO_FRED_JOHNSON

You could do that with ALL, but this way is probably more efficient, and will 
be a lot less confusing regex.

Loren


Re: Matching on missing To field?

2022-07-20 Thread Loren Wilton


header __HDRS_MISSP ALL:raw =~ /^(?:Subject|From|To|Reply-To):\S/ism


That rule just says: look at all the raw header data and match if there's 
none of Subject, From, To, Reply-To entries.
IE a really malformed message.


Hum. As I read it, that is "headers misspelled" (not "headers missing") and 
it is checking for any of the listed words at the start of a line, followed 
by a colon, and NOT followed by a space.
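
That reading can be checked by running the same pattern over a few made-up header blocks. The Python below is only a stand-in for what SA does with ALL:raw, using the same i, s and m flags:

```python
import re

HDRS_MISSP = re.compile(
    r"^(?:Subject|From|To|Reply-To):\S",
    re.I | re.S | re.M,
)

well_formed = "From: a@example.com\nTo: b@example.com\nSubject: hi\n"
missing_space = "From: a@example.com\nTo:b@example.com\nSubject: hi\n"
no_listed_headers = "X-Mailer: test\n"

results = {
    "well_formed": HDRS_MISSP.search(well_formed) is not None,
    "missing_space": HDRS_MISSP.search(missing_space) is not None,
    "no_listed_headers": HDRS_MISSP.search(no_listed_headers) is not None,
}
```

Only the block with "To:b@..." matches. A block with none of the listed headers does not match at all, so the rule cannot be a "headers missing" test.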


   Loren



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Loren Wilton

Minicomputers-Exhume: sides
Malthus-Films: 88976dea
Parasitic-Homogeneity: db5da28ba3e69a
Capitalizations-Grievously: oilers


It looks like the pattern is
   /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}/
or something close to that.
Obviously it can mutate, but generally these are made by a tool, and until a 
new version of the tool comes along, they will be stable.


Try something like
   header  LW_BOGUS_HEADERS ALL =~ /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}\n/is
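
Before trusting a pattern like this it is worth running it over the samples. The Python below does that with the quantifiers written as {1,20}; it is a stand-in for Perl's engine and tests only the four sample headers plus two ordinary ones I made up.

```python
import re

BOGUS_HEADER = re.compile(
    r"[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:\s{1,10}[\w\d]{3,20}\n",
    re.I | re.S,
)

samples = (
    "Minicomputers-Exhume: sides\n"
    "Malthus-Films: 88976dea\n"
    "Parasitic-Homogeneity: db5da28ba3e69a\n"
    "Capitalizations-Grievously: oilers\n"
)

sample_hits = len(BOGUS_HEADER.findall(samples))
plain_subject_hit = BOGUS_HEADER.search("Subject: hello world\n") is not None
# Unanchored, so the tail "Transfer-Encoding: base64" also fits the shape.
cte_hit = BOGUS_HEADER.search("Content-Transfer-Encoding: base64\n") is not None
```

All four samples match and a plain Subject does not. The last check shows the pattern also fits the tail of a legitimate header, so it probably wants a line-start anchor or a low score until it has been watched for a while.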



Re: Another evil number

2022-05-03 Thread Loren Wilton
Fascinating thread I just stumbled on. Yes, in early parts of the phone 
system, the letters were geographic and referenced the street for where 
the central office was located switching those calls. For example, in 
Arlington VA, my grandfathers number was 533-9389 which was referred to as 
JE3-9389 and the CO was on Jefferson St. I'm pretty sure this fell apart 
rapidly as the system grew.


Whether it referenced the CO or not was a regional thing at best. For 
instance my home number when growing up was YU-71314, where YU was Yukon. 
For the first year or two that we had a phone it was only YU-1314, the 7 
came along later when it became possible to direct dial more than the single 
CO you were attached to. There was absolutely nothing within several 
thousand miles that was named Yukon, Alaska, or anything else cold. The CO 
was on Foothill Blvd, which was a two-lane undivided street.


The main reason for naming toll areas was memorability. It wasn't easy to 
remember a 7 digit number, but a prefix followed by 3, 4, or 5 digits 
(depending on the era and where you lived) was much easier to remember, or 
at least so Ma Bell thought at the time.


Named toll codes stayed around until the mid to late 1960s. What finished 
them off was the introduction of DDD - Direct Distance Dialing, or the area 
code system we are all familiar with.


   Loren



Re: Linting of local.cf

2022-04-14 Thread Loren Wilton

Is there a tool I can use to do a manual lint of the local.cf file ?


At command prompt: 
   spamassassin --lint



   Loren



Re: T_SCC_BODY_TEXT_LINE

2022-04-03 Thread Loren Wilton
What is the purpose of the rule named T_SCC_BODY_TEXT_LINE? On my servers, 
it hits nearly every spam and ham email.


Rules beginning with T_ are test rules, and should have a very small score. 
So someone is testing some concept there. I don't seem to have this rule, so 
I can't say any more about what it does.


   Loren



Re: rules for a sneaky SPEAR-VIRUS spam that gets past bayes

2022-03-03 Thread Loren Wilton
Just off the top of my head:

rawbody   ONEDRIVE_DOWNLOAD   m'https://onedrive\.live\.com/download[?]cid='
score     ONEDRIVE_DOWNLOAD   0.5
describe  ONEDRIVE_DOWNLOAD   Download link to a file on Onedrive

Personally I'd be inclined to put an i on the end of that.

body      FILE_PWD_INFO   /\b(?:Fil lösenord|File password):\s[A-Z]{2}\d{4}\b/
score     FILE_PWD_INFO   3
describe  FILE_PWD_INFO   Email has a password to an archive file

meta      PWD_ONEDRIVE_DLOAD   ONEDRIVE_DOWNLOAD && FILE_PWD_INFO
score     PWD_ONEDRIVE_DLOAD   4
describe  PWD_ONEDRIVE_DLOAD   Email contains download for passworded Onedrive file

Loren


Re: OT - Hotmail/Outlook.com marking most of our email as Junk

2022-02-19 Thread Loren Wilton

Cian is rumored to have said:

Anne, I am incredibly grateful for the offer.  I sent my emails to the
tester and to the support email.  Hopefully, they come up with
something actionable.


If you get a useful result it might be nice to summarize it to the list.

   Loren


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Loren Wilton

Are you talking about the use of m'' as the regex delimiter?


Yes.

It will probably work just fine for the foreseeable future, as long as the 
input validation of rules files is lenient.


I think you may have a very hard time removing the m matching 
delimiters from SA. I suspect there are at least hundreds of rules like that 
in the release database. I have about a hundred local rules of my own that 
use that.


Any time I have more than one backslash in a pattern, I use an alternate 
delimiter (usually single quote) so that I don't have to escape all the 
backslashes in the rule body. I'm not a fan of obfuscated rule bodies where 
it is impossible to tell what it is intended to match. My experience is that 
any time you have to write  or \\ multiple times in a rule body, you 
are almost guaranteed to get the number of backslashes wrong, and the rule 
won't work. But of course it may work in some cases (like the one you used 
to test it) while not working in general.


I don't have time in my life to deal with that sort of thing. It caused me 
enough grief when I started writing rules 20 years ago, which is why I 
started using m'.


BTW, that particular rule dates from RulesEmporium days, which was what, 
2005 or so?


   Loren



Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Loren Wilton
No, I added that after observing multiple spams with random garbage after 
the closing HTML tag in the HTML body part. Presumably it was an attempt 
at Bayes poison, checksum avoidance, or some other filter evasion 
technique.


I'll tighten it up.


FWIW, here is the rule I use. It obviously could be better, but I haven't 
noticed that it misfires.


full __GOODEHTML1 m''i

full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime 
ending boundary


meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)

describe LW_BADEHTML1 Bad ending - something after 

score LW_BADEHTML1 1





Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread Loren Wilton

But, it had:

 *  2.5 CONTENT_AFTER_HTML More content after HTML close tag

but one was only text/plain and I could see nothing wrong.   reading
72_active.cf I found:

  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
 >

which fires on a text/plain part that discusses html formatting!


Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the 
same rule. I suspect the meta for CONTENT_AFTER_HTML  contains some other 
things that should in theory make it not hit in this case.


I've personally never seen this rule hit, and didn't know it existed. Are 
you sure it isn't a local rule? I have a rule of my own that gives 1 point 
for extra trash after the /html end tag. I see it frequently on spam and UCE 
that has a tracking tag in the HTML section after the official end of the 
html.


   Loren



Re: resubmit mail or just delete

2022-02-07 Thread Loren Wilton
If they are more than a month or so old, just drop them would probably be 
appropriate. Spam changes with time, and learning old spam patterns may not 
do you much good.


If you aren't running Bayes, just dump all of them.

If you are running Bayes, it might be worth running the last month or so thru 
SA-Learn as spam, as long as you are sure they are all spam.


Just my opinion.

   Loren 



Re: Avoid processing upsteam trusted mail with X-Spam-Flag: YES?

2022-01-06 Thread Loren Wilton
that header should be on same host as the email clients read there mails, 
if its trusted outside of local mta, then its forged say X-Spam-Flag: NO


do we want to trust it ?


I have a somewhat similar situation where the mail provider for my personal 
account runs filtering software that usually puts classification headers in 
the mail I receive from them. I ignore any statement they make about the 
mail being clean, but if they say it is spam, I add 2 points in SA. So far 
that has rarely resulted in FPs for me.


I think it is perfectly reasonable policy to trust a (semi)trusted upstream 
system that says the message is spam, and award points for that decision: 
either the upstream system's original score, or a new score on the closer 
host, as you wish. But unless it is a completely trusted host I'd never 
trust its decision of ham.


Unfortunately it is difficult (without trickery of some sort) to do that if 
the upstream system is also SA, since SA removes the incoming X-Spam-* 
headers.
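
For an upstream filter that is not SA, the "add points only when it says spam" policy could look roughly like the sketch below. The header name X-Provider-Spam-Flag and the rule name are placeholders, not anything a particular provider is known to emit, so substitute whatever your upstream actually adds. The 2 points match what I use locally.

```
# Hypothetical header name: replace with the one your provider really adds.
# Only a "spam" verdict is scored; a "clean" verdict is deliberately ignored.
header   LW_UPSTREAM_SPAM  X-Provider-Spam-Flag =~ /^\s*yes\b/i
describe LW_UPSTREAM_SPAM  Upstream filter classified this message as spam
score    LW_UPSTREAM_SPAM  2.0
```

There is intentionally no negative-scoring companion rule, since a sender could forge a "clean" header but gains nothing by forging a "spam" one.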


   Loren 



Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.

2021-12-14 Thread Loren Wilton

How do I do this? There is no rawheader or rawbody matcher as far as I
could determine.


There is 'rawbody', but it may or may not help you. I seem to recall the 
Subject is prepended to the body text, but I don't recall if it is prepended 
to rawbody. You could try it.


Short of that, you may have to fall back on 'full' and match for something 
like


full MY_SUB /\nSubject: \n/
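
Another thing that may be worth trying, sketched under assumptions: a plain header rule sees the Subject after MIME-word decoding, so an emoji that arrived encoded should show up as its UTF-8 bytes. The rule name and score below are made up for illustration, and the byte range only covers code points U+1F000 through U+1FFFF, which is where most emoji live.

```
# Assumes the decoded Subject reaches the rule as UTF-8 bytes.
# \xF0\x9F followed by two continuation bytes spans U+1F000..U+1FFFF.
header   LW_SUBJ_EMOJI  Subject =~ /\xF0\x9F[\x80-\xBF]{2}/
describe LW_SUBJ_EMOJI  Subject contains a 4-byte UTF-8 emoji sequence
score    LW_SUBJ_EMOJI  0.5
```

If your version or charset settings present the text to rules differently, the pattern would need adjusting, so test it against a saved sample first.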



Re: SPF_NONE scoring

2021-11-30 Thread Loren Wilton

So how is this score arrived at?


I believe that scores of 0.001 are generally manually set, and not intended 
to be anything other than a visible marker that the rule hit. That is 
probably the case here.


   Loren



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-13 Thread Loren Wilton
What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a runaway it 
would be useful. Perhaps I can code up something to capture that and log 
it on a timeout...


Actually what sounds like it would be useful would be knowing the name of 
the rule that timed out. I'm presuming when the timeout occurs that there is 
still some indication of the current rule being processed so it can be 
killed. I'd think that should be enough to backtrack to the rule name. A 
modification to the timeout message could display the name of the rule and 
even how long it took to that point.


I guess there might be multiple rules running when the timeout occurs, so we 
might not know which one really timed out, but that would still be a small 
number of rule names.


   Loren



Re: Fw: spam from gmail.com

2021-11-11 Thread Loren Wilton
I have to admit I'd never paid much attention to the RCVD_IN_DNSWL_* scores 
on spam before.

Looking at spam for last month, I don't have a single RCVD_IN_DNSWL_MED.

But I do have 12 pretty blatant spams that hit RCVD_IN_DNSWL_HI.
It makes me wonder just how useful a rule it is.

Especially when it includes sendgrid as part of the "HI" reputation senders.

[ 66. 70.136.180] mta1.bevocalforlocal.info
[ 88. 80.190.164] 88-80-190-164.ip.linodeusercontent.com
[107.175.219. 38] dhrf266.medley.com.de
[107.175.219. 54] dhrf2106.realatelier.xyz
[107.175.219.103] dhrf2208.rollrs.xyz
[139.162. 81.182] 139-162-81-182.ip.linodeusercontent.com
[167. 89. 10.203] o1678910x203.outbound-mail.sendgrid.net
[167. 89. 10.203] o1678910x203.outbound-mail.sendgrid.net
[172.104.183.201] 172-104-183-201.ip.linodeusercontent.com
[172.105.221. 77] li1875-77.members.linode.com
[178. 79.178. 52] li347-52.members.linode.com
[185. 51. 39.149] static-185-51-39-149.uludns.net



Re: Unicode considered harmful again

2021-11-04 Thread Loren Wilton
In v4.x, Unicode support will be better. That also means it may be easier 
to make this sort of attack quieter in the future, as non-ASCII rules 
won't be definitively wrong as they are now.


The question is whether non-ascii malicious rules could do anything more 
damaging than simply failing to match on the obvious strings "visible" in 
the rule, or alternately deliberately match on some string that should not 
be matched, in some form of DOS attempt.


It's hard to see how someone could inject Perl (or any other) code with 
screwy rules. There was a time when Perl code was allowed in rules, but that 
was disallowed many years ago:


   uri  LW_PRINTIT   /(^.*$)(?{ print "URI:\n$^N\nEnd URI\n\n" })/is

That was a real handy debugging rule once, but you can't get away with that 
anymore.


   Loren



Sometimes the spammers should render the plain text manually v2.0

2021-10-27 Thread Loren Wilton

Or maybe they should just write a plain text body:

Hello there, !

This is a test template...



Sometimes the spammers should render the plain text manually

2021-10-27 Thread Loren Wilton

Rather than just a direct translation of the obfuscated HTML:

Clipxuck thDe button belo2bw avnd ewnd the confiIrmtion stTodeps




Re: Disabling autolearn on given rule

2021-09-21 Thread Loren Wilton
(2) where would I go to look at building a plugin for this? Ideally 
something that ends up upstream, but though I can write code, I know no 
perl :).


Well, from the few I've seen, they all seem to have a relatively constant 
structure. Someone pointed you to a plugin that is at least dealing in this 
general area, that might be a good starting point, barring anyone else 
having a better suggestion.


While I wrote a little Perl a decade ago I've forgotten many of the 
peculiarities, but there are some good web sites out there, and there is one 
of the animal books on the subject. Perl is a bit peculiar in syntax and 
function compared to the C/C++ I did much of my career, but I didn't have 
much trouble picking up enough to make some local SA hacks long ago, so if 
you can program in most anything it probably won't be too much trouble.


I don't recall if Bayes itself is called from a plugin or from the main SA 
code, but I'm pretty sure it is only called if an internal 'autolearn' token 
is true for the message. If you make a plugin that runs late in the rule 
evaluation it should be able to look at the score and rule hits and items in 
the message header and body and decide if it wants to turn off the autolearn 
flag for the message. Hopefully there isn't something in main SA code that 
determines the value of this flag after all of the rules have run.


I guess one thing you might be able to do is implement a tflags flag of 
absolutely_no_autolearn or some such that would force-disable the autolearn 
decision if the rule had hit, but that might be something that would have to 
be put into the main SA code itself. Maybe Henrick will chime in here. This 
may be really trivial if you know where to look.
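
For what it's worth, the Mail::SpamAssassin::Conf documentation lists a tflags value named noautolearn, described as making the test be ignored when the score for the learning systems is calculated. A minimal sketch, assuming that flag behaves as documented on your version; MY_LOCAL_RULE and its pattern are placeholders for whatever rule you want kept out of the decision:

```
# MY_LOCAL_RULE is a hypothetical rule used only for illustration.
body     MY_LOCAL_RULE  /example phrase/i
describe MY_LOCAL_RULE  Local rule excluded from autolearn scoring
score    MY_LOCAL_RULE  3.0
# Leave this rule's points out of the autolearn score calculation.
tflags   MY_LOCAL_RULE  noautolearn
```

Note this only removes the rule's points from the autolearn calculation. It does not veto autolearning outright when the rule hits, so a message can still be learned if its other rules push it past the thresholds.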


   Loren


---
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Disabling autolearn on given rule

2021-09-21 Thread Loren Wilton

None of these seem to accomplish disabling learning for a specific rule


I think the problem is that I believe Bayes works off of the total score, 
and probably only sees rule names as more tokens, if it sees them at all. If 
it indeed works off the total score, about all you can do is somehow tweak 
that score for a given rule or rule combination.


   Loren





An interesting bit of HTML from a spam

2021-09-12 Thread Loren Wilton
I found this little wonder in a bunch of spams I've been getting for the 
last few days:


http://; http://; http://; http://; http://; http://; 
href="http:/mi.wey.vandalized655bccemetries.cleaning/id>">unsubscribe here


I have no idea if that actually works, since I'm not about to try it.

   Loren






Re: Does anyone know what generates these email headers?

2021-09-08 Thread Loren Wilton
The originating PHP script header helps people who run shared servers 
track down the source of problematic mail. The two most common cases are:


Does this look valid?

   X-PHP-Originating-Script: 48:class.phpmailer.php

Just looking at a dozen or so of the spams I've gotten in the last couple 
days that match this pattern, they all have an x-originating-spam-status 
of -2.9, which makes me a little suspicious that that header is faked. Maybe 
the others are too.


   Loren





Does anyone know what generates these email headers?

2021-09-08 Thread Loren Wilton

I'm getting a lot of mails with some very curious headers in them.
I tried searching with Google, and it has never heard of many of these 
strings.

Does anyone recognize what might be generating these headers?

X-EOPTenantAttributedMessage
X-EmailAdvisor
X-Mxtb-Transitionid
X-MG-Subscriptionuid
X-PHP-Originating-Script
X-EmailTransmit-type
CMM-X-SID-Result
CMM-X-AUTH-Result
CMM-X-Message-Status
X-OutGoing-Spam-Status
X-EmailTransmit-aid
X-rext

Thanks!

   Loren





Re: Website "help" spams

2021-07-28 Thread Loren Wilton

body NOT_INTERESTED /“[Nn]ot\S{1,5}[Ii]nterested\.?â€/

Might also be an interesting test. I assume the gibberish on the front and 
back is quotes in some character set or another, but they seem a little 
unlikely in a real mail.


   Loren





Re: Another evil "order response" number

2021-07-14 Thread Loren Wilton
And yet another rather amusing one from a crypto trading scam:


The BTC wallet which you have to send is:
1GF1DcYFpe MoA4Ttj6TeWPK sJFRV43JjYc (PLEASE REMOVE THE SPACES FROM THE WA=
LLET NUMBER)

Our trading system will automatically recognize your investment and start =
making profits for YOU!

If you need some additional info please text or contact me using Telegram:=
 +44 775 254 0482



Another evil "order response" number

2021-07-14 Thread Loren Wilton

Thanks  Regards,
Billing Team

Defender Firewall Protection 




+1, 888, 313, 1366




Re: number in sender name

2021-07-10 Thread Loren Wilton
Perhaps memory fails, but was there not, once, a standard rule that 
detected non alpha characters in

sender name?  The domain/provider is not of interest for this question.

I think there was, but I suspect that the spam/ham ratio would be about 
even, which is probably why it doesn't show up now. 



Another evil number

2021-06-25 Thread Loren Wilton

From a fake "subscription" spam:


You can reach out
  to our Customer Support Team+1 (800) 781 - 2511.



Re: Maybe it's time to revive EvilNumbers?

2021-06-17 Thread Loren Wilton
A number of the rules I passed along are generic "order" rules rather than 
Amazon specific. I had to go back to last month's spam to find an Amazon order 
spam, but I've gotten a dozen or so fake orders for other things this month, 
all of which hit on the LW_BOGUS_ORDER rule.

Loren
  - Original Message - 
  From: Mark London 
  To: users@spamassassin.apache.org 
  Sent: Thursday, June 17, 2021 8:52 AM
  Subject: Re: Maybe it's time to revive EvilNumbers?


  Loren - Unfortunately, the fake amazon shipment email that we received, 
doesn't contain the word Amazon in it's From or Subject headers. 

  Or even the word amazon in the text of the message!  Just the Amazon logo.

  And they've removed all the URLs, so the links don't work at the bottom.   
And they left the postal address of amazon, without the word amazon.

  I hate bogus spam that is so obviously bogus that it avoids filter rules. :) 
- Mark 


  On 6/17/2021 10:52 AM, users-digest-h...@spamassassin.apache.org wrote:

  Subject: Re: Maybe it's time to revive EvilNumbers? 
      From: "Loren Wilton"  
  Date: 6/16/2021, 8:18 PM 
  To:  


Here are a handful of rules that work for me. Feel free to try them. 
If you do, please let me know how they work for you. 

(Apologies for my mail client trashing the formatting. 
Be sure to check for possible line wrap on some of the rules!)




Re: Maybe it's time to revive EvilNumbers?

2021-06-16 Thread Loren Wilton

Here are a handful of rules that work for me. Feel free to try them.
If you do, please let me know how they work for you.

(Apologies for my mail client trashing the formatting.
Be sure to check for possible line wrap on some of the rules!)

   Loren


body  LW_PAYMENT  /You\s+sent\s+a\s+Payment\s+of/i
score  LW_PAYMENT  0.5
describe LW_PAYMENT  You sent someone a payment

body  LW_ORDER  /\b(?:order|purchase)\s+(?:number|ID|date|description)\b/i
score  LW_ORDER  0.5
describe LW_ORDER  Contains order information


header  __LW_SUB_INVOICE Subject =~ /\b(?:invoice|order)\b/
header  __LW_FROM_INVOICE From =~ /\b(?:invoice|order)\b/
header  __LW_ABC_LISTID List-Id =~ /\w{13}\s+\, some 

meta  LW_BOGUS_ORDER (__LW_SUB_INVOICE || __LW_FROM_INVOICE) && __LW_ABC_LISTID

score  LW_BOGUS_ORDER 5
describe LW_BOGUS_ORDER Fake order or invoice

meta  LW_SPAM_LISTID __LW_ABC_LISTID
score  LW_SPAM_LISTID 1
describe LW_SPAM_LISTID The List_Id header seems to indicate spam


meta  LW_FREEMAIL_ORDER FREEMAIL_FROM && (LW_ORDER || LW_PAYMENT)
score  LW_FREEMAIL_ORDER 4
describe LW_FREEMAIL_ORDER An order receipt from a free email address


header  __LW_SUB_AMZ_ORDER   Subject =~ /^Your Amazon\.com order \#\d{3}-\d{7}-\d{7}\s*$/
header  __LW_FROM_AMZ_ORDER  From  =~ /\"Amazon\.com\"\s+/

header  __LW_REP_AMZ_ORDER   Reply-To =~ /^no-reply\@amazon\.com\s*$/
body    __LW_BODY_AMZ_ORDER  /Amazon.com Order Confirmation/

meta     LW_REAL_AMZ_ORDER   __LW_SUB_AMZ_ORDER && __LW_FROM_AMZ_ORDER && __LW_REP_AMZ_ORDER && __LW_BODY_AMZ_ORDER

score    LW_REAL_AMZ_ORDER   -2
describe LW_REAL_AMZ_ORDER   Amazon order confirmation

header  __LW_FROM_AMZ  From  =~ /\bamazon\b/i
header  __LW_SUB_ORDER Subject =~ /\border\b/i

meta     LW_FAKE_AMZ_ORDER   __LW_FROM_AMZ && __LW_SUB_ORDER && !LW_REAL_AMZ_ORDER

score    LW_FAKE_AMZ_ORDER   7
describe LW_FAKE_AMZ_ORDER   Amazon order phish





Re: Maybe it's time to revive EvilNumbers?

2021-06-15 Thread Loren Wilton
My site is getting a lot of spam that is getting past spamassassin. 
Because it has a phone number to call, and rather than a link to login 
using username and password.   Mostly fake amazon purchases.   They are 
getting past a lot of URL block lists because of that.   FWIW. - Mark


I have a number of "purchase" rules that add about 30 points for fake Amazon 
(and other) scams. I haven't had one get thru in the last couple of months 
since I instituted them, but I only have a personal account and not a whole 
site, so YMMV. None of them look for phone numbers, but I do have a set of 
rules for a handful of stolen business addresses commonly used in spams I 
get. They add a few points when those show up.


   Loren



Re: Header exists with a dollar sign in it

2021-05-26 Thread Loren Wilton

You could try

header X_SWITCH ALL =~ /^X-\$switch\b/sm

   Loren



Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-09 Thread Loren Wilton

so you don't have points from body rules.

your mentioned URI_DEOBFU_INSTR is a meta rule:

meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST

so maybe it's not considered.


They are treated as header, or ignored if marked as net.


I think a bug report should be submitted for this.

Either they should be treated split 50/50 as header and body score, or when 
the metas are built they should have a "body rule" flag, and that used to 
determine where the score goes.


I tried, but for some reason apache decided that I'm evil and blocked the 
submission attempt, so someone else can do it.


   Loren



Re: How do I search and capture text for use in a rule?

2021-05-08 Thread Loren Wilton
I think the OP was trying to find a way to match "To: " to 
"Hi user".


   Loren



Re: ExtractText and docx

2021-05-06 Thread Loren Wilton

I'm trying to use the latest ExtractText plugin, but the docx2txt
program the plugin references is no longer available from
http://docx2txt.sourceforge.net


The latest version appears to be 1.4 from several years ago.
I just tried downloading the 1.4 version and the CVS version, and in both 
cases was rewarded with an archive file.


   Loren 



Re: My 10 years old domain have a bad TLD

2021-05-03 Thread Loren Wilton

.pro have a -1 with SUSP_URI_NTLD_PRO.


Is that really minus 1? Negative scores are good, they counteract spammy 
scores, which are positive.


   Loren



Re: SA seems powerless against marketing emails for SEO/web development

2021-04-22 Thread Loren Wilton
I could add another point between BAYES_999 and BAYES_99 scores but that 
seems reactionary. Is there a better way? Should I thrown in another point 
for certain keywords in marketing emails like these?


For this specific message I might be inclined to add a rule to check for a 
URL in the subject and add a point for that. I can't think of very many 
legit mails I've ever received with a URL in the subject. A point or two for 
that should be safe enough if it isn't spam, but could trip it over the edge 
if it is.


header   URI_IN_SUBJECT  Subject =~ /\b[-\w\._]+\@(?:[-\w_]+\.)+(?:com|org|biz|cloud)\b/

score    URI_IN_SUBJECT  1.5
describe URI_IN_SUBJECT  A URI in the subject of the message

Something like that, maybe.

   Loren



Re: Spoofed amazon order email

2021-04-16 Thread Loren Wilton
While I haven't received a forged Amazon order email in this exact form, 
there is all kinds of stuff here that could be caught with appropriate 
rules.


   "In-case you require any
   change in order or like to cancel we recommend giving us call
   immediately at "

"In-case" is unlikely in mail, there should be no dash there.
"giving us call" is missing "a" and is bad grammer, but typical of 
non-English speaking spam.

"In case you require any change in order" is also poor phrasing.
The whole "call us immediately to change your order" concept rates 3 points 
on my mail system.

No phrase of any similar sort appears in a real Amazon order confirmation.
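
As a rough sketch, the two phrasing tells above could be turned into body rules like these. The rule names and the 1.0 score are illustrative only, not what I actually run, and the patterns are taken straight from the pasted message, so they may need loosening to catch variants.

```
# Phrases quoted from the phish; \s+ tolerates line wrapping in the body.
body     __LW_IN_DASH_CASE    /\bIn-case\s+you\s+require\b/i
body     __LW_GIVING_US_CALL  /\bgiving\s+us\s+call\b/i

meta     LW_ORDER_CALL_US     (__LW_IN_DASH_CASE || __LW_GIVING_US_CALL)
describe LW_ORDER_CALL_US     Awkward call-us-to-cancel order phrasing
score    LW_ORDER_CALL_US     1.0
```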


An actual Amazon order has a subject of the form

   Subject: Your Amazon.com order #114-2489974-7888243

The Subject here is

   Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed

The order number is in the wrong format.
The order number is in the wrong place in the subject text
The subject text is in the wrong format.


An actual Amazon order confirmation has the headers, in this order:

   From: "Amazon.com" 
   Reply-To: no-re...@amazon.com
   To: 
   Message-ID: <010001774af541dc-d38f4184-621e-4014-a295-c520285ae319-00 
0...@email.amazonses.com>

   Subject: Your Amazon.com order #114-2489974-7888242

This mail has

From: "or...@amazon.com" 
   X-Google-Original-From: "or...@amazon.com" 
   Content-Type: multipart/alternative;
   boundary="===2707982310301423984=="
   MIME-Version: 1.0
   Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed
   To: s...@dondley.com

The header order is completely different.
There is no Reply-To header
The From address is completely wrong.
There should be no X-Google-* headers.


There should also be a header:

   X-AMAZON-MAIL-RELAY-TYPE: notification

A real Amazon order receipt has Content-Type = multipart/alternative, but it 
only contains a text/plain part encoded in QP, with no HTML part. This 
message has an HTML part and should be getting MPART_ALT_DIFF.
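
The header observations could be sketched as a small meta like the one below. This assumes every genuine Amazon order mail carries X-AMAZON-MAIL-RELAY-TYPE, which I have only checked against my own receipts, and other kinds of Amazon mail may well lack it. The rule names and score are made up for the example.

```
# Mail that claims Amazon in From ...
header   __LW_HDR_FROM_AMZ    From =~ /\bamazon\b/i
# ... but lacks the relay-type header seen on real receipts,
header   __LW_HAS_AMZ_RELAY   exists:X-AMAZON-MAIL-RELAY-TYPE
# ... or carries a Google-added original-from header.
header   __LW_HAS_XGOOG_FROM  exists:X-Google-Original-From

meta     LW_AMZ_HDR_MISMATCH  __LW_HDR_FROM_AMZ && (!__LW_HAS_AMZ_RELAY || __LW_HAS_XGOOG_FROM)
describe LW_AMZ_HDR_MISMATCH  Claims Amazon but headers do not match
score    LW_AMZ_HDR_MISMATCH  1.0
```

I would keep the score low until it has run against a fair amount of legit Amazon mail.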




   "This email was sent from a
   customer service address kindly write us back if you have any concern. "

This is bad grammar and a very unlikely form of robot sending account 
notice. A real Amazon order contains


   "This email was sent from a notification-only address that cannot accept 
inc=

   oming email. Please do not reply to this message."

This is a very standard phrasing for this sort of notice.


A real Amazon order confirmation does not contain an "unsubscribe" link. 
This phish does.



There is a lot of other stuff that could be caught by various rules, but a 
trivial set would be something like


#---
# 04/16/2021
# A bunch of rules to try to catch fake Amazon order confirmations, based on a
# message pasted to the SA Users list.

header __LW_SUB_AMZ_ORDER Subject =~ /^Your Amazon\.com order \#\d{3}-\d{7}-\d{7}\s*$/
header __LW_FROM_AMZ_ORDER From =~ /\"Amazon\.com\"\s+/

header __LW_REP_AMZ_ORDER Reply-To =~ /^no-reply\@amazon\.com\s*$/
body __LW_BODY_AMZ_ORDER /Amazon.com Order Confirmation/

meta LW_REAL_AMZ_ORDER __LW_SUB_AMZ_ORDER && __LW_FROM_AMZ_ORDER && __LW_REP_AMZ_ORDER && __LW_BODY_AMZ_ORDER

score LW_REAL_AMZ_ORDER -2
describe LW_REAL_AMZ_ORDER Amazon order confirmation

header __LW_FROM_AMZ From =~ /\bamazon\b/i
header __LW_SUB_ORDER Subject =~ /\border\b/i

meta LW_FAKE_AMZ_ORDER __LW_FROM_AMZ && __LW_SUB_ORDER && !LW_REAL_AMZ_ORDER
score LW_FAKE_AMZ_ORDER 7
describe LW_FAKE_AMZ_ORDER Amazon order phish

You might also like

body LW_PAYMENT /You\s+sent\s+a\s+Payment\s+of/i
score LW_PAYMENT 0.5
describe LW_PAYMENT You sent someone a payment

body LW_ORDER /\b(?:order|purchase)\s+(?:number|ID|date|description)\b/i
score LW_ORDER 0.5
describe LW_ORDER Contains order information

meta LW_FREEMAIL_ORDER FREEMAIL_FROM && (LW_ORDER || LW_PAYMENT)
score LW_FREEMAIL_ORDER 4
describe LW_FREEMAIL_ORDER An order receipt from a free email address



Re: Re: LANSET, do they create anything but SPAM?

2021-04-13 Thread Loren Wilton

Examples: https://pastebin.com/pF6Nmquc


Well, I can see a couple of simple rules that would catch these two, but I 
don't know if they would also trip on legit mail.


   List-Unsubscribe: m'http://180e977\.olink1\.xyz'
   X-Mailer-SID: m'\b180e977_18\b' 
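
Written out as actual rules, the two checks above might look like this. The patterns are exactly the ones shown; the rule names and the 1.0 score are placeholders, and as said I have no idea how these behave on legit mail.

```
header   __LW_OLINK_UNSUB  List-Unsubscribe =~ m'http://180e977\.olink1\.xyz'
header   __LW_OLINK_SID    X-Mailer-SID =~ m'\b180e977_18\b'

meta     LW_OLINK_SPAM     (__LW_OLINK_UNSUB || __LW_OLINK_SID)
describe LW_OLINK_SPAM     Headers matching the pastebin examples
score    LW_OLINK_SPAM     1.0
```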



Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Loren Wilton

 3.5 BAYES_99   BODY: Bayes spam probability is 99 to 100%
[score: 1.]
 0.5 BAYES_999  BODY: Bayes spam probability is 99.9 to 100%
[score: 1.]


I have 


5.0 BAYES_99   BODY: Bayes spam probability is 99 to 100%
   [score: 1.]
0.2 BAYES_999  BODY: Bayes spam probability is 99.9 to 100%
   [score: 1.]

I suggest raising BAYES_99 to at least 5.
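
In local.cf that is just two score lines. The values below are the ones from my own report above; pick whatever fits your threshold.

```
# Both rules fire together on the reports shown above, so these add up to 5.2.
score BAYES_99   5.0
score BAYES_999  0.2
```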

   Loren



Re: gmail hotmail picture and a lot of spam-rubish

2021-04-09 Thread Loren Wilton
We would need to see the original headers from the spam, or ideally the whole 
spam before we could say anything. It would also be helpful to see the rules it 
hit on your system.

Loren

Re: Are X-MC-xxx headers legit?

2021-03-29 Thread Loren Wilton
I would not be so broad with that. I have 49 messages in my personal 
archives with X-MC-User headers, none of which I have classified as spam.


Bill, do you see multiple X-MC- headers in the mails that come thru 
MailChimp?

As in, "multiple many" or "multiple 2 or 3"? Or just the Users header?

I can't tell from the MailChimp documentation whether the headers will be 
generally filtered from the final email message, or passed through. The 
majority of them are instructions to MailChimp to do something in either the 
headers or body of the message, so really it makes little sense to leave 
them in the final message.


I can write rules to detect bogus values for quite a few of the headers, but 
the allowed text for a lot of the headers is moderately complicated, so gets 
to be a big and expensive regex. It would be a lot easier to just add points 
if there are say 3 or more X-MC headers in a row. But of course that is no 
good if MC does just pass all the direction headers through to the final 
messages it generates.
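
A sketch of the "3 or more in a row" idea, assuming the ALL pseudo-header presents the headers one per line in their original order. The rule name and score are invented for the example, and a folded header (one that continues onto a second line) will break a run, so this can undercount.

```
# Three consecutive header lines whose names start with X-MC-.
header   LW_MANY_X_MC  ALL =~ /(?:^X-MC-[-\w]+:[^\n]*\n){3}/mi
describe LW_MANY_X_MC  Three or more consecutive X-MC-* headers
score    LW_MANY_X_MC  1.0
```

This is much cheaper than validating each header's value, but it is only useful if legit MailChimp mail does not also carry runs of these headers.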


Thanks,

   Loren



Re: Are X-MC-xxx headers legit?

2021-03-29 Thread Loren Wilton
Ah, OK. Looking at the MailChimp page, it appears that these headers appear on 
a message being sent to MC, and then it extracts them, most likely removes them 
from the final generated email, and uses them as processing instructions on how 
to generate the email or sequence of emails. In any case it seems rather 
unlikely that they should ever appear in a received email message.

And whether they should or not, the values given for about 90% of the headers 
are simply invalid according to the MC page describing them. The headers that 
are valid are direct copies of the examples given on the MC page, and would not 
likely work for any real email campaign.

I'd say that presence of X-MC-xxx headers in a received message is a 100% 
guarantee of a targeted advertising message, and a 99% guarantee that the 
message is a spam. If the values given for the options in the headers are 
obviously invalid, that rises to a 100% chance that the message is spam.

I'd call these headers a great spam sign.

Loren



Are X-MC-xxx headers legit?

2021-03-28 Thread Loren Wilton
I've started seeing a number of spams with the following block of X headers 
in it. I've never seen these before. While these look really fake to me 
(from the content of most of them), does any real tool or site make headers 
like this, or are they just from some spam tool and I can use them as a 
guarantee of spam?


x-mcpf-jobid: mc.us6_13712451.1216993.5a2d921a72084.full_03
X-MC-User: 6b669534c4be2b401d8744486
X-MC-Tags:45829
X-MC-Track:456
X-MC-Autotext:global
X-MC-AutoHtml: format
X-MC-Template: smiley
X-MC-MergeVars: {"_rcpt": "emailadr...@domain.com", "fname": "John", 
"lname":"Smith"}

X-MC-GM-Analytics: normal
X-MC-GoogleAnalyticsCampaign: good
X-MC-Metadata: { "user_id": "45829", "location_id": "111" }
X-MC-URLStripQS:   link to 
X-MC-PreserveRecipients: 123
X-MC-InlineCSS:  used
X-MC-Subaccount: sendgrid
X-MC-ViewContentLink: {uurl}
X-MC-BccAddress: {ourl}
X-MC-Important: notes
X-MC-IpPool: 99%
X-MC-ReturnPathDomain: server1.tech
X-MC-SendAt: "AsianBeauties Team"
X-MC-MergeLanguage: {all language}
X-MC-MergeVars: {"var1": "global value 1"}
X-MC-MergeVars: {"_rcpt": "emailadr...@domain.com", "fname": "John", 
"lname":"Smith"}

X-MC-GoogleAnalytics: www.domain.com, domain.
X-MC-Metadata: { "user_id": "45829", "location_id": "111" }
X-MC-Metadata: { "group_id": "users_active" }
X-MC-Metadata: { "_rcpt": "f...@example.com", "user_id": "123" }
X-MC-Metadata: { "_rcpt": "b...@example.com", "user_id": "456" }
x-mcda: TRUE




Re: ReturnPath rule renaming

2021-03-26 Thread Loren Wilton
In order to bring the SenderScore/ReturnPath DNS reputation and blocklist 
rules up-to-date with their current ownership and administration, the 
rules are being renamed:


  RCVD_IN_RP_CERTIFIED -> RCVD_IN_VALIDITY_CERTIFIED
  RCVD_IN_RP_SAFE -> RCVD_IN_VALIDITY_SAFE
  RCVD_IN_RP_RNBL -> RCVD_IN_VALIDITY_RPBL


John, you might add this text to the comment you made on Bug 6247. I read 
through your comment there, then went and scanned the entire comment stream 
in the bug (most all from 2009) to try to figure out what was being changed, 
and finally came up empty. There was no description of what the ownership 
change was, nor the administration change, nor any mention of what exactly 
had been changed in the rules.


   Loren



No rule for fake payPal messages?

2021-03-19 Thread Loren Wilton

I just got this little wonder, and was surprised that it got thru as ham.

   From: "PayPal Billing" 

I've fixed that locally, but I'd think SA ought to have a rule for "PayPal" 
that doesn't come from paypal.
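
Something along these lines is roughly what I mean, as a sketch rather than a tested rule. It assumes the :name and :addr header modifiers are available in your SA version, the rule names and score are placeholders, and it treats only paypal.com as genuine, so country-specific PayPal domains would need to be added.

```
# Display name mentions PayPal ...
header   __LW_FROM_NAME_PAYPAL  From:name =~ /\bpaypal\b/i
# ... but the address is not at paypal.com or a subdomain of it.
header   __LW_FROM_ADDR_PAYPAL  From:addr =~ /(?:\@|\.)paypal\.com$/i

meta     LW_FAKE_PAYPAL_FROM    __LW_FROM_NAME_PAYPAL && !__LW_FROM_ADDR_PAYPAL
describe LW_FAKE_PAYPAL_FROM    From name says PayPal, address does not
score    LW_FAKE_PAYPAL_FROM    2.0
```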





Has anyone heard of these people?

2021-03-17 Thread Loren Wilton

I just got what appears to be a legit email from my ISP.

It has a tracker tag pointing to 102.122.207.net.

Note that is a site name and not a dotquad.

Somehow this doesn't make me real comfortable with the possible veracity of 
the email.


Has anyone come across 102.122.2O7.net before?

   Loren



Re: Points for improbable Received header date?

2021-02-12 Thread Loren Wilton

why is date important ?, spamassassin do test it already

DATE_IN_PAST *


Well, the date is a spam sign. That is good enough for me to be important.
And the DATE_IN_PAST * rules don't hit these spams. 


   Loren



Re: Points for improbable Received header date?

2021-02-11 Thread Loren Wilton
and if you want to become a hero, patches to document those evals are 
always welcome ;-)


Well, if I use undocumented code I have to figure out, I always do my own 
documentation, since my memory these days is about five minutes long. The 
trick for me will be figuring out how I could submit those changes as a 
patch, since I'm an old mainframe and now Windows guy, and not used to 
creating Unix patch files. I guess there must be a tool somewhere to diff 
two file versions and create a proper format file.


   Loren



Points for improbable Received header date?

2021-02-11 Thread Loren Wilton
I'm getting a lot of spams that all have a series of completely bogus 
Received headers in them. A characteristic of these headers is a rather 
improbable datestamp, considering today's date:


Received: from 69-171-232-143.mail-mail.facebook.com ([69.171.232.143])
by oxsus1nmtai03p.internal.vadesecure.com with ngmta
id 0574d1a8-1628c15907fbaba1; Thu, 06 Aug 2020 18:30:56 +

Note that this message must have been in flight for about half a year 
according to that header.


Anyone know an easy way to check for a Received header date more than say a 
week old and add some points?


   Loren



ViraLife

2021-01-27 Thread Loren Wilton
Has anyone been getting spams from "ViraLife"? They have slowly started, one 
by one, hitting all of my email inboxes. It shows up about once a week as a 
"newsletter". It claims to be from a legit email hosting company I know 
nothing about, and I certainly have never signed up for this spam.


Here are some headers for what I just got:

Received: from ver-vp1.custonews.net ([192.250.230.87])
by oxsus1nmtai04p.internal.vadesecure.com with ngmta
id 3cd7cc16-165e208a6ec4ede9; Wed, 27 Jan 2021 15:31:36 +
Received: from localhost ([127.0.0.1]) by ver-vp1.custonews.net with SMTP; 
Wed, 27 Jan 2021 10:31:36 -0500 (EST)

Date: Wed, 27 Jan 2021 10:31:36 -0500 (EST)
Subject: Some articles from this week that might interest you..
From: "ViraLife" 
Message-ID: 





Re: More undetected hidden test spam signs

2020-12-22 Thread Loren Wilton
Right, but __STY_INVIS is currently tag-blind (it only looks for the 
style="" clause), so it hits that, and if lots of ham is hiding tracking 
images that way that might explain the poor S/O.


I suspect that might be the case.

The vast majority of invisible garbage I see is hidden in a  ... 
 pair, typically two per spam and about 50K in each one. Looking at 
the definition of the 

Re: More undetected hidden test spam signs


On 16 Dec 2020, at 23:21, Loren Wilton  wrote:

I just got a batch of spams containing




Such rules are there. Unfortunately, for whatever reason, lots of ham 
uses "invisible" text so it's not useful as a spam sign by itself and 
it's hard to come up with any useful combination rules.


I think I may have figured it out - tracking images. Like:

style="visibility: hidden !important; display:none !important; max-height: 
0; width: 0; line-height: 0; mso-hide: all;">


Note in your example the display:none is in a contained tag and not in an 
opening tag of a span. The tag is probably fairly long because the URL is 
probably huge, but it is still the one item that is hidden.


I put in a local rawbody rule for
   m'.{100,}(?:$|)'is
and so far I haven't gotten any hits on ham.

Of course that is a pretty heavy rule, but it would seem to indicate that 
hidden spans may not be that common in ham.




More undetected hidden test spam signs


I just got a batch of spams containing



That was followed by about 2K bytes of garbage containing GUIDs and links to 
putatively some youtube video. The span was then terminated correctly, the 
body of the spam, and then the same garbage for about another 2KB.


The small font rules didn't seem to catch this.

   Loren



Re: Possible spam sign


That probably should have hit at least one scored base rule:

  https://ruleqa.spamassassin.org/?rule=%2FFROM_2_


Nope. I think my rules are up to date, but maybe not.


Feel free to pastebin it and I'll take a look.


https://drive.google.com/file/d/1WQ0Mm1iUsKhTj51mFJwwehuTatSm8Nux/view?usp=sharing



Re: Possible spam sign


That probably should have hit at least one scored base rule:

  https://ruleqa.spamassassin.org/?rule=%2FFROM_2_


Nope. I think my rules are up to date, but maybe not.



Possible spam sign


I just received a spam with this interesting From address:

From: "VA Rate Guide" 



I wonder if it is worth checking for mail from more than one sender at once?
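
As a rough illustration of that check, the sketch below counts the mailbox addresses in a From header value using Python's standard email.utils.getaddresses. The function name and the example addresses in the usage note are hypothetical; the addresses from the actual spam are not reproduced here.

```python
from email.utils import getaddresses


def count_from_addresses(from_header):
    """Return how many mailbox addresses appear in a From header value."""
    pairs = getaddresses([from_header])
    # Ignore empty results, which the parser returns for unparseable input.
    return sum(1 for _name, addr in pairs if "@" in addr)
```

A value such as '"Some Sender" <a@example.com>, <b@example.net>' would count as two senders, and a rule could add points whenever the count exceeds one.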

   Loren



Are these valid email headers?

I don't have a Facebook account and don't know anyone on Facebook that would 
send me mail (and don't want to!), so I have absolutely no idea if these 
headers from recent spams are completely made up out of the air (and thus 
spam signs) or are valid headers.


Can anyone tell me if this stuff is valid or obviously fake?

X-Facebook: from 2401:db00:1050:208b:face:0:4f:0 ([MTI3LjAuMC4x])
by www.facebook.com with HTTPS (ZuckMail);
X-Priority: 3
X-Mailer: ZuckMail [version 1.00]
X-Facebook-Notify: skipped_password_change; 
mailid=5ac39662d1c08G5af32c89e396G5ac39afc31edaG569

Feedback-ID: 509:skipped_password_change:Facebook
X-FACEBOOK-PRIORITY: 0
X-Auto-Response-Suppress: All
Require-Recipient-Valid-Since: gouldi...@earthlink.net; Sunday, 29 Nov 2009 
00:17:08 +
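
One thing that can be checked mechanically: the bracketed token in the X-Facebook line looks like base64. A minimal Python sketch (the function name is hypothetical) decodes it; the token shown above comes out as 127.0.0.1, the loopback address. That by itself does not settle whether the header is genuine.

```python
import base64
import binascii


def decode_bracket_token(token):
    """Return the token base64-decoded as ASCII text, or None if it does not decode cleanly."""
    try:
        return base64.b64decode(token, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None
```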



Thanks,

   Loren



Re: Apache SpamAssassin and Spammers 1st Amendment Rights

Keep in mind that freedom of speech says that you can stand in the park on a 
soapbox and shout.

It does NOT say that passers-by are forced to stand there and listen to you 
until you run out of voice. They can walk away any time they want to.

It also does not say that the local newspaper is required to take down 
everything you say and print it on the front page for all their readers. (But 
freedom of the press says that they can, and you can't sue them for copyright 
infringement or much of anything else for doing so, even though people and 
organizations now try that.)

Whether any organization has an obligation to convey the voice in the park to 
all of its customers, or even a selection of them, is an interesting question. 
By and large, organizations in the US have the rights of individuals, and that 
includes the right to walk away and stop listening. The major possible 
exception would be a "common carrier", which cannot disconnect a call because 
they don't like what is being said. 

But ISPs are NOT common carriers. So even if the spammer is addressing spam to 
specific individuals, the ISPs, not being common carriers, do not have an 
obligation to deliver the spam.



Re: Problem with matching regex against long body

You may also want to stick optional whitespace in there to avoid trivial 
bypass:
There's also the possibility of adding a typeface or other options to the 
 tag, which would bypass your simple rule. And HTML is not 
case-sensitive. And avoid * on complex stuff when matching arbitrarily 
long texts, which can lead to runaway backtracking and scan timeouts.


Thanks. This spammer is prolific, but seems to be very stupid and pattern 
based, hardly ever varying what he puts in some parts of the message. I've 
been seeing this pattern without change for about 3 months now. I almost 
never have to tweak a rule for his stuff to account for a possible 
variation.


It would be interesting (at least to me) to run a set of test rules against 
the SA corpus to try to determine the optimal cutoff point for a good S/O 
as regards length of 0-point text. I personally have absolutely no idea what 
a "reasonable" size is for 0-point text in an email. Personally I'd be 
inclined to say that any 0-point text isn't reasonable, but mass marketers 
seem to believe otherwise.
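
A toy sketch of that experiment in Python, assuming each message has already been reduced to the length of its longest run of 0-point text. The function name is hypothetical, and S/O is simplified here to spam hits divided by all hits on raw counts, which is not necessarily how the SA tools weight it.

```python
def spam_overall_ratio(spam_lengths, ham_lengths, threshold):
    """S/O for a rule that fires when hidden text is at least `threshold` chars.

    Returns None when the rule would hit nothing at all.
    """
    spam_hits = sum(1 for n in spam_lengths if n >= threshold)
    ham_hits = sum(1 for n in ham_lengths if n >= threshold)
    total = spam_hits + ham_hits
    if total == 0:
        return None
    return spam_hits / total
```

Sweeping the threshold over a range of sizes and picking the smallest value where the ratio stays near 1.0 would give a first guess at a cutoff.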




Re: Problem with matching regex against long body


See rawbody_part_scan in the docs.


Also the chunking of the rawbody into  2-4 kB blocks, may make a
difference.


I wasn't able to find rawbody_part_scan in any of the docs that I managed to 
find, but after digging into the source I found the chunking logic and dug 
out the 2K limit. I'm not sure why I was hitting a limit at just under 1K; I 
can only guess that the header was included in the first rawbody chunk, 
which seems a little unlikely.
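
To illustrate why chunking matters, here is a small Python sketch. It slices the text into fixed-size blocks, which is a simplification and not SA's actual splitting logic; the helper names and the 2048 block size are assumptions for illustration. A pattern that needs a run longer than one block can match the whole body yet never match any single block.

```python
import re


def chunked(text, size):
    """Split text into consecutive blocks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def matches_any_chunk(pattern, text, size):
    """True when the pattern matches inside at least one block."""
    rx = re.compile(pattern, re.S)
    return any(rx.search(chunk) for chunk in chunked(text, size))
```

With a 5000-character run and 2048-character blocks, a pattern like [^<]{3000,} finds the run in the full text but in none of the blocks, while [^<]{1000,} still hits.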


I was able to get the rule to work using a full rule, but I sure hated to do 
that, since I lose the base64 decoding of the body, and full rules are ugly 
and potentially dangerously inefficient. But at least it worked. Fortunately 
these spams are plain text encoded.




Re: Problem with matching regex against long body


basics of escaping at least *anything* won't do any harm

php > echo preg_quote('[^<]*<');
\\[\^\<\]\*\<


Well, escaping the [^<]* part certainly will do harm, since it will turn it 
from a group match into individual characters that don't exist in the text 
to be matched.
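
The same effect is easy to demonstrate in Python, whose re.escape plays the role of PHP's preg_quote. This is only a sketch with a made-up sample string and a hypothetical helper name: the unescaped pattern matches a run of non-"<" characters followed by "<", while the escaped one only matches the literal characters of the pattern itself.

```python
import re

PATTERN = "[^<]*<"
ESCAPED = re.escape(PATTERN)   # every metacharacter becomes a literal


def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None
```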


But I've tried escaping the standalone characters like <, =, :, etc, and 
that doesn't help. I have many regex patterns without these escaped, so I'm 
pretty sure they work as expected normally, so should here too.




Problem with matching regex against long body

I'm getting lots of spams that are about 100+K long. The spam body contains 
two blocks of random news text copied from fox news or msnbc or the like, 
enclosed in a zero-point font block. I'm trying to match this simple pattern 
to give some extra points, but I can't seem to get it to work. I'm wondering 
if there is some buffer limit in SA that is preventing the match from 
working.


If I try

   rawbody LONG_HIDDEN m'[^<]*<'s

I don't get a match, even though I know there is a  about 50K into 
the message.


But if I try

   rawbody LONG_HIDDEN m'[^<]*'s

I do get a match. Note all I've done is remove the final "<" from the match 
text.


If I try

   rawbody LONG_HIDDEN m'[^<]{990,}'s

I get a match.

but if I try

   rawbody LONG_HIDDEN m'[^<]{997,}'s

I don't  get a match, but I know there is over 100K of text after that font 
tag.


Can anyone see something I'm doing wrong, or know of some limitation in SA 
that will prevent these long matches from working?


Thanks,

   Loren



Re: The most efficient SPAM implementation ever

> Can you please tell me how to generate that report?

I believe he is asking for the results of something like

spamassassin -t 

Re: blacklisting the likes of sendgrid, mailgun, mailchimp etc.


https://krebsonsecurity.com/2020/08/sendgrid-under-siege-from-hacked-accounts/
also sheds light on the issue too.


. SendGrid knows (or should know) that it has compromised accounts. 
It could find out what some of them are for free by downloading Rob's list 
of 25 or so compromised accounts. It could find out what some of the other 
400 are for $15 each, and could find out what some of the major offenders 
are for $400 each. Let's see, 400 compromised accounts times $400 is $160,000. 
SendGrid or Twilio can't afford a $160,000 cash outlay to find the 
account names of the major compromised accounts? Their head of security 
probably gets that much a month in salary and bonuses. It would be a trivial 
expense.


So what could they do once they knew which accounts are compromised?
Are they helpless, and can only wring their hands and issue press releases 
saying They Have A Plan?


No. They can SHUT THE DAMN ACCOUNTS DOWN. Issue refunds to the owners if 
they feel generous. Tell the owners to open new accounts with 2FA.


But they won't do this, because they get their money from sending spam.

   Loren



Re: Amazon, dhl, fedex, etc. phishing

> We are regularly getting phishes from dhl, fedex, usps, amazon, netflix,
> spotify that fakes the from (eg. amazon  wants
> to send me a amadon-legit.pdf). Usually these are previously unknown to
> pyzor, dcc, rbls, and domain reputation doesn't really exist[0].
> 
> I'm wondering if anyone has made a rule that looks to see if the From
> contains amazon, but it is not amazon.com/.ca/.jp (all their TLDs), then
> score them up, if it wants to also drop a psd, or a tar.xz, or a png, or
> a pdf or whatever, then light them on fire.

I have rules similar to that to catch other things. I just made one for you to 
catch a spam that claims to be from USPS but is not. Simple modifications will 
catch other putative senders.

#---
# 08/24/2020

# Someone on the SA mailing list is upset about spams that claim to be from some
# reputable company, usually a package transfer company, but actually aren't.
# I have an example in today's spam, though it is caught by lots of other rules:
#
# From: USPS 

header    NOT_FROM_USPS From =~ /\bUSPS\b[^<]*<[\w\-.]+\@[\w\-.]*\b(?<!usps\.com)\s{0,3}>/
score     NOT_FROM_USPS 1
describe  NOT_FROM_USPS Claims to be from USPS, but isn't
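
A rule like this is easy to sanity-check offline before loading it into SA. The sketch below is a Python rendering of the idea, with the domain check written as a negative lookbehind; the function name and the sample addresses in the usage note are hypothetical, and Python's regex engine is close to but not identical to Perl's.

```python
import re

# Display name contains USPS, but the address in angle brackets does not
# end in usps.com (the lookbehind is an assumption about the rule's intent).
NOT_FROM_USPS = re.compile(
    r"\bUSPS\b[^<]*<[\w\-.]+@[\w\-.]*\b(?<!usps\.com)\s{0,3}>"
)


def claims_usps_but_is_not(from_header):
    """True when the From value names USPS but uses a non-usps.com address."""
    return NOT_FROM_USPS.search(from_header) is not None
```

Feeding it a value such as 'USPS <notice@parcel-update.example>' should hit, while 'USPS <tracking@usps.com>' should not.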
 


I'm also including two general rules that catch this sort of stuff most of the 
time.

#---
# 01/21/08

# Return-Path: 
# Message-Id: <20080121072522.16582.qmail@comp2>
# From: 
# 
# The from and the return-path should match
# The from host and the message-id host should match

header    __FROM_SENDER   ALL =~ m'Return-Path:\s+<([^\n>]+)>.*\nFrom:(?:[^<\n]+<\1>|\s+\1$)'si
header    __NULL_SENDER   Return-Path =~ /<>/
meta      NOT_FROM_SENDER !__FROM_SENDER && !__NULL_SENDER
score     NOT_FROM_SENDER 1
describe  NOT_FROM_SENDER Not from putative sender

# Return-Path: 
# Message-ID: <7a9a01c85ca2$0fcbc910$c0a80102@Ricky>

header     __SENDER_MSGID   ALL =~ m'Return-Path:[^\@\n]+\@([^>.]+).*\nMessage-Id:[^\@\n]+\@[\w.]{0,30}\1'si
meta       NOT_SENDER_MSGID !__SENDER_MSGID && !__NULL_SENDER
score      NOT_SENDER_MSGID 0.5
describe   NOT_SENDER_MSGID Sender host doesn't match message-id host
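
For testing the logic of these two general rules against saved messages, here is a rough Python approximation using the standard email package. The function names are hypothetical, and the comparison is simplified: the first check compares whole addresses, and the second looks for the first label of the Return-Path domain anywhere in the Message-ID host.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_mismatch(msg):
    """True when a non-null Return-Path address differs from the From address."""
    return_path = parseaddr(msg.get("Return-Path", ""))[1].lower()
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    if not return_path:
        # Null sender <> (bounces): skip, as __NULL_SENDER does.
        return False
    return return_path != from_addr


def msgid_host_mismatch(msg):
    """True when the Return-Path host label is absent from the Message-ID host."""
    return_path = parseaddr(msg.get("Return-Path", ""))[1].lower()
    if "@" not in return_path:
        return False
    first_label = return_path.rsplit("@", 1)[1].split(".")[0]
    msgid = msg.get("Message-ID", "").strip().strip("<>").lower()
    msgid_host = msgid.rpartition("@")[2]
    return first_label not in msgid_host
```

Parse a saved message with message_from_string (or message_from_file) and pass the result to either function.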
 

Re: Zero-point garbage text that isn't caught by the small-font rules

I've seen mail containing ONLY the text mentioned above, in which case it's
strange.  From the original mail I got feeling that the mails also contain
mentioned text only...


The original mails I clipped the original obfuscation text from were using 
it to hide a phishing attempt. I have not seen it used with no other content 
in my mail stream.  However, from time to time I see a mal-formed spam that 
lacks content and just has the formatting. Perhaps that is what you are 
seeing.


   Loren



Re: SendGrid (Was: Re: Freshdesk (again))

money should not make the emails go around, likewise no president should 
be elected by money


Well, no judge nor congressman should be elected by money either. But we 
changed the rules some decades back and legalized bribery, specifically in 
the payment of money to elect your favorite candidate. So, as functionally 
implemented in the current US Government, all elected officials *should* be 
elected by money, because that is the law.


But that is very off topic, hopefully.

   Loren


