Re: Spam from compromised accounts scoring just under block threshold

2018-03-30 Thread Amir Caspi
Following up on this... I've been consistently seeing a lot of spam like this, 
with multi-dot usernames.  Sometimes with "person.from.spam" but more often 
just a punctuated phrase like "some.spammy.item.sold" or whatever.  Most often 
only two dots (three words), sometimes four or more.
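
For anyone wanting to try this against a corpus before writing a rule, here is a rough Python sketch of the two patterns. DOT_FROM is the regex from the quoted message below; MULTI_DOT and the classify() helper are hypothetical names of my own, and the "two or more dots" cutoff is only an assumption taken from the description above, not a tested rule.

```python
import re

# Regex taken from the quoted message: ".from." literally in the username part.
DOT_FROM = re.compile(r"[^@]*\.from\.[^@]*@")

# Hypothetical generalization: a local part with at least two dots
# (three or more dot-separated words) before the @ sign.
MULTI_DOT = re.compile(r"^[^@.]+(?:\.[^@.]+){2,}@")


def classify(addr: str) -> dict:
    """Report which of the two patterns an address matches."""
    return {
        "dot_from": bool(DOT_FROM.search(addr)),
        "multi_dot": bool(MULTI_DOT.search(addr)),
    }


if __name__ == "__main__":
    for addr in (
        "bob.from.somespamcompany@example.com",
        "some.spammy.item.sold@example.com",
        "john.smith@example.com",
    ):
        print(addr, classify(addr))
```

An ordinary first.last address has only one dot, so it matches neither pattern; whether multi-dot usernames really are more common in spam than in ham would still need checking against real mail.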

Has anyone been testing this as a meta rule?

Cheers.

--- Amir

> On Mar 6, 2018, at 9:37 AM, John Hardin wrote:
> 
> On Mon, 5 Mar 2018, Amir Caspi wrote:
> 
>> On Mar 5, 2018, at 11:13 PM, John Hardin wrote:
>>> 
>>> *before* the @ sign.
>>> 
>>> It may be perfectly valid to do that, but if it happens more often in spam 
>>> than in legitimate mail it is useful to us.
>> 
>> I’m seeing a lot of spam lately with usernames like 
>> “bob.from.somespamcompany”. Could definitely be at least a meta rule.
> 
> ...or potentially from:addr =~ /[^@]*\.from\.[^@]*@/ if ".from." is 
> literally in the username part.
> 
> -- 
> John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
> jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> ---
>  Failure to plan ahead on someone else's part does not constitute
>  an emergency on my part. -- David W. Barts in a.s.r
> ---
> 5 days until Daylight Saving Time begins in U.S. - Spring Forward



Bayes and hyphens

2018-03-30 Thread Amir Caspi
Hi all,

Does Bayes tokenize on word boundaries and hence would ignore hyphens?  Or does 
it include them?  I've seen a lot of spam lately inserting random hyphens 
between key spammy words (like "economic-crisis"), presumably in an attempt to 
bypass word filters and/or Bayes.  So would word1-word2 get tokenized as a 
single item or as two words?
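
To make the two possibilities concrete, here is a small Python sketch. It is not SpamAssassin's actual Bayes tokenizer, and both function names are hypothetical; it only illustrates the difference between keeping a hyphenated string as one token and splitting it at the hyphen.

```python
import re


def tokens_keep_hyphens(text: str) -> list:
    """Split on whitespace only, so 'word1-word2' stays a single token."""
    return text.lower().split()


def tokens_split_hyphens(text: str) -> list:
    """Split on whitespace and hyphens, so 'word1-word2' becomes two tokens."""
    return [t for t in re.split(r"[\s-]+", text.lower()) if t]


if __name__ == "__main__":
    sample = "economic-crisis looms"
    print(tokens_keep_hyphens(sample))   # ['economic-crisis', 'looms']
    print(tokens_split_hyphens(sample))  # ['economic', 'crisis', 'looms']
```

With the first scheme a hyphenated pair is a new token that Bayes has to learn separately; with the second, the existing tokens for each word would still be hit.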

If hyphens are currently included, then perhaps Bayes should be updated to 
ignore hyphens and/or tokenize at word boundaries?

Cheers.

--- Amir



Re: Lots of money, score of 0??

2018-03-30 Thread RW
On Thu, 29 Mar 2018 08:50:48 -0700 (PDT)
John Hardin wrote:

> On Thu, 29 Mar 2018, RW wrote:
> 
> > The rule is matching on "$10.99 o" and "£1.70 2 6" respectively.  
> 
> Sadly that's kind of unavoidable given spammer obfuscation and the
> fact that cultures differ on what character to use for the decimal
> point and thousands separator.
> 
> > I've seen other types too, e.g.
> >
> > https://example.com/?f=a37688909bc4f6
> >
> > £20 M voucher  
> 
> *that* is a bit unexpected...

It's understandable though because it's "£20 M" followed by a word
boundary.

The other one could be seen as a bug: __LOTSA_MONEY_01 is an ordinary
body rule, so a "=a3" that represents a "£" should have already been
decoded.
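
For reference, "=a3" is the quoted-printable escape for byte 0xA3, which is "£" in ISO-8859-1. The Python sketch below uses the standard quopri module to show what decoding does to the two samples; decode_qp_latin1() is a hypothetical helper, and treating the URL query string as quoted-printable is only an assumption made to illustrate why a raw "=a3" followed by digits can look like a pound amount.

```python
import quopri


def decode_qp_latin1(raw: bytes) -> str:
    """Decode quoted-printable bytes, then interpret them as ISO-8859-1."""
    return quopri.decodestring(raw).decode("latin-1")


if __name__ == "__main__":
    # A genuinely encoded pound sign in body text.
    print(decode_qp_latin1(b"=a320 M voucher"))    # £20 M voucher
    # The URL fragment from the example, if it were (wrongly) read as QP.
    print(decode_qp_latin1(b"?f=a37688909bc4f6"))  # ?f£7688909bc4f6
```

If the body has already been decoded before body rules run, only the first case should ever present a "£" to the rule.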