Re: More undetected hidden test spam signs

2020-12-22 Thread Loren Wilton
Right, but __STY_INVIS is currently tag-blind (it only looks for the 
style="" clause), so it hits that, and if lots of ham is hiding tracking 
images that way that might explain the poor S/O.


I suspect that might be the case.

The vast majority of invisible garbage I see is hidden in a  ... 
 pair, typically two per spam and about 50K in each one. Looking at 
the definition of the 

Re: More undetected hidden test spam signs


On Tue, 22 Dec 2020, Loren Wilton wrote:


On 16 Dec 2020, at 23:21, Loren Wilton  wrote:

I just got a batch of spams containing




Such rules are there. Unfortunately, for whatever reason, lots of ham uses 
"invisible" text so it's not useful as a spam sign by itself and it's hard 
to come up with any useful combination rules.


I think I may have figured it out - tracking images. Like:

style="visibility: hidden !important; display:none !important; max-height: 
0; width: 0; line-height: 0; mso-hide: all;">


Note in your example the display:none is in a contained tag and not in an 
opening tag of a span. The tag is probably fairly long because the URL is 
probably huge, but it is still the one item that is hidden.


Right, but __STY_INVIS is currently tag-blind (it only looks for the 
style="" clause), so it hits that, and if lots of ham is hiding tracking 
images that way that might explain the poor S/O.
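A tag-aware check could separate those two cases. The sketch below is purely illustrative: it is Python rather than a SpamAssassin rule, and the helper names and the tracker URL are hypothetical. It parses the HTML and reports which tags carry a hiding inline style, so a hidden img (likely a tracking pixel) can be told apart from a hidden span or div wrapping filler text.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper, not a SpamAssassin feature: inline styles that hide an element.
HIDING_STYLE = re.compile(r"visibility\s*:\s*hidden|display\s*:\s*none", re.IGNORECASE)


class HiddenTagFinder(HTMLParser):
    """Record the name of every start tag whose style="" attribute hides it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> land here too, because the
        # default handle_startendtag() calls handle_starttag().
        style = dict(attrs).get("style") or ""  # attribute value may be None
        if HIDING_STYLE.search(style):
            self.hidden.append(tag)  # HTMLParser lower-cases tag names


def hidden_tags(html):
    """Return the hidden tags in document order."""
    finder = HiddenTagFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden


print(hidden_tags(
    '<img src="https://tracker.example/p.gif" style="display:none !important">'
    '<span style="display:none">filler text</span><p>Hello</p>'
))  # ['img', 'span']
```

A rule built on this idea might ignore hidden img tags and count only hidden text containers, though whether that improves S/O would have to be checked against a corpus.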



I put in a local rawbody rule for
  m'.{100,}(?:$|)'is
and so far I haven't gotten any hits on ham.


How much spam hits that very simple case? I had a __SPAN_INVIS rule 
(currently commented out) but IIRC it also had poor S/O. It wasn't as 
simple as yours, though - perhaps I'm allowing for too many 
syntactically valid cases in trying to prevent trivial evasion by spam?



Of course that is a pretty heavy rule


It would be lighter if you didn't look for the tag closing. Is there a 
reason you care about the closing for that?


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
  does quite what I want. I wish Christopher Robin was here."
   -- Peter da Silva in a.s.r
---
 3 days until Christmas


Re: More undetected hidden test spam signs


On 16 Dec 2020, at 23:21, Loren Wilton  wrote:

I just got a batch of spams containing




Such rules are there. Unfortunately, for whatever reason, lots of ham 
uses "invisible" text so it's not useful as a spam sign by itself and 
it's hard to come up with any useful combination rules.


I think I may have figured it out - tracking images. Like:

style="visibility: hidden !important; display:none !important; max-height: 
0; width: 0; line-height: 0; mso-hide: all;">


Note in your example the display:none is in a contained tag and not in an 
opening tag of a span. The tag is probably fairly long because the URL is 
probably huge, but it is still the one item that is hidden.


I put in a local rawbody rule for
   m'.{100,}(?:$|)'is
and so far I haven't gotten any hits on ham.

Of course that is a pretty heavy rule, but it would seem to indicate that 
hidden spans may not be that common in ham.




Re: More undetected hidden test spam signs


On Thu, 17 Dec 2020, John Hardin wrote:


On Thu, 17 Dec 2020, @lbutlr wrote:


On 16 Dec 2020, at 23:21, Loren Wilton  wrote:

I just got a batch of spams containing




Interesting. I remember in the early days of html spam there were various 
rules to tag messages as spam when they had content that did not display. 
(Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).


Such rules are there. Unfortunately, for whatever reason, lots of ham uses 
"invisible" text so it's not useful as a spam sign by itself and it's hard to 
come up with any useful combination rules.


I think I may have figured it out - tracking images. Like:



The src link gets visited to retrieve the image so the message is tracked, 
but the display of the image is suppressed.





Re: More undetected hidden test spam signs

On Thu, 17 Dec 2020 08:58:07 -0800 (PST)
John Hardin wrote:

> On Thu, 17 Dec 2020, @lbutlr wrote:
> 
> > On 16 Dec 2020, at 23:21, Loren Wilton 
> > wrote:  
> >> I just got a batch of spams containing
> >>
> >>   
> >
> > ... various rules to tag messages as spam when they had content that
> > did not display.
>
> Such rules are there. Unfortunately, for whatever reason, lots of ham
> uses "invisible" text so it's not useful as a spam sign by itself and
> it's hard to come up with any useful combination rules.

The trouble with this kind of thing is that you can make anything look
marginally useful with the right meta rule - even something like
__RCVD_ON_MONDAY. 

rawbody rules are relatively expensive; if they don't show some kind of
initial promise, they aren't worth pursuing IMO. 

> Perhaps this would be useful if it hits bayes but not hard enough to
> push it over the threshold:
> 
> meta   INVIS_TEXT_BAYES   __STY_INVIS && (BAYES_80 || BAYES_95 || BAYES_99 || BAYES_999)

__STY_INVIS has an S/O of 0.122 in QA, hitting 6.4% of ham. In my corpus
the semicolon doesn't make much difference to the historic numbers.
Unless __STY_INVIS is dominating spam I wouldn't do the above. If it
works it's most likely a sign that Bayes itself is underscored. 

Strangely, the S/O is even worse for __STY_INVIS_MANY (__STY_INVIS > 5).
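For anyone wondering what an S/O of 0.122 implies in practice: assuming S/O is the spam hit rate divided by the sum of the spam and ham hit rates (both as percentages, which is how I understand ruleqa to report it), the 6.4% ham figure lets you back out the spam hit rate. A back-of-the-envelope sketch under that assumption:

```python
def s_o(spam_pct, ham_pct):
    """S/O ratio, ASSUMED here to be spam% / (spam% + ham%)."""
    return spam_pct / (spam_pct + ham_pct)


def implied_spam_pct(so, ham_pct):
    """Invert s_o(): so = s / (s + h)  =>  s = so * h / (1 - so)."""
    return so * ham_pct / (1.0 - so)


spam_pct = implied_spam_pct(0.122, 6.4)  # figures quoted above for __STY_INVIS
print(f"implied spam hit rate: {spam_pct:.2f}%")  # implied spam hit rate: 0.89%
print(f"ham/spam hit ratio: {6.4 / spam_pct:.1f}")  # ham/spam hit ratio: 7.2
```

Under that assumption the sub-rule fires on well under 1% of spam, about seven times less often than on ham, which is why a meta rule leaning on it adds little.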



Re: More undetected hidden test spam signs

On 17 Dec 2020, at 09:58, John Hardin  wrote:
> Such rules are there. Unfortunately, for whatever reason, lots of ham uses 
> "invisible" text so it's not useful as a spam sign by itself and it's hard to 
> come up with any useful combination rules.

In the "Archive" folder on my work email there are 76,200 emails and 113,566 
occurrences of the string "display:\s*none". Who knew?

One archived email I noticed had 24 occurrences of the string, about a third of 
them followed by "!important".
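A count like that is easy to reproduce. A minimal sketch, assuming the archive is stored as a Maildir (the folder path is whatever you pass on the command line; nothing here comes from the original post): it walks every text/html part, undoes the transfer encoding, and counts the same "display:\s*none" pattern case-insensitively. Restricting the count to HTML parts is a design choice of this sketch; the original figure may have been taken over whole messages.

```python
import mailbox
import re
import sys

# Same pattern as in the post: "display:" then optional whitespace then "none".
DISPLAY_NONE = re.compile(r"display:\s*none", re.IGNORECASE)


def count_in_message(msg):
    """Count pattern matches across all text/html parts of one message."""
    total = 0
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            html = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the header
            html = payload.decode("utf-8", errors="replace")
        total += len(DISPLAY_NONE.findall(html))
    return total


def scan_maildir(path):
    """Return (messages scanned, total matches) for one Maildir folder."""
    box = mailbox.Maildir(path, factory=None, create=False)
    messages = matches = 0
    for msg in box:
        messages += 1
        matches += count_in_message(msg)
    return messages, matches


if __name__ == "__main__" and len(sys.argv) > 1:
    n_msgs, n_hits = scan_maildir(sys.argv[1])
    print(f"{n_msgs} messages, {n_hits} matches")
```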

I used to have a dehtmlizer tool that stripped the HTML down to bare text and 
links by piping the HTML MIME part of the messages through lynx --dump, but 
that proved to be problematic in its own way and I haven't gotten pipes working 
with sieve anyway.
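For comparison, a rough pure-Python stand-in for the lynx-based approach might look like the following. It is an assumption-laden sketch, not the tool described above: it keeps text and link targets, drops script and style contents, and does not evaluate CSS, so text hidden with display:none still comes through (which is one way such a tool can mislead).

```python
from html.parser import HTMLParser


class Dehtmlizer(HTMLParser):
    """Collect text and link targets from HTML, loosely like lynx -dump."""

    SKIP = {"script", "style"}  # elements whose contents are never rendered

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Inline CSS is NOT evaluated: display:none text is kept.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def dehtmlize(html):
    """Return (flattened text, list of hrefs) for an HTML string."""
    parser = Dehtmlizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links
```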


-- 
I AM ZOMBOR! (kelly) ZOMBOR!



Re: More undetected hidden test spam signs


On Thu, 17 Dec 2020, @lbutlr wrote:


On 16 Dec 2020, at 23:21, Loren Wilton  wrote:

I just got a batch of spams containing




Interesting. I remember in the early days of html spam there were various rules 
to tag messages as spam when they had content that did not display. (Possibly 
pre-SpamAssassin or at least pre my use of SpamAssassin).


Such rules are there. Unfortunately, for whatever reason, lots of ham uses 
"invisible" text so it's not useful as a spam sign by itself and it's hard 
to come up with any useful combination rules.


  https://ruleqa.spamassassin.org/?rule=%2Fsty_invis

Perhaps this would be useful if it hits bayes but not hard enough to push 
it over the threshold:


  meta   INVIS_TEXT_BAYES   __STY_INVIS && (BAYES_80 || BAYES_95 || BAYES_99 || BAYES_999)


N.B.: I just fixed a minor error in __STY_INVIS that made it fail to see 
that specific form of "invisible text".




Re: More undetected hidden test spam signs

On Wed, 16 Dec 2020 22:21:12 -0800
Loren Wilton wrote:

> I just got a batch of spams containing
> 
> 
> 
> That was followed by about 2K bytes of garbage containing GUIDs and
> links to what was putatively some YouTube video. The span was then
> terminated correctly, followed by the body of the spam and then the
> same garbage for about another 2KB.
> 
> The small font rules didn't seem to catch this.

There is an existing sub-rule that just misses this:

 rawbody   __STY_INVIS   /\bstyle\s*=\s*"[^">]{0,80}(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)/i

It's looking for a ";" after the "none".
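The difference is easy to check outside SpamAssassin. This sketch runs the quoted __STY_INVIS pattern through Python's re module (the pattern uses no Perl-only syntax), next to a hypothetical relaxed variant that accepts the keyword at a word boundary. The relaxed pattern is only an illustration, not necessarily how __STY_INVIS was actually fixed, and the sample markup is made up.

```python
import re

# __STY_INVIS as quoted above; Python's re reads this pattern the same way Perl does.
STY_INVIS = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)',
    re.IGNORECASE,
)

# Hypothetical relaxed variant: accept the keyword at a word boundary, so a
# missing ";" or a trailing "!important" no longer defeats the match.
STY_INVIS_RELAXED = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden|display\s*:\s*none)\b',
    re.IGNORECASE,
)

samples = {
    "no semicolon": '<span style="display:none">',
    "semicolon": '<span style="display:none;">',
    "!important": '<img style="visibility: hidden !important; display:none !important;">',
}

for label, html in samples.items():
    print(f"{label:13} original={bool(STY_INVIS.search(html))} "
          f"relaxed={bool(STY_INVIS_RELAXED.search(html))}")
```

Note that the quoted pattern also misses the "!important" form, since it wants the ";" right after "hidden" or "none" (whitespace aside). The trailing \b in the relaxed version keeps it from firing on something like "display:nonesuch".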


Re: More undetected hidden test spam signs

On 16 Dec 2020, at 23:21, Loren Wilton  wrote:
> I just got a batch of spams containing
> 
> 

Interesting. I remember in the early days of html spam there were various rules 
to tag messages as spam when they had content that did not display. (Possibly 
pre-SpamAssassin or at least pre my use of SpamAssassin).

-- 
>You are forgetting something: the Nazgul are immune to non-magical
>weapons.
>
"Any sufficiently advanced technology is indistinguishable from magic."