Re: More undetected hidden test spam signs
> Right, but __STY_INVIS is currently tag-blind (it only looks for the style="" clause), so it hits that, and if lots of ham is hiding tracking images that way that might explain the poor S/O.

I suspect that might be the case. The vast majority of invisible garbage I see is hidden in a ... pair, typically two per spam and about 50K in each one. Looking at the definition of the
Re: More undetected hidden test spam signs
On Tue, 22 Dec 2020, Loren Wilton wrote:

>>> On 16 Dec 2020, at 23:21, Loren Wilton wrote:
>>>> I just got a batch of spams containing
>>>
>>> Such rules are there. Unfortunately, for whatever reason, lots of ham uses "invisible" text so it's not useful as a spam sign by itself and it's hard to come up with any useful combination rules.
>>
>> I think I may have figured it out - tracking images. Like:
>>
>> style="visibility: hidden !important; display:none !important; max-height: 0; width: 0; line-height: 0; mso-hide: all;">
>
> Note in your example the display:none is in a contained tag and not in an opening tag of a span. The tag is probably fairly long because the URL is probably huge, but it is still the one item that is hidden.

Right, but __STY_INVIS is currently tag-blind (it only looks for the style="" clause), so it hits that, and if lots of ham is hiding tracking images that way that might explain the poor S/O.

> I put in a local rawbody rule for m'.{100,}(?:$|)'is and so far I haven't gotten any hits on ham.

How much spam hits that very simple case?

I had a __SPAN_INVIS rule (currently commented out) but IIRC it also had poor S/O. It wasn't as simple as yours, though - perhaps I'm allowing for too many syntactically-valid cases to try to avoid trivial avoidance by spam?

> Of course that is a pretty heavy rule

It would be lighter if you didn't look for the tag closing. Is there a reason you care about the closing for that?

--
 John Hardin KA7OHZ http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never does quite what I want. I wish Christopher Robin was here." -- Peter da Silva in a.s.r
---
 3 days until Christmas
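To make the "tag-blind" point concrete, here is a rough Python sketch (not a SpamAssassin rule) of what a tag-aware check could look like: it records which tag carries a hiding style, so a hidden tracking image can be told apart from a hidden span. The class name, helper name, and both sample messages are made up for illustration, and the style pattern is a simplification rather than the real __STY_INVIS.

```python
import re
from html.parser import HTMLParser

# Simplified hiding pattern; an assumption, not the real __STY_INVIS regex.
HIDING = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)


class HiddenTagCollector(HTMLParser):
    """Collect the names of tags whose style attribute hides them."""

    def __init__(self):
        super().__init__()
        self.hidden_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        style = dict(attrs).get("style") or ""
        if HIDING.search(style):
            self.hidden_tags.append(tag)


def hidden_tags(html):
    """Return the tag names (lowercased by the parser) that carry a hiding style."""
    collector = HiddenTagCollector()
    collector.feed(html)
    collector.close()
    return collector.hidden_tags


# Hypothetical samples: a ham-style tracking pixel and a spam-style hidden span.
ham_like = (
    '<p>Hello</p><img src="https://example.invalid/t.gif" '
    'style="visibility: hidden !important; display:none !important; width: 0;">'
)
spam_like = '<span style="display:none">garbage</span><p>Buy now</p>'

print(hidden_tags(ham_like))
print(hidden_tags(spam_like))
```

A rule built on this idea would only count the second case. Whether that separates ham from spam any better in practice is exactly the open question in this thread.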
Re: More undetected hidden test spam signs
>> On 16 Dec 2020, at 23:21, Loren Wilton wrote:
>>> I just got a batch of spams containing
>>
>> Such rules are there. Unfortunately, for whatever reason, lots of ham uses "invisible" text so it's not useful as a spam sign by itself and it's hard to come up with any useful combination rules.
>
> I think I may have figured it out - tracking images. Like:
>
> style="visibility: hidden !important; display:none !important; max-height: 0; width: 0; line-height: 0; mso-hide: all;">

Note in your example the display:none is in a contained tag and not in an opening tag of a span. The tag is probably fairly long because the URL is probably huge, but it is still the one item that is hidden.

I put in a local rawbody rule for m'.{100,}(?:$|)'is and so far I haven't gotten any hits on ham. Of course that is a pretty heavy rule, but it would seem to indicate that hidden spans may not be that common in ham.
Re: More undetected hidden test spam signs
On Thu, 17 Dec 2020, John Hardin wrote:

> On Thu, 17 Dec 2020, @lbutlr wrote:
>> On 16 Dec 2020, at 23:21, Loren Wilton wrote:
>>> I just got a batch of spams containing
>>
>> Interesting. I remember in the early days of html spam there were various rules to tag messages as spam when they had content that did not display. (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).
>
> Such rules are there. Unfortunately, for whatever reason, lots of ham uses "invisible" text so it's not useful as a spam sign by itself and it's hard to come up with any useful combination rules.

I think I may have figured it out - tracking images. Like:

style="visibility: hidden !important; display:none !important; max-height: 0; width: 0; line-height: 0; mso-hide: all;">

The src link gets visited to retrieve the image so the message is tracked, but the display of the image is suppressed.

--
 John Hardin
Re: More undetected hidden test spam signs
On Thu, 17 Dec 2020 08:58:07 -0800 (PST) John Hardin wrote:

> On Thu, 17 Dec 2020, @lbutlr wrote:
>
> > On 16 Dec 2020, at 23:21, Loren Wilton wrote:
> >> I just got a batch of spams containing
> >>
> > ... various rules to tag messages as spam when they had content that
> > did not display.
>
> Such rules are there. Unfortunately, for whatever reason, lots of ham
> uses "invisible" text so it's not useful as a spam sign by itself and
> it's hard to come up with any useful combination rules.

The trouble with this kind of thing is that you can make anything look marginally useful with the right meta rule - even something like __RCVD_ON_MONDAY. rawbody rules are relatively expensive; if they don't show some kind of initial promise, they aren't worth pursuing IMO.

> Perhaps this would be useful if it hits bayes but not hard enough to
> push it over the threshold:
>
>   meta INVIS_TEXT_BAYES __STY_INVIS && (BAYES_80 || BAYES_95 || BAYES_99 || BAYES_999)

__STY_INVIS has an S/O of 0.122 in QA, hitting 6.4% of ham. In my corpus the semicolon doesn't make much difference to the historic numbers.

Unless __STY_INVIS is dominating spam I wouldn't do the above. If it works, it's most likely a sign that Bayes itself is underscored.

Strangely, the S/O is even worse for __STY_INVIS_MANY (__STY_INVIS > 5).
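For readers unfamiliar with the S/O figure, a small Python sketch of the arithmetic follows. It assumes S/O is the spam hit rate divided by the sum of the spam and ham hit rates (both as percentages of their own corpus); that definition is my reading of the QA output, not something stated in this thread, and the function names are made up.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio, assuming S/O = spam% / (spam% + ham%)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule has no hits in either corpus")
    return spam_pct / total


def implied_spam_pct(s_o: float, ham_pct: float) -> float:
    """Invert the ratio: spam hit rate implied by a given S/O and ham hit rate."""
    if not 0 <= s_o < 1:
        raise ValueError("S/O must be in [0, 1) to invert")
    return s_o * ham_pct / (1 - s_o)


# Figures quoted above for __STY_INVIS: S/O 0.122, hitting 6.4% of ham.
spam_pct = implied_spam_pct(0.122, 6.4)
print(f"implied spam hit rate: {spam_pct:.3f}%")
```

Under that assumption the rule would be hitting a bit under 1% of spam against 6.4% of ham, which is why it looks more like a ham sign than a spam sign on its own.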
Re: More undetected hidden test spam signs
On 17 Dec 2020, at 09:58, John Hardin wrote:
> Such rules are there. Unfortunately, for whatever reason, lots of ham uses
> "invisible" text so it's not useful as a spam sign by itself and it's hard to
> come up with any useful combination rules.

In the "Archive" folder on my work email there are 76,200 emails and 113,566 occurrences of the string "display:\s*none". Who knew? One archived email I noticed had 24 occurrences of the string, about a third of them followed by "!important".

I used to have a dehtmlizer tool that stripped the HTML down to bare text and links by piping the HTML MIME part of the messages through lynx --dump, but that proved to be problematic in its own way, and I haven't gotten pipes working with sieve anyway.

--
I AM ZOMBOR! (kelly) ZOMBOR!
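A count like the one above could be reproduced with something along these lines. This is only a sketch: it works on message bodies already loaded as strings, the function name and sample bodies are invented, and it says nothing about how the original numbers were actually gathered.

```python
import re

DISPLAY_NONE = re.compile(r"display:\s*none", re.IGNORECASE)
DISPLAY_NONE_IMPORTANT = re.compile(r"display:\s*none\s*!important", re.IGNORECASE)


def count_display_none(bodies):
    """Return (total occurrences, messages with at least one hit, '!important' hits)."""
    total = 0
    messages_with_hits = 0
    important = 0
    for body in bodies:
        hits = len(DISPLAY_NONE.findall(body))
        total += hits
        if hits:
            messages_with_hits += 1
        important += len(DISPLAY_NONE_IMPORTANT.findall(body))
    return total, messages_with_hits, important


# Hypothetical message bodies, for illustration only.
sample_bodies = [
    '<div style="display:none">a</div><img style="display: none !important">',
    "<p>plain</p>",
    '<span style="DISPLAY:none;">x</span>',
]

print(count_display_none(sample_bodies))
```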
Re: More undetected hidden test spam signs
On Thu, 17 Dec 2020, @lbutlr wrote:

> On 16 Dec 2020, at 23:21, Loren Wilton wrote:
>> I just got a batch of spams containing
>
> Interesting. I remember in the early days of html spam there were various rules to tag messages as spam when they had content that did not display. (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).

Such rules are there. Unfortunately, for whatever reason, lots of ham uses "invisible" text so it's not useful as a spam sign by itself and it's hard to come up with any useful combination rules.

https://ruleqa.spamassassin.org/?rule=%2Fsty_invis

Perhaps this would be useful if it hits bayes but not hard enough to push it over the threshold:

  meta INVIS_TEXT_BAYES __STY_INVIS && (BAYES_80 || BAYES_95 || BAYES_99 || BAYES_999)

N.B.: I just fixed a minor error in __STY_INVIS that made it fail to see that specific form of "invisible text".

--
 John Hardin
Re: More undetected hidden test spam signs
On Wed, 16 Dec 2020 22:21:12 -0800 Loren Wilton wrote:

> I just got a batch of spams containing
>
> That was followed by about 2K bytes of garbage containing GUIDs and
> links to putatively some youtube video. The span was then terminated
> correctly, the body of the spam, and then the same garbage for about
> another 2KB.
>
> The small font rules didn't seem to catch this.

There is an existing sub-rule that just misses this:

  rawbody __STY_INVIS /\bstyle\s*=\s*"[^">]{0,80}(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)/i

It's looking for a ";" after the "none".
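The effect of that trailing ";" is easy to check outside SpamAssassin. The Python sketch below runs the quoted pattern against a few made-up HTML fragments, alongside a hypothetical relaxed variant that also accepts "!important" or the closing quote after the value. The relaxed pattern is my own assumption for illustration; it is not the change that was actually made to the rule.

```python
import re

# The sub-rule's pattern as quoted above: it requires ";" after the value.
STY_INVIS = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)',
    re.IGNORECASE,
)

# Hypothetical relaxed variant (an assumption, not the committed fix):
# the value may be followed by ";", "!important", or the closing quote.
STY_INVIS_RELAXED = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden|display\s*:\s*none)\s*(?:;|!important|")',
    re.IGNORECASE,
)

# Made-up fragments; the spam sample from the original post is not reproduced here.
samples = {
    "with_semicolon": '<span style="display:none;">x</span>',
    "no_semicolon": '<span style="display:none">x</span>',
    "important": '<img style="visibility: hidden !important; display:none !important;">',
}

# Map each sample name to (strict pattern matched, relaxed pattern matched).
results = {
    name: (bool(STY_INVIS.search(html)), bool(STY_INVIS_RELAXED.search(html)))
    for name, html in samples.items()
}

for name, (strict, relaxed) in results.items():
    print(f"{name}: strict={strict} relaxed={relaxed}")
```

Only the first fragment matches the strict pattern; the relaxed one matches all three. Note that a looser match also means more ham hits, so this says nothing about whether the rule becomes more useful.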
Re: More undetected hidden test spam signs
On 16 Dec 2020, at 23:21, Loren Wilton wrote:
> I just got a batch of spams containing
>

Interesting. I remember in the early days of html spam there were various rules to tag messages as spam when they had content that did not display. (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).

--
>You are forgetting something: the Nazgul are immune to non-magical
>weapons.
>
"Any sufficiently advanced technology is indistinguishable from magic."