Re: Interesting rule combo results

2016-03-09 Thread Ian Zimmerman
On 2016-03-09 07:12 -0800, Marc Perkel wrote:

> >>HAM RULES:
> >>...
> >>   80056 HTML_MESSAGE
> >
> >What's happening here? This seems to imply that  HTML_MESSAGE only
> >appears in ham.
> >
> >
> 
> I think my results are a little strange because I might not be
> training on all the data, only on what gets past all my other
> filters. I'm still working on this, but thought I'd share what it
> came up with, for better or worse.

If I take your explanation in the OP verbatim, what happens here is that
HTML_MESSAGE _without any other rule hits_ only appears in ham.  Which
seems entirely plausible, even if perhaps not very useful.
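
To make that reading concrete, here is a minimal Python sketch (not Marc's actual learner, which isn't shown in this thread). It assumes each message is a hypothetical (is_spam, rule_names) pair and counts every message once, under the exact set of rules it hit. The function name and the toy data are made up for illustration.

```python
from collections import Counter

def exact_set_counts(messages):
    """Count each message once, keyed by the full set of rules it hit."""
    ham, spam = Counter(), Counter()
    for is_spam, rules in messages:
        target = spam if is_spam else ham
        target[frozenset(rules)] += 1
    return ham, spam

# Toy data (hypothetical): HTML_MESSAGE alone shows up only in ham,
# even though spam also hits HTML_MESSAGE together with other rules.
messages = [
    (False, ["HTML_MESSAGE"]),
    (False, ["HTML_MESSAGE"]),
    (True, ["HTML_MESSAGE", "RAZOR2_CHECK"]),
]
ham, spam = exact_set_counts(messages)
key = frozenset(["HTML_MESSAGE"])
print(ham[key], spam[key])  # 2 0
```

Under this kind of counting, "HTML_MESSAGE" with 0 spam hits says nothing about spam that hits HTML_MESSAGE plus something else.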

-- 
Please *no* private copies of mailing list or newsgroup messages.
Rule 420: All persons more than eight miles high to leave the court.


Re: Interesting rule combo results

2016-03-09 Thread Marc Perkel



On 03/09/16 07:33, Dave Funk wrote:

On Tue, 8 Mar 2016, Marc Perkel wrote:


This is from the "for what it's worth" department.

I've generated the following rule combination lists.

The ham list is rule combinations that have 0 spam hits, sorted by the 
number of ham hits.
The spam list is rule combinations that have 0 ham hits, sorted by the 
number of spam hits.


There are some of my personal rules mixed in.

I'm just posting this to see if anyone sees any value in it.

SPAM RULES:

11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
 5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
 5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE

[snip..]


HAM RULES:

   132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
   132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
   131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC

[snip..]

80056 HTML_MESSAGE
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID

[snip..]

Marc,

Maybe I'm misunderstanding your list, but it looks like you've got
HTML_MESSAGE by itself in the HAM RULES (i.e., zero spam hits on
HTML_MESSAGE), but you've also got a rule combo of HTML_MESSAGE
RAZOR2_CF_RANGE_51_100 SUBJ_GROUP as the top entry in the SPAM RULES
(which implies that there is spam that hits HTML_MESSAGE too).

Similar situation for DKIM_SIGNED & DKIM_VALID.

Also, how can you have 132983 hits on the combo of DKIM_SIGNED
MAILTO_LINK RDNS_DYNAMIC but only 59189 hits on DKIM_SIGNED by itself?



That's a valid observation. In the learner I'm working on, I'm 
experimenting with an interesting forgetter that wipes out and restarts 
some of the keys. Getting rid of bad data takes some good data with it, 
and the good data usually recovers over time. This is still very 
experimental. I'm applying my new filter to just the rule names coming 
out of SA, completely ignoring the scores and even whether a rule is a 
spam or ham rule. I just wanted to see what the result would be, and 
whether I can generate SA rules from my data.


So far - crude at best.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Interesting rule combo results

2016-03-09 Thread Dave Funk

On Tue, 8 Mar 2016, Marc Perkel wrote:


This is from the "for what it's worth" department.

I've generated the following rule combination lists.

The ham list is rule combinations that have 0 spam hits, sorted by the 
number of ham hits.
The spam list is rule combinations that have 0 ham hits, sorted by the 
number of spam hits.


There are some of my personal rules mixed in.

I'm just posting this to see if anyone sees any value in it.

SPAM RULES:

11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
 5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
 5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE

[snip..]


HAM RULES:

   132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
   132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
   131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC

[snip..]

80056 HTML_MESSAGE
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID

[snip..]

Marc,

Maybe I'm misunderstanding your list, but it looks like you've got
HTML_MESSAGE by itself in the HAM RULES (i.e., zero spam hits on
HTML_MESSAGE), but you've also got a rule combo of HTML_MESSAGE
RAZOR2_CF_RANGE_51_100 SUBJ_GROUP as the top entry in the SPAM RULES
(which implies that there is spam that hits HTML_MESSAGE too).

Similar situation for DKIM_SIGNED & DKIM_VALID.

Also, how can you have 132983 hits on the combo of DKIM_SIGNED
MAILTO_LINK RDNS_DYNAMIC but only 59189 hits on DKIM_SIGNED by itself?
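
The count question can be turned into a quick sanity check. The Python sketch below assumes the numbers are plain "messages hitting all of these rules" tallies, in which case a combo can never have more hits than any subset of it. The function name is hypothetical; the two counts are the ones from the HAM RULES list quoted above.

```python
def monotonicity_violations(counts):
    """counts: dict mapping a tuple of rule names to a hit count.

    Returns (subset, subset_count, superset, superset_count) for every
    pair where the larger combo has MORE hits than one of its subsets.
    """
    items = [(frozenset(combo), n) for combo, n in counts.items()]
    found = []
    for small, n_small in items:
        for big, n_big in items:
            # "small < big" is a proper-subset test on frozensets
            if small < big and n_big > n_small:
                found.append(
                    (tuple(sorted(small)), n_small, tuple(sorted(big)), n_big)
                )
    return found

ham_counts = {
    ("DKIM_SIGNED", "MAILTO_LINK", "RDNS_DYNAMIC"): 132983,
    ("DKIM_SIGNED",): 59189,
}
violations = monotonicity_violations(ham_counts)
for subset, n_sub, superset, n_super in violations:
    print(subset, n_sub, "<", superset, n_super)
```

If the list is non-empty, the counts cannot all be simple co-occurrence tallies over the same set of messages.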

--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin   Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: Interesting rule combo results

2016-03-09 Thread Marc Perkel



On 03/09/16 06:45, RW wrote:

On Tue, 8 Mar 2016 22:25:09 -0800
Marc Perkel wrote:


This is from the "for what it's worth" department.

I've generated the following rule combination lists.

The ham list is rule combinations that have 0 spam hits, sorted by the
number of ham hits.
The spam list is rule combinations that have 0 ham hits, sorted by the
number of spam hits.
...
...
HAM RULES:
...
   80056 HTML_MESSAGE


What's happening here? This seems to imply that  HTML_MESSAGE only
appears in ham.




I think my results are a little strange because I might not be training 
on all the data, only on what gets past all my other filters. I'm still 
working on this, but thought I'd share what it came up with, for better 
or worse.


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: Interesting rule combo results

2016-03-09 Thread RW
On Tue, 8 Mar 2016 22:25:09 -0800
Marc Perkel wrote:

> This is from the "for what it's worth" department.
> 
> I've generated the following rule combination lists.
> 
> The ham list is rule combinations that have 0 spam hits, sorted by
> the number of ham hits.
> The spam list is rule combinations that have 0 ham hits, sorted by
> the number of spam hits.
>...
> ...
> HAM RULES:
>... 
>   80056 HTML_MESSAGE


What's happening here? This seems to imply that  HTML_MESSAGE only
appears in ham.


Re: Interesting rule combo results

2016-03-08 Thread Matthias Leisi

> I've generated the following rule combination lists.
> 
> The ham list is rule combinations that have 0 spam hits, sorted by
> the number of ham hits.
> The spam list is rule combinations that have 0 ham hits, sorted by
> the number of spam hits.

You’re sort of reinventing wheels. See
https://wiki.apache.org/spamassassin/HitFrequencies, especially the
section about „overlap“.

— Matthias



Interesting rule combo results

2016-03-08 Thread Marc Perkel

This is from the "for what it's worth" department.

I've generated the following rule combination lists.

The ham list is rule combinations that have 0 spam hits, sorted by the 
number of ham hits.
The spam list is rule combinations that have 0 ham hits, sorted by the 
number of spam hits.
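
For anyone who wants to try the same thing, below is a rough Python sketch of one way to build such lists. It is a guess at the general idea, not the code that produced the numbers in this post. The input format (a list of (is_spam, rule_names) pairs), the max_size cutoff, and the function names are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def count_combos(messages, max_size=3):
    """Tally every rule combination (up to max_size rules) per class."""
    ham, spam = Counter(), Counter()
    for is_spam, rules in messages:
        target = spam if is_spam else ham
        names = sorted(set(rules))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for combo in combinations(names, size):
                target[combo] += 1
    return ham, spam

def only_in(a, b):
    """Combos seen in a but never in b, highest count first."""
    rows = [(n, combo) for combo, n in a.items() if combo not in b]
    return sorted(rows, key=lambda row: (-row[0], row[1]))

# Toy data (hypothetical), just to show the shape of the output.
messages = [
    (False, ["DKIM_SIGNED", "MAILTO_LINK"]),
    (False, ["DKIM_SIGNED", "HTML_MESSAGE"]),
    (True, ["HTML_MESSAGE", "SUBJ_GROUP"]),
]
ham, spam = count_combos(messages)
ham_only = only_in(ham, spam)
spam_only = only_in(spam, ham)
for n, combo in ham_only:
    print(f"{n:6d} {' '.join(combo)}")
```

Note that this counts a combo whenever a message hits all of its rules, whatever else the message hits, so HTML_MESSAGE by itself drops out of the ham list as soon as one spam message hits it.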


There are some of my personal rules mixed in.

I'm just posting this to see if anyone sees any value in it.

SPAM RULES:

 11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
 11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
 11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
 10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
 10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
  5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
  5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE
  4160 DKIM_SIGNED DKIM_VALID_AU RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK
  4154 DKIM_VALID DKIM_VALID_AU RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK
  4153 MIME_HTML_ONLY VACATION_SCAM
  4042 DKIM_SIGNED DKIM_VALID_AU RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK
  4038 DKIM_VALID DKIM_VALID_AU RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK
  3929 DKIM_VALID_AU RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK
  3087 DKIM_SIGNED PYZOR_CHECK RDNS_DYNAMIC
  3079 DKIM_VALID PYZOR_CHECK RDNS_DYNAMIC
  3054 DKIM_VALID_AU PYZOR_CHECK RDNS_DYNAMIC
  2922 DKIM_VALID_AU HTML_IMAGE_ONLY_24 MIME_HTML_ONLY
  2860 DKIM_VALID_AU LOTS_OF_MONEY RAZOR2_CF_RANGE_51_100
  2822 HTML_IMAGE_RATIO_02 HTML_MESSAGE NIXSPAM_IXHASH
  2802 DKIM_VALID_AU LOTS_OF_MONEY RAZOR2_CF_RANGE_E8_51_100
  2679 DKIM_VALID_AU LOTS_OF_MONEY RAZOR2_CHECK
  2633 DKIM_VALID_AU US_DOLLARS_3
  2596 HK_RANDOM_FROM HTML_MIME_NO_HTML_TAG
  2536 HK_RANDOM_FROM HTML_MESSAGE HTML_MIME_NO_HTML_TAG
  2517 HK_RANDOM_FROM HTML_MIME_NO_HTML_TAG MIME_HTML_ONLY
  2469 HK_RANDOM_ENVFROM HK_RANDOM_FROM HTML_MIME_NO_HTML_TAG
  2394 DKIM_SIGNED MIME_HTML_ONLY NIXSPAM_IXHASH
  2348 HK_RANDOM_ENVFROM MISSING_DATE
  2345 HK_RANDOM_FROM MISSING_DATE
  2322 DKIM_VALID MIME_HTML_ONLY NIXSPAM_IXHASH
  2310 HTML_IMAGE_RATIO_02 HTML_MESSAGE MIME_HTML_ONLY PYZOR_CHECK
  2297 HK_RANDOM_ENVFROM HK_RANDOM_FROM MISSING_DATE
  2254 DKIM_VALID_AU MIME_HTML_ONLY NIXSPAM_IXHASH
  2048 DKIM_SIGNED HTML_MESSAGE MIME_HTML_ONLY NIXSPAM_IXHASH
  2004 HTML_IMAGE_RATIO_02 MIME_HTML_ONLY NIXSPAM_IXHASH
  1762 HTML_TAG_BALANCE_BODY MIME_HTML_ONLY PYZOR_CHECK
  1621 GENERIC_IXHASH UNPARSEABLE_RELAY
  1565 DKIM_VALID_AU HK_RANDOM_ENVFROM HK_RANDOM_FROM
  1549 DKIM_VALID MIME_HTML_ONLY RAZOR2_CF_RANGE_51_100
  1484 DKIM_VALID MIME_HTML_ONLY RAZOR2_CF_RANGE_E8_51_100
  1421 HK_RANDOM_FROM HTML_MESSAGE MIME_HTML_ONLY
  1383 DKIM_SIGNED PYZOR_CHECK RAZOR2_CF_RANGE_51_100
  1370 HK_RANDOM_ENVFROM HK_RANDOM_FROM MIME_HTML_ONLY
  1369 HK_RANDOM_ENVFROM HTML_MIME_NO_HTML_TAG
  1356 DKIM_VALID TVD_PH_SEC
  1343 DKIM_VALID_AU TVD_PH_SEC
  1326 DKIM_VALID_AU MIME_HTML_ONLY RAZOR2_CF_RANGE_51_100
  1323 HTML_IMAGE_RATIO_04 MIME_HTML_ONLY PYZOR_CHECK
  1320 HK_RANDOM_ENVFROM HTML_MESSAGE HTML_MIME_NO_HTML_TAG
  1309 HK_RANDOM_ENVFROM HTML_MIME_NO_HTML_TAG MIME_HTML_ONLY
  1298 DKIM_SIGNED PYZOR_CHECK RAZOR2_CF_RANGE_E8_51_100
  1266 DKIM_VALID_AU MIME_HTML_ONLY RAZOR2_CF_RANGE_E8_51_100
  1263 HTML_IMAGE_RATIO_02 NIXSPAM_IXHASH PYZOR_CHECK
  1248 MIME_HTML_ONLY NIXSPAM_IXHASH PYZOR_CHECK
  1238 GENERIC_SENDERHASH HTML_IMAGE_ONLY_24
  1229 DKIM_VALID GENERIC_SENDERHASH RAZOR2_CF_RANGE_51_100
  1169 DKIM_VALID_AU GENERIC_SENDERHASH RAZOR2_CF_RANGE_51_100
  1169 DKIM_SIGNED HTML_IMAGE_ONLY_24 RAZOR2_CF_RANGE_51_100
  1153 DKIM_SIGNED HTML_IMAGE_ONLY_24 RAZOR2_CF_RANGE_E8_51_100
  1125 DKIM_VALID_AU MIME_HTML_ONLY RAZOR2_CHECK
  1124 GENERIC_SENDERHASH HTML_IMAGE_ONLY_24 HTML_MESSAGE
   DKIM_SIGNED DKIM_VALID_AU US_DOLLARS_3
  1109 DKIM_VALID DKIM_VALID_AU US_DOLLARS_3
  1103 GENERIC_SENDERHASH HTML_MESSAGE HTML_TAG_BALANCE_HEAD
  1071 DKIM_SIGNED GENERIC_SENDERHASH HTML_IMAGE_ONLY_24
  1049 DKIM_VALID_AU LOTS_OF_MONEY US_DOLLARS_3
  1027 DKIM_VALID_AU GENERIC_SENDERHASH HTML_IMAGE_ONLY_32
  1016 DKIM_VALID PYZOR_CHECK RAZOR2_CF_RANGE_51_100
  1010 HTML_IMAGE_RATIO_02 HTML_MESSAGE MAILTO_LINK PYZOR_CHECK
  1001 DKIM_VALID FROM_EXCESS_BASE64 HTML_IMAGE_RATIO_02

HAM RULES:

132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC
 87477 DKIM_VALID_AU UNPARSEABLE_RELAY
 84371 DKIM_SIGNED DKIM_VALID_AU UNPARSEABLE_RELAY
 84302 DKIM_SIGNED HTML_MESSAGE UNPARSEABLE_RELAY
 84223 DKIM_VALID DKIM_VALID_AU UNPARSEABLE_RELAY
 83594 DKIM_VALID HTML_MESSAGE UNPARSEABLE_RELAY
 82729 DKIM_VALID_AU HTML_MESSAGE UNPARSEABLE_RELAY
 80056 HTML_MESSAGE
 78472 DKIM_S