Re: JM_SOUGHT_FRAUD

2010-08-12 Thread Bowie Bailey
 On 8/11/2010 7:15 PM, John Hardin wrote:
 On Wed, 11 Aug 2010, Bowie Bailey wrote:

 In case anyone else is following this...

 The sa-update process made things a bit more complex than simply
 renaming the file after updates.  If that's all you do, then
 sa-update loses track of the file and will download a new copy on
 every run. What I had to do is this:

 1) Rename the .cf file back to the original name so sa-update can
 find it
 2) Run sa-update
 3) Rename the .cf file to z_sought_rules_yerp_org.cf

 How about symlinking z_sought_rules_yerp_org.cf to
 sought_rules_yerp_org.cf? That might spare you performing the filename
 shuffle every time you want to update...

I thought about that, but didn't like the side-effect of loading the
whole rule set a third time.  The filename shuffle is in my update
script now, so it just happens.  I don't see a couple of mv operations
causing a problem in this instance.

-- 
Bowie


Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Karsten Bräckelmann
On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
 I was looking through some of the spam rules, and I noticed that the
 JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA
 3.3.1, but the scores are all 0.  Is there a reason for this?

Yes, an explicit request by Justin to zero them out, specifically aiming
at the rescore runs to prevent biasing the scores too much, AFAIK.

The Sought Fraud rules are expected to be enabled locally -- that is,
assign them a proper score in the site config cf files.
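Enabling them locally means adding `score` lines to a site config file such as local.cf. To see which sought rules a given rule file currently ships with a zero score, a small awk filter over the `score` lines is enough. This is a minimal sketch: the function name `zero_scored_rules` is hypothetical, and it assumes `score` lines of the usual form `score RULE_NAME n [n n n]`, one per line.

```sh
#!/bin/sh
# zero_scored_rules PREFIX FILE...
# Hypothetical helper: print every rule whose name starts with PREFIX and
# whose "score" line assigns 0 to every score slot.
zero_scored_rules() {
    prefix=$1; shift
    awk -v prefix="$prefix" '
        $1 == "score" && index($2, prefix) == 1 {
            all_zero = 1
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                if ($i + 0 != 0) all_zero = 0 # any non-zero slot disqualifies
            }
            if (all_zero) print $2
        }
    ' "$@"
}
```

For example, `zero_scored_rules JM_SOUGHT_FRAUD /path/to/channel/*.cf`, where the path is a placeholder for your stock update channel's rule directory.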


In other words: To my knowledge and experience, they are rather safe to
use. They have *not* been zeroed out due to bad performance.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Bowie Bailey
 On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
 On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
 I was looking through some of the spam rules, and I noticed that the
 JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA
 3.3.1, but the scores are all 0.  Is there a reason for this?
 Yes, an explicit request by Justin to zero them out, specifically aiming
 at the rescore runs to prevent biasing the scores too much, AFAIK.

 The Sought Fraud rules are expected to be enabled locally -- that is,
 assign them a proper score in the site config cf files.


 In other words: To my knowledge and experience, they are rather safe to
 use. They have *not* been zeroed out due to bad performance.

I thought I had enabled them by using the sought.rules.yerp.org
sa-update channel, but apparently not, since that gets overridden by the
copy in the main updates channel.

Are the rules in the main updates channel being updated as often as the
ones in the sought channel?  A quick comparison shows that the two sets
of rules on my system are different.  I thought the sought rules were
highly dynamic, so I'm surprised to see one of them show up in the main
channel.
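One way to make such a quick comparison concrete is to diff only the `meta` definitions of the sought rules in each copy. A minimal sketch: the function name `diff_sought_metas` and the `JM_SOUGHT` name filter are assumptions for illustration; pass it the stock channel's rule file first and the dedicated channel's rule file second.

```sh
#!/bin/sh
# diff_sought_metas OLD_FILE NEW_FILE
# Hypothetical helper: extract the "meta JM_SOUGHT..." lines from each file,
# sort them, and show the differences in unified diff format.
# Returns diff's status: 0 if identical, 1 if they differ.
diff_sought_metas() {
    tmp_old=$(mktemp) || return 2
    tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }
    grep -E -e '^[[:space:]]*meta[[:space:]]+JM_SOUGHT' "$1" | LC_ALL=C sort > "$tmp_old"
    grep -E -e '^[[:space:]]*meta[[:space:]]+JM_SOUGHT' "$2" | LC_ALL=C sort > "$tmp_new"
    diff -u "$tmp_old" "$tmp_new"
    status=$?
    rm -f "$tmp_old" "$tmp_new"
    return "$status"
}
```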

-- 
Bowie


Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Karsten Bräckelmann
On Wed, 2010-08-11 at 11:57 -0400, Bowie Bailey wrote:
 On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
  On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
   I was looking through some of the spam rules, and I noticed that the
   JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA
   3.3.1, but the scores are all 0.  Is there a reason for this?
  
  Yes, an explicit request by Justin to zero them out, specifically aiming
  at the rescore runs to prevent biasing the scores too much, AFAIK.
 
  The Sought Fraud rules are expected to be enabled locally -- that is,
  assign them a proper score in the site config cf files.
 
 
  In other words: To my knowledge and experience, they are rather safe to
  use. They have *not* been zeroed out due to bad performance.
 
 I thought I had enabled them by using the sought.rules.yerp.org
 sa-update channel, but apparently not, since that gets overridden by the
 copy in the main updates channel.

Uhm, yeah -- IIRC that's alphabetical order, and stock *u*pdates gets
parsed after *s*ought channel.
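The ordering effect can be reproduced without SpamAssassin at all: sorting the per-channel `.cf` file names byte-wise puts `sought_rules_yerp_org.cf` ahead of `updates_spamassassin_org.cf`, so whatever the stock channel defines is read later and wins. A toy sketch, assuming SA reads these per-channel files in plain C-locale sort order (an assumption here, not something this snippet verifies):

```sh
#!/bin/sh
# Toy demonstration only: create empty files named like the per-channel
# .cf files sa-update writes, then list them in C-locale sort order.
demo_dir=$(mktemp -d)
touch "$demo_dir/updates_spamassassin_org.cf" \
      "$demo_dir/sought_rules_yerp_org.cf"

# Assumed parse order: first line is read first, last line is read last.
load_order=$(cd "$demo_dir" && printf '%s\n' *.cf | LC_ALL=C sort)
printf '%s\n' "$load_order"

rm -r -- "$demo_dir"
```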


 Are the rules in the main updates channel being updated as often as the
 ones in the sought channel?  A quick comparison shows that the two sets
 of rules on my system are different.  I thought the sought rules were
 highly dynamic, so I'm surprised to see one of them show up in the main
 channel.

I think they are updated frequently, also in 3.3 stock. Details escape
me right now. Justin?

Update frequency of the dedicated sought channel has changed a few
times in the past. For quite some time now, it has been more than once
a day.





Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Bowie Bailey
 On 8/11/2010 12:17 PM, Karsten Bräckelmann wrote:
 On Wed, 2010-08-11 at 11:57 -0400, Bowie Bailey wrote:
 On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
 On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
 I was looking through some of the spam rules, and I noticed that the
 JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA
 3.3.1, but the scores are all 0.  Is there a reason for this?
 Yes, an explicit request by Justin to zero them out, specifically aiming
 at the rescore runs to prevent biasing the scores too much, AFAIK.

 The Sought Fraud rules are expected to be enabled locally -- that is,
 assign them a proper score in the site config cf files.


 In other words: To my knowledge and experience, they are rather safe to
 use. They have *not* been zeroed out due to bad performance.
 I thought I had enabled them by using the sought.rules.yerp.org
 sa-update channel, but apparently not, since that gets overridden by the
 copy in the main updates channel.
 Uhm, yeah -- IIRC that's alphabetical order, and stock *u*pdates gets
 parsed after *s*ought channel.

I think you're right.  In any case, I confirmed with a debug run that
updates is loaded after sought.  Maybe I should run a script after the
sa-update that renames the file to z_sought... so it'll be loaded last.

 Are the rules in the main updates channel being updated as often as the
 ones in the sought channel?  A quick comparison shows that the two sets
 of rules on my system are different.  I thought the sought rules were
 highly dynamic, so I'm surprised to see one of them show up in the main
 channel.
 I think they are updated frequently, also in 3.3 stock. Details escape
 me right now. Justin?

 Update frequency of the dedicated sought channel has changed a few
 times in the past. For quite some time now, it has been more than once
 a day.

Right.  And I'm checking for updates several times a day.  If the
updates channel is not keeping up with sought, I need to make sure I am
running the rules from the dedicated channel and not the updates channel
so I have the latest.

-- 
Bowie


Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Bowie Bailey
 On 8/11/2010 3:30 PM, John Hardin wrote:
 On Wed, 11 Aug 2010, Bowie Bailey wrote:

 Right.  And I'm checking for updates several times a day.  If the
 updates channel is not keeping up with sought, I need to make sure I
 am running the rules from the dedicated channel and not the updates
 channel so I have the latest.

 The current situation is: automatic rule updates are only generated
 when the corpora of recent messages used in the nightly masscheck are
 sufficiently large (150k+ of both spam and ham, IIRC), and that's been
 difficult to achieve for a while due to ham starvation. More
 volunteers to perform nightly local masschecks of fresh ham corpora
 and upload the results, or to maintain and upload non-private ham
 corpora to the nightly masscheck server, would be most welcome!

 If you watch 72_scores.cf in SVN you can get an idea of when the last
 automatic update occurred.

 http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/72_scores.cf?view=log


 Manual updates typically only occur for OMG! bugs or when the last
 automatic update gets stale enough to be a problem.

 Moving the sought channel update to z_sought seems like a good idea to
 me, perhaps Justin should do that at the root so that everybody gets
 scores if they subscribe to the channel.

In case anyone else is following this...

The sa-update process made things a bit more complex than simply
renaming the file after updates.  If that's all you do, then sa-update
loses track of the file and will download a new copy on every run.  What
I had to do is this:

1) Rename the .cf file back to the original name so sa-update can find it
2) Run sa-update
3) Rename the .cf file to z_sought_rules_yerp_org.cf
4) Restart spamd

You don't have to mess with the directory, just rename the main
sought_rules_yerp_org.cf file.
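The rename/update/rename steps above can be wrapped in a small function so the shuffle always happens in the same order around sa-update. A minimal sketch, assuming the per-channel file lives in the directory given as the first argument and that the update command (normally `sa-update` with your channel options) is passed as the remaining arguments. The function name `sought_shuffle_update` is hypothetical, and restarting spamd is left to the caller because the restart command differs between systems.

```sh
#!/bin/sh
# sought_shuffle_update DIR UPDATE_CMD [ARGS...]
# Hypothetical wrapper around the rename / sa-update / rename steps.
sought_shuffle_update() {
    dir=$1; shift
    orig="$dir/sought_rules_yerp_org.cf"
    late="$dir/z_sought_rules_yerp_org.cf"

    # 1) Put the file back under the name sa-update expects.
    if [ -f "$late" ]; then
        mv -f -- "$late" "$orig" || return 1
    fi

    # 2) Run the update command and remember its exit status.
    "$@"
    rc=$?

    # 3) Rename to z_... so it sorts after the stock channel's file.
    if [ -f "$orig" ]; then
        mv -f -- "$orig" "$late" || return 1
    fi

    # 4) Restarting spamd is left to the caller.
    return "$rc"
}
```

For example, `sought_shuffle_update /var/lib/spamassassin/3.003001 sa-update --channel sought.rules.yerp.org`, adding whatever other options (such as `--gpgkey`) you already use for that channel; the directory is only an example and varies by installation.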

-- 
Bowie


Re: JM_SOUGHT_FRAUD

2010-08-11 Thread RW
On Wed, 11 Aug 2010 17:30:40 -0400
Bowie Bailey <bowie_bai...@buc.com> wrote:

  On 8/11/2010 3:30 PM, John Hardin wrote:
  On Wed, 11 Aug 2010, Bowie Bailey wrote:
 
  Right.  And I'm checking for updates several times a day.  If the
  updates channel is not keeping up with sought, I need to make sure
  I am running the rules from the dedicated channel and not the
  updates channel so I have the latest.
 
  The current situation is: automatic rule updates are only generated
  when the corpora of recent messages used in the nightly masscheck are
  sufficiently large (150k+ of both spam and ham, IIRC), and that's
  been difficult to achieve for a while due to ham starvation. More
  volunteers to perform nightly local masschecks of fresh ham corpora
  and upload the results, or to maintain and upload non-private ham
  corpora to the nightly masscheck server, would be most welcome!
 
  If you watch 72_scores.cf in SVN you can get an idea of when the
  last automatic update occurred.
 
  http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/72_scores.cf?view=log
 
 
  Manual updates typically only occur for OMG! bugs or when the last
  automatic update gets stale enough to be a problem.
 
  Moving the sought channel update to z_sought seems like a good idea
  to me, perhaps Justin should do that at the root so that everybody
  gets scores if they subscribe to the channel.
 
 In case anyone else is following this...
 
 The sa-update process made things a bit more complex than simply
 renaming the file after updates.  If that's all you do, then sa-update
 loses track of the file and will download a new copy on every run.
 What I had to do is this:
 
 1) Rename the .cf file back to the original name so sa-update can
 find it
 2) Run sa-update
 3) Rename the .cf file to z_sought_rules_yerp_org.cf
 4) Restart spamd
 
 You don't have to mess with the directory, just rename the main
 sought_rules_yerp_org.cf file.
 

Would it not be simpler just to do something like this?

grep -E '^score' \
  /var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf \
  > /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf


or simply paste the scores into local.cf


Sought dedicated AND stock channel (was: Re: JM_SOUGHT_FRAUD)

2010-08-11 Thread Karsten Bräckelmann
 On Wed, 11 Aug 2010 17:30:40 -0400 Bowie Bailey <bowie_bai...@buc.com> wrote:
   On 8/11/2010 3:30 PM, John Hardin wrote:

   The current situation is: automatic rule updates are only generated
   when the corpora of recent messages used in the nightly masscheck are
   sufficiently large (150k+ of both spam and ham, IIRC), and that's
   been difficult to achieve for a while due to ham starvation. More
   volunteers to perform nightly local masschecks of fresh ham corpora
   and upload the results, or to maintain and upload non-private ham
   corpora to the nightly masscheck server, would be most welcome!

  In case anyone else is following this...
  
  The sa-update process made things a bit more complex than simply
  renaming the file after updates.  If that's all you do, then sa-update
  loses track of the file and will download a new copy on every run.
  What I had to do is this:
  
  1) Rename the .cf file back to the original name so sa-update can
  find it
  2) Run sa-update
  3) Rename the .cf file to z_sought_rules_yerp_org.cf
  4) Restart spamd
  
  You don't have to mess with the directory, just rename the main
  sought_rules_yerp_org.cf file.
 
 Would it not be simpler just to do something like this

Sadly, no.

The problem is that not only the sub-rules change, but with them the
actually scored meta rules, which combine these sub-rules by OR-ing them.

That means, stale meta rules in stock will override the fresh meta
rules, effectively discarding all fresh sub-rule matches.
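The override comes from the rule that, when the same rule name is defined more than once, the definition read last is the one that takes effect. The toy emulation below (awk, not SpamAssassin's real parser) reads two hypothetical channel files in the assumed C-locale sort order and keeps the last `meta` definition per rule name, showing how a stale stock definition masks the fresh one. All rule and sub-rule names here are made up for illustration.

```sh
#!/bin/sh
# Toy emulation of "last definition wins"; not SpamAssassin's parser.
demo_dir=$(mktemp -d)

# Hypothetical fresh definition from the dedicated channel...
cat > "$demo_dir/sought_rules_yerp_org.cf" <<'EOF'
meta JM_SOUGHT_FRAUD_1 (__SEEK_NEW_A || __SEEK_NEW_B)
EOF

# ...and a hypothetical stale one from the stock channel.
cat > "$demo_dir/updates_spamassassin_org.cf" <<'EOF'
meta JM_SOUGHT_FRAUD_1 (__SEEK_OLD_A || __SEEK_OLD_B)
EOF

# Read the files in sorted order and remember the last body seen
# for each meta rule name.
effective=$(cd "$demo_dir" && printf '%s\n' *.cf | LC_ALL=C sort |
    xargs cat |
    awk '$1 == "meta" { name = $2; $1 = ""; $2 = ""; sub(/^ +/, ""); body[name] = $0 }
         END { for (n in body) print n ": " body[n] }')
printf '%s\n' "$effective"

rm -r -- "$demo_dir"
```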


Bottom line: If the dedicated sought channel is more up-to-date and
updated more frequently, using that one in *addition* to (faster) aging
stock rules currently needs careful hacks -- to preserve the precious
meta.

Or, well, use any SA- or channel-provided method to tweak the order. An
SA method would be a new feature. A channel method would be a rename, so
that the channel's file comes last in alphabetical order.

Bummer. :(  This is a real problem and worth filing a bug.





Re: JM_SOUGHT_FRAUD

2010-08-11 Thread Karsten Bräckelmann
On Wed, 2010-08-11 at 17:30 -0400, Bowie Bailey wrote:
 In case anyone else is following this...
 
 The sa-update process made things a bit more complex than simply
 renaming the file after updates.  If that's all you do, then sa-update
 loses track of the file and will download a new copy on every run.  What
 I had to do is this:
 
 1) Rename the .cf file back to the original name so sa-update can find it
 2) Run sa-update
 3) Rename the .cf file to z_sought_rules_yerp_org.cf
 4) Restart spamd
 
 You don't have to mess with the directory, just rename the main
 sought_rules_yerp_org.cf file.

Hmm, a simple symlink, albeit (once) manually set-up, should do, too.

  ln -s sought_rules_yerp_org.cf zzz.cf

This would parse anything from the more recent dedicated channel, after
the stock rules (and sought itself) have been parsed. Again.

Yes, this overwrites the sought rules' meta definitions a third time.
But that shouldn't be much of a problem, really. Also, any stale, unused
sub-rules (see my other follow-up) will be ignored by SA prior to
compiling the REs.

This should work as a do-it-once-and-forget workaround to the issue,
rather than relying on ghastly file renaming in the sa-updating cron
job. The latter is even prone to race conditions in rare and specific
circumstances.
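The one-time symlink can be set up idempotently and sanity-checked by confirming that `zzz.cf` sorts after every other `.cf` file in the update directory. A minimal sketch; the function name `setup_sought_symlink` is hypothetical, the directory argument stands for your sa-update directory (its path varies by installation), and the check assumes SA reads those files in C-locale sort order.

```sh
#!/bin/sh
# setup_sought_symlink DIR
# Hypothetical helper: create DIR/zzz.cf -> sought_rules_yerp_org.cf (a
# relative link, as in the ln -s example) and check that it sorts last.
setup_sought_symlink() {
    (
        cd "$1" || exit 1
        # -f replaces an existing zzz.cf link, so re-running is harmless.
        ln -sf sought_rules_yerp_org.cf zzz.cf || exit 1
        last=$(printf '%s\n' *.cf | LC_ALL=C sort | tail -n 1)
        if [ "$last" != "zzz.cf" ]; then
            echo "warning: $last sorts after zzz.cf" >&2
            exit 1
        fi
    )
}
```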





Re: JM_SOUGHT_FRAUD

2010-08-11 Thread John Hardin

On Wed, 11 Aug 2010, Bowie Bailey wrote:


 In case anyone else is following this...

 The sa-update process made things a bit more complex than simply
 renaming the file after updates.  If that's all you do, then sa-update
 loses track of the file and will download a new copy on every run.
 What I had to do is this:

 1) Rename the .cf file back to the original name so sa-update can find it
 2) Run sa-update
 3) Rename the .cf file to z_sought_rules_yerp_org.cf

How about symlinking z_sought_rules_yerp_org.cf to
sought_rules_yerp_org.cf? That might spare you performing the filename
shuffle every time you want to update...

 4) Restart spamd

 You don't have to mess with the directory, just rename the main
 sought_rules_yerp_org.cf file.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #20: The faster you finish the fight,
  the less shot you will get.
---
 4 days until the 65th anniversary of the end of World War II


Re: JM_SOUGHT_FRAUD

2010-08-11 Thread John Hardin

On Wed, 11 Aug 2010, RW wrote:


  1) Rename the .cf file back to the original name so sa-update can
  find it
  2) Run sa-update
  3) Rename the .cf file to z_sought_rules_yerp_org.cf
  4) Restart spamd


 Would it not be simpler just to do something like this?

 grep -E '^score' \
   /var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf \
   > /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf


Heh. Adding that to your sa-update cron job is a pretty good way too.
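Tied into the cron job, the score extraction can run only when sa-update actually installed something (sa-update exits 0 when updates were installed and 1 when none were available). A minimal sketch: the function name `refresh_sought_scores` is hypothetical, and the paths in the usage comment are the ones from RW's example, which differ between installations. Keep in mind Karsten's point in this thread that copying scores alone does not refresh the meta rules themselves.

```sh
#!/bin/sh
# refresh_sought_scores DEST SRC_FILE...
# Hypothetical helper: copy only the "score" lines from the sought channel's
# rule file(s) into a local .cf file. It writes a temp file next to DEST
# (not ending in .cf) and renames it, so a restart never sees a partial file.
refresh_sought_scores() {
    dest=$1; shift
    tmp="$dest.tmp.$$"
    if cat "$@" | grep -E -e '^score' > "$tmp"; then
        mv -f -- "$tmp" "$dest"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Example cron usage (paths from RW's message; adjust for your system):
#   sa-update && refresh_sought_scores \
#       /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf \
#       /var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf
```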



Re: JM_SOUGHT_FRAUD

2010-08-11 Thread John Hardin

On Thu, 12 Aug 2010, Karsten Bräckelmann wrote:


 On Wed, 2010-08-11 at 17:30 -0400, Bowie Bailey wrote:

  In case anyone else is following this...

  The sa-update process made things a bit more complex than simply
  renaming the file after updates.  If that's all you do, then sa-update
  loses track of the file and will download a new copy on every run.  What
  I had to do is this:

  1) Rename the .cf file back to the original name so sa-update can find it
  2) Run sa-update
  3) Rename the .cf file to z_sought_rules_yerp_org.cf
  4) Restart spamd

  You don't have to mess with the directory, just rename the main
  sought_rules_yerp_org.cf file.


 Hmm, a simple symlink, albeit (once) manually set-up, should do, too.

   ln -s sought_rules_yerp_org.cf zzz.cf


That's what I get for responding as I read a backlog rather than waiting 
until I've read everything...

