Re: JM_SOUGHT_FRAUD
On 8/11/2010 7:15 PM, John Hardin wrote:
> On Wed, 11 Aug 2010, Bowie Bailey wrote:
>> In case anyone else is following this...
>> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
>> 1) Rename the .cf file back to the original name so sa-update can find it
>> 2) Run sa-update
>> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
> How about symlinking z_sought_rules_yerp_org.cf to sought_rules_yerp_org.cf? That might spare you performing the filename shuffle every time you want to update...

I thought about that, but didn't like the side-effect of loading the whole rule set a third time. The filename shuffle is in my update script now, so it just happens. I don't see a couple of mv operations causing a problem in this instance.

--
Bowie
Re: JM_SOUGHT_FRAUD
On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
> I was looking through some of the spam rules, and I noticed that the JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA 3.3.1, but the scores are all 0. Is there a reason for this?

Yes, an explicit request by Justin to zero them out, specifically aiming at the rescore runs to prevent biasing the scores too much, AFAIK.

The Sought Fraud rules are expected to be enabled locally -- that is, assign them a proper score in the site config cf files.

In other words: To my knowledge and experience, they are rather safe to use. They have *not* been zeroed out due to bad performance.

--
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: JM_SOUGHT_FRAUD
On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
> On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
>> I was looking through some of the spam rules, and I noticed that the JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA 3.3.1, but the scores are all 0. Is there a reason for this?
> Yes, an explicit request by Justin to zero them out, specifically aiming at the rescore runs to prevent biasing the scores too much, AFAIK.
> The Sought Fraud rules are expected to be enabled locally -- that is, assign them a proper score in the site config cf files.
> In other words: To my knowledge and experience, they are rather safe to use. They have *not* been zeroed out due to bad performance.

I thought I had enabled them by using the sought.rules.yerp.org sa-update channel, but apparently not, since that gets overridden by the copy in the main updates channel.

Are the rules in the main updates channel being updated as often as the ones in the sought channel? A quick comparison shows that the two sets of rules on my system are different. I thought the sought rules were highly dynamic, so I'm surprised to see one of them show up in the main channel.

--
Bowie
Re: JM_SOUGHT_FRAUD
On Wed, 2010-08-11 at 11:57 -0400, Bowie Bailey wrote:
> On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
>> On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
>>> I was looking through some of the spam rules, and I noticed that the JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA 3.3.1, but the scores are all 0. Is there a reason for this?
>> Yes, an explicit request by Justin to zero them out, specifically aiming at the rescore runs to prevent biasing the scores too much, AFAIK.
>> The Sought Fraud rules are expected to be enabled locally -- that is, assign them a proper score in the site config cf files.
>> In other words: To my knowledge and experience, they are rather safe to use. They have *not* been zeroed out due to bad performance.
> I thought I had enabled them by using the sought.rules.yerp.org sa-update channel, but apparently not, since that gets overridden by the copy in the main updates channel.

Uhm, yeah -- IIRC that's alphabetical order, and stock *u*pdates gets parsed after the *s*ought channel.

> Are the rules in the main updates channel being updated as often as the ones in the sought channel? A quick comparison shows that the two sets of rules on my system are different. I thought the sought rules were highly dynamic, so I'm surprised to see one of them show up in the main channel.

I think they are updated frequently, also in 3.3 stock. Details escape me right now. Justin?

The update frequency of the dedicated sought channel has changed a few times in the past. Currently it has been more than once a day for quite some time.
Re: JM_SOUGHT_FRAUD
On 8/11/2010 12:17 PM, Karsten Bräckelmann wrote:
> On Wed, 2010-08-11 at 11:57 -0400, Bowie Bailey wrote:
>> On 8/11/2010 11:46 AM, Karsten Bräckelmann wrote:
>>> On Wed, 2010-08-11 at 10:59 -0400, Bowie Bailey wrote:
>>>> I was looking through some of the spam rules, and I noticed that the JM_SOUGHT_FRAUD rules are included in the main SA updates channel for SA 3.3.1, but the scores are all 0. Is there a reason for this?
>>> Yes, an explicit request by Justin to zero them out, specifically aiming at the rescore runs to prevent biasing the scores too much, AFAIK.
>>> The Sought Fraud rules are expected to be enabled locally -- that is, assign them a proper score in the site config cf files.
>>> In other words: To my knowledge and experience, they are rather safe to use. They have *not* been zeroed out due to bad performance.
>> I thought I had enabled them by using the sought.rules.yerp.org sa-update channel, but apparently not, since that gets overridden by the copy in the main updates channel.
> Uhm, yeah -- IIRC that's alphabetical order, and stock *u*pdates gets parsed after the *s*ought channel.

I think you're right. In any case, I confirmed with a debug run that updates is loaded after sought. Maybe I should run a script after the sa-update that renames the file to z_sought... so it'll be loaded last.

>> Are the rules in the main updates channel being updated as often as the ones in the sought channel? A quick comparison shows that the two sets of rules on my system are different. I thought the sought rules were highly dynamic, so I'm surprised to see one of them show up in the main channel.
> I think they are updated frequently, also in 3.3 stock. Details escape me right now. Justin?
> The update frequency of the dedicated sought channel has changed a few times in the past. Currently it has been more than once a day for quite some time.

Right. And I'm checking for updates several times a day. If the updates channel is not keeping up with sought, I need to make sure I am running the rules from the dedicated channel and not the updates channel so I have the latest.

--
Bowie
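To illustrate the ordering problem discussed above: assuming SpamAssassin parses the top-level channel files in plain byte-wise sorted order (consistent with Bowie's debug run showing updates loaded after sought), sorting the file names with `LC_ALL=C sort` reproduces both the problem and the proposed z_ rename fix. The file names are the ones mentioned in this thread; the `load_order` helper is hypothetical.

```shell
#!/bin/sh
# Print channel files in the order they would be parsed, assuming byte-wise
# sorted order. Later files override rule definitions from earlier ones.
load_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Default names: the stock updates channel comes last, so its copy wins.
load_order sought_rules_yerp_org.cf updates_spamassassin_org.cf

# With the z_ prefix, the dedicated sought channel comes last instead.
load_order z_sought_rules_yerp_org.cf updates_spamassassin_org.cf
```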
Re: JM_SOUGHT_FRAUD
On 8/11/2010 3:30 PM, John Hardin wrote:
> On Wed, 11 Aug 2010, Bowie Bailey wrote:
>> Right. And I'm checking for updates several times a day. If the updates channel is not keeping up with sought, I need to make sure I am running the rules from the dedicated channel and not the updates channel so I have the latest.
> The current situation is: automatic rule updates are only generated when the corpora of recent messages used in the nightly masscheck are sufficiently large (150k+ of both spam and ham, IIRC), and that's been difficult to achieve for a while due to ham starvation.
> More volunteers to perform nightly local masschecks of fresh ham corpora and upload the results, or to maintain and upload non-private ham corpora to the nightly masscheck server, would be most welcome!
> If you watch 72_scores.cf in SVN you can get an idea of when the last automatic update occurred.
> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/72_scores.cf?view=log
> Manual updates typically only occur for OMG! bugs or when the last automatic update gets stale enough to be a problem.
> Moving the sought channel update to z_sought seems like a good idea to me; perhaps Justin should do that at the root so that everybody gets scores if they subscribe to the channel.

In case anyone else is following this...

The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:

1) Rename the .cf file back to the original name so sa-update can find it
2) Run sa-update
3) Rename the .cf file to z_sought_rules_yerp_org.cf
4) Restart spamd

You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.

--
Bowie
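Bowie's four steps could be wrapped in a small update script along these lines. This is a sketch, not his actual script: `UPDATE_DIR` is an assumption (sa-update's versioned output directory, which varies by system and SA version), the function names are hypothetical, and the spamd restart is left as a message because the command depends on the init system. It relies on sa-update exiting 0 when it installed new rules and 1 when there was nothing new.

```shell
#!/bin/sh
# Assumed location of sa-update's output; override UPDATE_DIR for your system.
UPDATE_DIR=${UPDATE_DIR:-/var/lib/spamassassin/3.003001}
ORIG=sought_rules_yerp_org.cf
LATE=z_sought_rules_yerp_org.cf

# Step 1: give the file its original name back so sa-update recognises it.
restore_name() {
    if [ -f "$UPDATE_DIR/$LATE" ]; then
        mv "$UPDATE_DIR/$LATE" "$UPDATE_DIR/$ORIG"
    fi
}

# Step 3: rename it so it sorts after updates_spamassassin_org.cf.
rename_late() {
    if [ -f "$UPDATE_DIR/$ORIG" ]; then
        mv "$UPDATE_DIR/$ORIG" "$UPDATE_DIR/$LATE"
    fi
}

update_sought() {
    restore_name
    sa-update "$@"    # Step 2: pass your usual --channel/--gpgkey options
    rc=$?
    rename_late       # runs even if sa-update failed, so the order is kept
    # Step 4: only needed when new rules were installed (exit status 0).
    if [ "$rc" -eq 0 ]; then
        echo "sought rules updated; restart spamd"
    fi
    return "$rc"
}
```

Call `update_sought` from cron with whatever options you already pass to sa-update.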
Re: JM_SOUGHT_FRAUD
On Wed, 11 Aug 2010 17:30:40 -0400
Bowie Bailey <bowie_bai...@buc.com> wrote:
> On 8/11/2010 3:30 PM, John Hardin wrote:
>> On Wed, 11 Aug 2010, Bowie Bailey wrote:
>>> Right. And I'm checking for updates several times a day. If the updates channel is not keeping up with sought, I need to make sure I am running the rules from the dedicated channel and not the updates channel so I have the latest.
>> The current situation is: automatic rule updates are only generated when the corpora of recent messages used in the nightly masscheck are sufficiently large (150k+ of both spam and ham, IIRC), and that's been difficult to achieve for a while due to ham starvation.
>> More volunteers to perform nightly local masschecks of fresh ham corpora and upload the results, or to maintain and upload non-private ham corpora to the nightly masscheck server, would be most welcome!
>> If you watch 72_scores.cf in SVN you can get an idea of when the last automatic update occurred.
>> http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/scores/72_scores.cf?view=log
>> Manual updates typically only occur for OMG! bugs or when the last automatic update gets stale enough to be a problem.
>> Moving the sought channel update to z_sought seems like a good idea to me; perhaps Justin should do that at the root so that everybody gets scores if they subscribe to the channel.
> In case anyone else is following this...
> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
> 1) Rename the .cf file back to the original name so sa-update can find it
> 2) Run sa-update
> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
> 4) Restart spamd
> You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.

Would it not be simpler just to do something like this:

  grep -E '^score' /var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf > /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf

or simply paste the scores into local.cf?
Sought dedicated AND stock channel (was: Re: JM_SOUGHT_FRAUD)
> On Wed, 11 Aug 2010 17:30:40 -0400
> Bowie Bailey <bowie_bai...@buc.com> wrote:
>> On 8/11/2010 3:30 PM, John Hardin wrote:
>>> The current situation is: automatic rule updates are only generated when the corpora of recent messages used in the nightly masscheck are sufficiently large (150k+ of both spam and ham, IIRC), and that's been difficult to achieve for a while due to ham starvation.
>>> More volunteers to perform nightly local masschecks of fresh ham corpora and upload the results, or to maintain and upload non-private ham corpora to the nightly masscheck server, would be most welcome!
>> In case anyone else is following this...
>> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
>> 1) Rename the .cf file back to the original name so sa-update can find it
>> 2) Run sa-update
>> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
>> 4) Restart spamd
>> You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.
> Would it not be simpler just to do something like this

Sadly, no. The problem is that not only the sub-rules change, but with them the actually scored meta rules, which combine these sub-rules by OR-ing them. That means stale meta rules in stock will override the fresh meta rules, effectively discarding all fresh sub-rule matches.

Bottom line: If the dedicated sought channel is more up-to-date and updated more frequently, using it in *addition* to the stock rules, which age faster, currently needs careful hacks -- to preserve the precious meta. Or, well, any SA- or channel-provided method to tweak the order. The SA method would be a new feature. The channel method would be a rename, so it comes last in alphabetical order.

Bummer. :( This is a real problem and worth filing a bug.
Re: JM_SOUGHT_FRAUD
On Wed, 2010-08-11 at 17:30 -0400, Bowie Bailey wrote:
> In case anyone else is following this...
> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
> 1) Rename the .cf file back to the original name so sa-update can find it
> 2) Run sa-update
> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
> 4) Restart spamd
> You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.

Hmm, a simple symlink, albeit (once) manually set up, should do, too.

  ln -s sought_rules_yerp_org.cf zzz.cf

This would parse anything from the more recent dedicated channel after the stock rules (and sought itself) have been parsed. Again.

Yes, this overwrites the sought rules' meta definitions a third time. But that shouldn't be much of a problem, really. Also, any stale, unused sub-rules (see my other follow-up) will be ignored by SA prior to compiling the REs.

This should work as a do-it-once-and-forget workaround to the issue, rather than relying on ghastly file renaming in the sa-update cron job. The latter is even prone to race conditions, in rare and specific circumstances.
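Karsten's symlink as a one-time set-up step, sketched under an assumed `UPDATE_DIR` (sa-update's versioned output directory, which differs between systems). A relative link target resolves inside that directory, and because a symlink refers to the file by name it still points at the right file after sa-update rewrites it. Checking for an existing link makes the step safe to repeat. It also assumes, as "do-it-once-and-forget" implies, that sa-update leaves unrelated files in that directory alone. The function name is hypothetical.

```shell
#!/bin/sh
# Assumed location of sa-update's output; override UPDATE_DIR for your system.
UPDATE_DIR=${UPDATE_DIR:-/var/lib/spamassassin/3.003001}

# Create zzz.cf -> sought_rules_yerp_org.cf once; zzz.cf sorts after
# updates_spamassassin_org.cf, so the sought rules are parsed again last.
link_sought_last() {
    if [ ! -L "$UPDATE_DIR/zzz.cf" ]; then
        # Relative target: resolved next to the link, inside UPDATE_DIR.
        ln -s sought_rules_yerp_org.cf "$UPDATE_DIR/zzz.cf"
    fi
}
```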
Re: JM_SOUGHT_FRAUD
On Wed, 11 Aug 2010, Bowie Bailey wrote:
> In case anyone else is following this...
> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
> 1) Rename the .cf file back to the original name so sa-update can find it
> 2) Run sa-update
> 3) Rename the .cf file to z_sought_rules_yerp_org.cf

How about symlinking z_sought_rules_yerp_org.cf to sought_rules_yerp_org.cf? That might spare you performing the filename shuffle every time you want to update...

> 4) Restart spamd
> You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
USMC Rules of Gunfighting #20: The faster you finish the fight, the less shot you will get.
---
4 days until the 65th anniversary of the end of World War II
Re: JM_SOUGHT_FRAUD
On Wed, 11 Aug 2010, RW wrote:
>> 1) Rename the .cf file back to the original name so sa-update can find it
>> 2) Run sa-update
>> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
>> 4) Restart spamd
> Would it not be simpler just to do something like this:
>   grep -E '^score' /var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf > /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf

Heh. Adding that to your sa-update cron job is a pretty good way too.
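RW's one-liner could go into the sa-update cron job roughly as follows. This sketch writes to a temporary file in the destination directory and then renames it into place, so a half-written score file is never read, and it keeps the previous file if no score lines were found. Keep Karsten's caveat in mind: this only copies the scores, while the stale meta definitions from the stock channel still override the fresh ones. The function name is hypothetical; the paths in the usage line are RW's FreeBSD-style ones.

```shell
#!/bin/sh
# copy_sought_scores SRC_GLOB DEST
# Copy the score lines from the sought rules file(s) into DEST.
copy_sought_scores() {
    src_glob=$1
    dest=$2
    tmp="$dest.tmp.$$"    # same directory as DEST, so mv is a plain rename
    # $src_glob is left unquoted on purpose so the version-directory glob
    # expands; -h drops file-name prefixes if it matches several versions.
    if grep -E -h '^score' $src_glob > "$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"      # nothing matched or grep failed: keep the old file
        return 1
    fi
}
```

Run it after sa-update and before restarting spamd, for example `copy_sought_scores '/var/db/spamassassin/*/sought_rules_yerp_org/20_sought_fraud.cf' /usr/local/etc/mail/spamassassin/sought_fraud_scores.cf`.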
Re: JM_SOUGHT_FRAUD
On Thu, 12 Aug 2010, Karsten Bräckelmann wrote:
> On Wed, 2010-08-11 at 17:30 -0400, Bowie Bailey wrote:
>> In case anyone else is following this...
>> The sa-update process made things a bit more complex than simply renaming the file after updates. If that's all you do, then sa-update loses track of the file and will download a new copy on every run. What I had to do is this:
>> 1) Rename the .cf file back to the original name so sa-update can find it
>> 2) Run sa-update
>> 3) Rename the .cf file to z_sought_rules_yerp_org.cf
>> 4) Restart spamd
>> You don't have to mess with the directory, just rename the main sought_rules_yerp_org.cf file.
> Hmm, a simple symlink, albeit (once) manually set up, should do, too.
>   ln -s sought_rules_yerp_org.cf zzz.cf

That's what I get for responding as I read a backlog rather than waiting until I've read everything...