Re: more efficent big scoring

2008-01-23 Thread Justin Mason
To clarify -- here's how the current code orders rule evaluation:
- message metadata is extracted.
- header DNSBL tests are started.
- the decoded forms of the body text are extracted and cached.
- the URIs in the message body are extracted and cached.
- Iterates through each known priority
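The ordering described above can be sketched roughly as follows. This is a minimal illustration in Python, not SpamAssassin's actual (Perl) code: the names `prepare`, `run_by_priority`, and the three example rules are hypothetical, the DNSBL step is omitted, and the assumption that lower priority numbers run first is taken from the remark elsewhere in this thread that rules can be "put into a low priority to run first". What happens inside each priority is cut off in the snippet, so the inner loop here is only a guess at the general shape.

```python
import re
from collections import defaultdict

URI_RE = re.compile(r"https?://\S+")

def prepare(raw_message):
    """Extract and cache the pieces that rules will look at (hypothetical helper)."""
    headers, _, body = raw_message.partition("\n\n")
    return {
        "headers": headers,               # stands in for "message metadata"
        "body": body,                     # stands in for the decoded body text
        "uris": URI_RE.findall(body),     # URIs extracted once, then reused
    }

def run_by_priority(rules, cache):
    """Group rules by priority, then walk the priorities from lowest to highest."""
    by_priority = defaultdict(list)
    for name, priority, check in rules:
        by_priority[priority].append((name, check))
    hits = []
    for priority in sorted(by_priority):
        for name, check in by_priority[priority]:
            if check(cache):
                hits.append(name)
    return hits

# Hypothetical rules: (name, priority, check function).
RULES = [
    ("HAS_URI", 0, lambda c: bool(c["uris"])),
    ("TRUSTED_SENDER", -100, lambda c: "From: boss@example.com" in c["headers"]),
    ("MENTIONS_PILLS", 0, lambda c: "pills" in c["body"].lower()),
]

cache = prepare("From: boss@example.com\nSubject: hi\n\nBuy pills at http://example.com/x")
hits = run_by_priority(RULES, cache)
print(hits)
```

Because the extraction happens once in `prepare`, every rule reads from the same cached data, and the priority number alone decides which group of rules runs first.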

RE: more efficent big scoring

2008-01-23 Thread Robert - elists
Just wanted to point out, this topic came out when site DNS cache service started to fail due to excessive DNSBL queries. My slowdown was due to multiple timeouts and/or delay, probably related to answering joe-job rbldns backscatter -- that's the reason I was looking for early exit on

Re: more efficent big scoring

2008-01-22 Thread George Georgalis
On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: On Sat, 19 Jan 2008, Loren Wilton wrote: I would not be terribly surprised to find out that on average there was no appreciable difference in running all rules of all types in priority order, over the current method; Neither am

Re: more efficent big scoring

2008-01-22 Thread John D. Hardin
On Tue, 22 Jan 2008, George Georgalis wrote: On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score is rather small. The greatest optimization would be to not test REs you

Re: more efficent big scoring

2008-01-22 Thread Justin Mason
John D. Hardin writes: On Tue, 22 Jan 2008, George Georgalis wrote: On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score is rather small. The greatest

Re: more efficent big scoring

2008-01-22 Thread Jim Maul
Justin Mason wrote: John D. Hardin writes: On Tue, 22 Jan 2008, George Georgalis wrote: On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score is rather small. The

Re: more efficent big scoring

2008-01-22 Thread Justin Mason
Jim Maul writes: Justin Mason wrote: John D. Hardin writes: On Tue, 22 Jan 2008, George Georgalis wrote: On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: Neither am I. Another thing to consider is the fraction of defined rules that actually hit and affect the score

Re: more efficent big scoring

2008-01-22 Thread John D. Hardin
John D. Hardin writes: Loren mentioned to me in a private email: common subexpressions. Whoops! Matt Kettler mentioned it to me, not Loren. Sorry! -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ FALaholic #11174

Re: more efficent big scoring

2008-01-22 Thread George Georgalis
On Tue, Jan 22, 2008 at 05:24:00PM +, Justin Mason wrote: Jim Maul writes: Justin Mason wrote: John D. Hardin writes: On Tue, 22 Jan 2008, George Georgalis wrote: On Sun, Jan 20, 2008 at 09:41:58AM -0800, John D. Hardin wrote: Neither am I. Another thing to consider is the

Re: more efficent big scoring

2008-01-22 Thread Loren Wilton
John D. Hardin writes: Loren mentioned to me in a private email: common subexpressions. Whoops! Matt Kettler mentioned it to me, not Loren. Sorry! I was going to mention that I didn't think that had been me. Unless I was asleep when I wrote the reply. Which could have been the case. :-)

Re: more efficent big scoring

2008-01-22 Thread Loren Wilton
Maybe if there was some way to establish a hierarchy at startup which groups rule processing into nodes. Some nodes finish quickly, some have dependencies, some are negative, etc. Just wanted to point out, this topic came out when site DNS cache service started to fail due to excessive DNSBL

Re: more efficent big scoring

2008-01-20 Thread Matt Kettler
Loren Wilton wrote: Well, it looks like I need to spend some time reading the code to study exactly how SA runs rules, and see if it's doing something that pollutes the memory cache, which would cause the over-sorting to not matter. As best I recall, it runs rules by type, and sorted by

Re: more efficent big scoring

2008-01-20 Thread John D. Hardin
On Sat, 19 Jan 2008, Loren Wilton wrote: I would not be terribly surprised to find out that on average there was no appreciable difference in running all rules of all types in priority order, over the current method; Neither am I. Another thing to consider is the fraction of defined rules

Re: more efficent big scoring

2008-01-19 Thread Justin Mason
Theo Van Dinter writes: Yes and no. There aren't many negative scored rules, which could easily be put into a low priority to run first. The issue, which is where Matt was going I believe, is that the reason score based short circuiting was removed is that it's horribly slow to keep

Re: more efficent big scoring

2008-01-19 Thread Matt Kettler
Robert - elists wrote: You can't run the rules in score-order without driving SA's performance into the ground. The key here is SA doesn't run tests sequentially, it runs them in parallel as it works its way through the body. This allows for good, efficient use of memory cache. By running

Re: more efficent big scoring

2008-01-19 Thread Matt Kettler
Matt Kettler wrote: No, I'm saying it breaks the emails into pieces, then for the first piece, it runs all the rules. Then it runs all the rules on the second piece, and the third, and the fourth, etc. Forcing score order causes it to run the whole message on one rule, then the whole
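The two loop orders contrasted in this message can be illustrated with a small sketch. It is written in Python for readability and is not SpamAssassin's code: `piece_major`, `rule_major`, and the sample pieces and rules are all hypothetical. It only records the order in which (piece, rule) pairs are visited; it does not measure the memory-cache effect that the message attributes to the difference.

```python
def piece_major(pieces, rules):
    """Run every rule on piece 0, then every rule on piece 1, and so on."""
    order, hits = [], set()
    for i, piece in enumerate(pieces):
        for name, check in rules:
            order.append((i, name))
            if check(piece):
                hits.add(name)
    return order, hits

def rule_major(pieces, rules):
    """Run one rule over the whole message, then the next rule, and so on."""
    order, hits = [], set()
    for name, check in rules:
        for i, piece in enumerate(pieces):
            order.append((i, name))
            if check(piece):
                hits.add(name)
    return order, hits

# Hypothetical message pieces and rules, for illustration only.
PIECES = ["cheap pills here", "click now", "regards, bob"]
RULES = [
    ("PILLS", lambda p: "pills" in p),
    ("CLICK", lambda p: "click" in p),
]

order_a, hits_a = piece_major(PIECES, RULES)
order_b, hits_b = rule_major(PIECES, RULES)
print(order_a[:3])  # each piece is revisited only while it is current
print(order_b[:3])  # the first rule sweeps across every piece before the next rule starts
```

Both functions find the same set of rule hits; only the traversal order differs. In the piece-major order each piece is touched by all rules while it is current, whereas the rule-major order walks the entire message once per rule.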

Re: more efficent big scoring

2008-01-19 Thread Loren Wilton
Well, it looks like I need to spend some time reading the code to study exactly how SA runs rules, and see if it's doing something that pollutes the memory cache, which would cause the over-sorting to not matter. As best I recall, it runs rules by type, and sorted by priority within type.

more efficent big scoring

2008-01-18 Thread George Georgalis
Noticed today (again) how long some messages take to test. The first thing that comes to mind is some DNS is getting overloaded answering joe-job rbldns backscatter, causing timeouts or slow response times. Then I was thinking about how some tests are excluded because they generate too much

Re: more efficent big scoring

2008-01-18 Thread Matt Kettler
You can't run the rules in score-order without driving SA's performance into the ground. The key here is SA doesn't run tests sequentially, it runs them in parallel as it works its way through the body. This allows for good, efficient use of memory cache. By running rules in score-order,

Re: more efficent big scoring

2008-01-18 Thread Theo Van Dinter
Yes and no. There aren't many negative scored rules, which could easily be put into a low priority to run first. The issue, which is where Matt was going I believe, is that the reason score based short circuiting was removed is that it's horribly slow to keep checking the score after each rule

RE: more efficent big scoring

2008-01-18 Thread Robert - elists
You can't run the rules in score-order without driving SA's performance into the ground. The key here is SA doesn't run tests sequentially, it runs them in parallel as it works its way through the body. This allows for good, efficient use of memory cache. By running rules in

Re: more efficent big scoring

2008-01-18 Thread jdow
From: Robert - elists Sent: Friday, 2008, January 18 21:14 You can't run the rules in score-order without driving SA's performance into the ground. The key here is SA doesn't run tests sequentially, it runs them in parallel as it works its way through the body. This allows