Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-06-17 Thread Dave Pooser
On 5/22/14 6:48 PM, Karsten Bräckelmann guent...@rudersport.de wrote: On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote: After doing some experimenting with that code I came up with something that I'd argue is more semantically correct: # if we've got a long series of blank lines,

Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: In either case, having a sample would speed up this ping-pong style debugging. And I am curious. ;) Mind putting your sample up a pastebin? Ian sent me the original message off-list. It indeed contains about 16 consecutive newlines,

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk
On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to 11 of consecutive newlines can be matched

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk
On Thu, 22 May 2014, David B Funk wrote: On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote: After doing some experimenting with that code I came up with something that I'd argue is more semantically correct: # if we've got a long series of blank lines, limit them if (defined $start) { my $max_blank_lines = 20;
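The fragment quoted above is cut off, so its actual logic is not visible here. As a rough illustration only of what "limit a long series of blank lines" could mean, here is a minimal Python sketch. It is not the SpamAssassin code; the function name `cap_blank_runs` and its behaviour are assumptions, and only the value 20 is taken from the quoted `$max_blank_lines`.

```python
def cap_blank_runs(lines, max_blank_lines=20):
    """Return a copy of `lines` in which no run of blank lines is longer
    than `max_blank_lines`. Hypothetical helper, not from the thread."""
    out = []
    run = 0  # length of the current run of blank lines
    for line in lines:
        if line.strip() == "":
            run += 1
            if run > max_blank_lines:
                continue  # drop blank lines beyond the cap
        else:
            run = 0  # a non-blank line ends the run
        out.append(line)
    return out
```

Under this assumption, a message with 30 blank lines between two paragraphs would come out with exactly 20 blank lines between them.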

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Thu, 15 May 2014 12:18:25 -0800 Kevin Miller kevin_mil...@ci.juneau.ak.us wrote: I implemented a rule that looks for multiple breaks for just that reason. Can't remember where I stole it from - probably some folks here helped me with it a few years ago. Can't remember who, but

Re: Bayes refinement

2014-05-21 Thread Martin Gregorie
On Wed, 2014-05-21 at 10:23 -0700, Ian Zimmerman wrote: I am trying to do a variant of this for text/plain, as that is the type I mostly face now. But I cannot get it to work. header __LOCAL_PLAIN_ASCII Content-Type =~ /text\/plain; *charset=us-ascii/i rawbody __LOCAL_MUCHO_BLANKS

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 19:08:51 +0100 Martin Gregorie mar...@gregorie.org wrote: rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m Martin Looking for newlines rather than whitespace? Does /\s{10,}/m Martin work any better? Nope, it doesn't :-( Anyway, looking for newlines was my intention, sorry for the

Re: Bayes refinement

2014-05-21 Thread John Hardin
On Wed, 21 May 2014, Ian Zimmerman wrote: On Wed, 21 May 2014 19:08:51 +0100 Martin Gregorie mar...@gregorie.org wrote: rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m Martin Looking for newlines rather than whitespace? Does /\s{10,}/m Martin work any better? Nope, it doesn't :-( Anyway, looking

Re: Bayes refinement

2014-05-21 Thread Karsten Bräckelmann
On Wed, 2014-05-21 at 10:23 -0700, Ian Zimmerman wrote: I am trying to do a variant of this for text/plain, as that is the type I mostly face now. But I cannot get it to work. rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m You don't need the "or more" quantifier at the end of your RE. That just
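The point about the quantifier can be checked outside SpamAssassin. The sketch below uses Python's `re` module as a stand-in for Perl regexes (an assumption; the two engines agree on these simple patterns) and says nothing about how rawbody rules are fed their input. The names `at_least_ten`, `exactly_ten` and `same_verdict` are made up for the illustration.

```python
import re

# The rule as posted, and the shorter form without the open-ended quantifier.
at_least_ten = re.compile(r"\n{10,}")
exactly_ten = re.compile(r"\n{10}")

def same_verdict(text):
    """True if both patterns agree on whether `text` matches at all."""
    return bool(at_least_ten.search(text)) == bool(exactly_ten.search(text))

# Any text containing 10 or more consecutive newlines also contains exactly
# 10 of them, so for a hit/no-hit rule the two patterns are interchangeable.
samples = ["a" + "\n" * n + "b" for n in range(20)]
assert all(same_verdict(s) for s in samples)
```

The only difference is how much text the match consumes, which does not matter for a rule that only asks whether there is a hit.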

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 22:26:41 +0200 Karsten Bräckelmann guent...@rudersport.de wrote: Karsten Seriously, the above rule, the shorter /\n{10}/, as well as the Karsten variant posted by John without quantifier do exactly what you Karsten asked for. They match 10 consecutive \n newline chars in the

Re: Bayes refinement

2014-05-21 Thread Karsten Bräckelmann
On Wed, 2014-05-21 at 17:32 -0700, Ian Zimmerman wrote: The test message does not have that string. Maybe it uses DOS flavor \r\n. Or what appears to be a bunch of linebreaks actually has spaces mixed in. Well, no. I looked at the message (the same data I fed to s.a. --debug) with
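The two guesses above (DOS line endings, or spaces hidden between the line breaks) are easy to tell apart with a quick test. This is a sketch in Python, used here as a stand-in for Perl regexes; the pattern names and the sample strings are invented for the illustration and are not from the message under discussion.

```python
import re

lf_only = re.compile(r"\n{10}")                      # the rule as posted
crlf_tolerant = re.compile(r"(?:\r?\n){10}")         # also accepts \r\n endings
blank_tolerant = re.compile(r"(?:[ \t]*\r?\n){10}")  # also accepts space-only lines

unix_text = "x" + "\n" * 10     # ten bare newlines
dos_text = "x" + "\r\n" * 10    # ten DOS line endings
spaced_text = "x" + " \n" * 10  # ten lines, each holding a single space

def verdicts(text):
    """Return (lf_only, crlf_tolerant, blank_tolerant) hit flags for `text`."""
    return tuple(bool(p.search(text))
                 for p in (lf_only, crlf_tolerant, blank_tolerant))
```

If the strict pattern misses a message that one of the tolerant patterns hits, that points to the line endings or stray whitespace rather than to the rule itself.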

Re: Bayes refinement

2014-05-17 Thread RW
On Fri, 16 May 2014 21:36:22 -0600 Bob Proulx wrote: David Jones wrote: James B. Byrne wrote: If you keep Bayes well trained (assuming you have enough ham to do so) Bayes poisoning is a myth. I'm not sure I agree with the myth statement. I just had to reset my Bayes DB after

Re: Bayes refinement

2014-05-16 Thread Bowie Bailey
On 5/14/2014 5:08 PM, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the main

Re: Bayes refinement

2014-05-16 Thread John Hardin
On Wed, 14 May 2014, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the main

Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 07:22:56 -0400 David F. Skoll d...@roaringpenguin.com wrote: James Is there any way to limit Bayes content checking to only the James first X characters of the message body? I ask this because it is James clear that the spam messages getting through contain text meant James

RE: Bayes refinement

2014-05-16 Thread Kevin Miller
I implemented a rule that looks for multiple breaks for just that reason. Can't remember where I stole it from - probably some folks here helped me with it a few years ago. Can't remember who, but appreciated the assistance.

Re: Bayes refinement

2014-05-16 Thread Bowie Bailey
On 5/16/2014 2:24 PM, Ian Zimmerman wrote: On Fri, 16 May 2014 07:22:56 -0400 David F. Skoll d...@roaringpenguin.com wrote: James Is there any way to limit Bayes content checking to only the James first X characters of the message body? I ask this because it is James clear that the spam

Re: Bayes refinement

2014-05-16 Thread David F. Skoll
On Fri, 16 May 2014 11:24:29 -0700 Ian Zimmerman i...@buug.org wrote: On close inspection, I see that the hash-busting garbage appended is (faux) technical computing talk instead of the usual cookbooks or classical literature :-p That is, scrambled Stack Overflow discussions and the like.

Re: Bayes refinement

2014-05-16 Thread Axb
On 05/14/2014 11:08 PM, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the

RE: Bayes refinement

2014-05-16 Thread David Jones
On 05/14/2014 11:08 PM, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails

Re: Bayes refinement

2014-05-16 Thread Karsten Bräckelmann
On Fri, 2014-05-16 at 11:24 -0700, Ian Zimmerman wrote: In the last few (~10) days, I have seen a marked increase in FNs, usually with Bayes values in the 50s and 60s. That's a neutral bayes classification. Other rules should be able to still identify the spam. On close inspection, I see that

Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 16:20:21 -0400 Bowie Bailey bowie_bai...@buc.com wrote: Keep in mind that BAYES_50 and BAYES_60 still contribute positive scores by default. Though it is technically a neutral result, it still adds a point or two to the score. Rather than messing with Bayes, I would

Re: Bayes refinement

2014-05-16 Thread Bob Proulx
David Jones wrote: James B. Byrne wrote: If you keep Bayes well trained (assuming you have enough ham to do so) Bayes poisoning is a myth. I'm not sure I agree with the myth statement. I just had to reset my Bayes DB after years of it slowly drifting due to bad user input and such.

Re: Bayes refinement

2014-05-16 Thread David F. Skoll
On Wed, 14 May 2014 17:08:26 -0400 James B. Byrne byrn...@harte-lyne.ca wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but