Re: Problem with matching regex against long body

2020-11-03 Thread RW
On Tue, 3 Nov 2020 13:39:47 -0800 Loren Wilton wrote:
> >> See rawbody_part_scan in the docs.
> >
> > Also the chunking of the rawbody into 2-4 kB blocks may make a difference.
>
> I wasn't able to find rawbody_part_scan in any of the docs that I managed to find, but after digging

Re: Problem with matching regex against long body

2020-11-03 Thread Loren Wilton
You may also want to stick optional whitespace in there to avoid trivial bypass: There's also the possibility of adding a typeface or other options to the tag, which would bypass your simple rule. And HTML is not case-sensitive. And avoid * on complex stuff when matching arbitrarily long
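The advice above (optional whitespace, extra attributes, case-insensitivity, bounded repetition instead of `*`) can be sketched in Python. The rule under discussion is not shown in this excerpt, so the tag shape and the name `ZERO_FONT` are hypothetical, and Python's `re` stands in for the Perl regex engine that SpamAssassin actually uses.

```python
import re

# Hypothetical pattern: the thread's actual rule is not visible in this excerpt.
# - \s* tolerates optional whitespace around the tag name and the '='
# - [^>]{0,200} allows other attributes (e.g. a typeface) with a bounded length
# - re.IGNORECASE because HTML tag and attribute names are case-insensitive
ZERO_FONT = re.compile(
    r"<\s*font\b[^>]{0,200}?\bsize\s*=\s*[\"']?0\b[^>]{0,200}>",
    re.IGNORECASE,
)

samples = [
    '<font size="0">',                  # plain form
    '<FONT  face="Arial"  SIZE = 0 >',  # upper case, extra attribute, extra spaces
    '<font size="3">',                  # ordinary font size, should not match
]

for s in samples:
    print(bool(ZERO_FONT.search(s)), s)
```

The bounded `{0,200}` is an arbitrary illustrative limit; the point is only that the repetition cannot run across an arbitrarily long body.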

Re: Problem with matching regex against long body

2020-11-03 Thread John Hardin
On Tue, 3 Nov 2020, Loren Wilton wrote: I'm getting lots of spams that are about 100+K long. The spam body contains two blocks of random news text copied from fox news or msnbc or the like, enclosed in a zero-point font block. I'm trying to match this simple pattern to give some extra points,

Re: Problem with matching regex against long body

2020-11-03 Thread Loren Wilton
See rawbody_part_scan in the docs. Also the chunking of the rawbody into 2-4 kB blocks may make a difference. I wasn't able to find rawbody_part_scan in any of the docs that I managed to find, but after digging into the source I found the chunking logic and dug out the 2K limit. I'm not
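To illustrate why chunking matters for a pattern that spans a long body, here is a rough Python sketch. The `chunks` helper is a fixed-size split invented for illustration and is not SpamAssassin's actual chunking logic; the body and pattern are likewise hypothetical stand-ins for the message described in the thread.

```python
import re

def chunks(text, size=2048):
    """Split text into fixed-size pieces (a simplification for illustration)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical body: opening tag, about 100 kB of filler, closing tag.
body = "<font size=0>" + "x" * 100_000 + "</font>"
pattern = re.compile(r"<font size=0>[^<]*</font>")

# Matching the whole body succeeds, because both tags are visible at once.
whole_match = pattern.search(body) is not None

# Matching chunk by chunk fails, because no single 2 kB chunk holds both tags.
chunk_match = any(pattern.search(c) for c in chunks(body))

print(whole_match, chunk_match)
```

If the rule is evaluated per chunk, a pattern that needs both the opening and the closing tag can never match a block longer than one chunk, which would be consistent with the behaviour described above.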

Re: Problem with matching regex against long body

2020-11-03 Thread RW
On Tue, 3 Nov 2020 20:09:46 + RW wrote:
> On Tue, 3 Nov 2020 10:32:41 -0800 Loren Wilton wrote:
>
> > I'm getting lots of spams that are about 100+K long. The spam body contains two blocks of random news text copied from fox news or msnbc or the like, enclosed in a zero-point font

Re: Problem with matching regex against long body

2020-11-03 Thread RW
On Tue, 3 Nov 2020 10:32:41 -0800 Loren Wilton wrote:
> I'm getting lots of spams that are about 100+K long. The spam body contains two blocks of random news text copied from fox news or msnbc or the like, enclosed in a zero-point font block. I'm trying to match this simple pattern to give

Re: Problem with matching regex against long body

2020-11-03 Thread Loren Wilton
basics of escaping at least *anything* won't do any harm

php > echo preg_quote('[^<]*<');
\\[\^\<\]\*\<

Well, escaping the [^<]* part certainly will do harm, since it will turn it from a group match into individual characters that don't exist in the text to be matched. But I've tried
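The point about escaping can be checked with a small Python sketch, using `re.escape` as a rough analogue of PHP's `preg_quote`. The sample strings are made up for illustration.

```python
import re

fragment = r"[^<]*<"   # "any run of non-'<' characters, then '<'"
text = "some text<"

# Used as a regex, the fragment matches ordinary text followed by '<'.
as_regex = re.search(fragment, text)

# Escaped, the same fragment only matches the literal characters [^<]*<,
# which do not occur in normal message text.
as_literal = re.search(re.escape(fragment), text)

# The escaped form matches only when those literal characters are present.
literal_hit = re.search(re.escape(fragment), "odd [^<]*< string")

print(as_regex is not None, as_literal is not None, literal_hit is not None)
```

So quoting the whole expression changes what it means rather than making it safer; only characters intended as literals should be escaped.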

Problem with matching regex against long body

2020-11-03 Thread Loren Wilton
I'm getting lots of spams that are about 100+K long. The spam body contains two blocks of random news text copied from fox news or msnbc or the like, enclosed in a zero-point font block. I'm trying to match this simple pattern to give some extra points, but I can't seem to get it to work. I'm