Giampaolo Tomassoni wrote:
> > From: LuKreme [mailto:krem...@kreme.com]
> > 
> > On 25-Mar-2009, at 11:24, Giampaolo Tomassoni wrote:
> > > rawbody   LARGETABLE
> > > m'<tr\W(?:[^<]|<(?!t[dr]\W))*(?:<td\W(?:[^<]|<(?!t[rd]\W))*){30,}</tr'is
> > 
> > 
> > Just to be sure my parsing is working correctly, that is flagging if
> > there are 30 or more TDs in a single TR?
> 
> Right.
> 
> 
> > If so, couldn't that be
> > written a lot more compactly?
> 
> Probably yes. The problem is that a simple way like
> '<tr\W.*(?:<td\W.*){30,}</tr' would easily fail because the '*'
> operator would work "greedily" here, consuming <td>s and <tr>s which
> should instead be counted.

Then why not use the non-greedy version?

<tr\W.*?(?:<td\W.*?){30,}</tr

On the other hand, '.*' of any kind is usually a bad idea in a
SpamAssassin rule.  It should always be limited to avoid excessive
backtracking.

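<tr\W.{0,20}?(?:<td\W.{0,20}?){30,}</tr

I pulled the 20-character limit out of thin air.  Change it to whatever
makes sense for this rule.  (The lower bound is spelled out as {0,20}
because older versions of Perl treat {,20} as literal text rather than
as a quantifier.)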
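
To get a feel for what the limit does before putting the rule into
production, here is a rough sketch using Python's re module as a
stand-in for Perl (the two engines are not identical, so treat this as
an approximation).  The names GAP, LARGETABLE, row and hits are made up
for the illustration; re.IGNORECASE and re.DOTALL play the role of the
'is' flags on the original rule.

```python
import re

# Maximum number of characters allowed between consecutive tags.
# 20 is the same arbitrary value used in the message, not a recommendation.
GAP = 20

# Bounded, non-greedy version of the rule.
LARGETABLE = re.compile(
    r"<tr\W.{0,%d}?(?:<td\W.{0,%d}?){30,}</tr" % (GAP, GAP),
    re.IGNORECASE | re.DOTALL,
)


def row(cells, text="abcd"):
    """Build one hypothetical table row with `cells` identical <td> cells."""
    return "<tr>" + "".join("<td>%s</td>" % text for _ in range(cells)) + "</tr>"


def hits(html):
    """Return True if the bounded rule matches anywhere in `html`."""
    return LARGETABLE.search(html) is not None


print(hits(row(30)))            # 30 short cells in one row
print(hits(row(29)))            # one cell short of the threshold
print(hits(row(30, "a" * 30)))  # 30 cells, each longer than GAP
```

The third case is the one to watch: a row really does have 30 cells,
but each cell is longer than the limit, so the bounded pattern never
reaches the next <td.  That is the trade-off behind picking the number.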

-- 
Bowie
