Re: Search expression spanning multiple lines

SysAdm Thu, 30 Oct 2008 11:48:28 -0700

Hi Andy,
  Thank you for trying!  A slightly modified version of your first
example (/timesheet\_.\{-}Winsock) gets me right to the first
occurrence, so that definitely helps.
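
Because a non-greedy match still starts at the earliest possible
"timesheet", that search can run across a later timesheet entry before it
reaches Winsock. One possible way around this is to forbid the gap itself
from containing another "timesheet". This is only a sketch, assuming the
literal words timesheet and Winsock are enough to mark the start and the
end, and it is untested against the full log:

```vim
" Typed as a normal-mode search.
" \%(timesheet\)\@!  zero-width check: "timesheet" must NOT start here
" \_.                then consume one character, newline included
" \{-}               repeat that pair as few times as possible
/timesheet\%(\%(timesheet\)\@!\_.\)\{-}Winsock
```

With this form the match should begin at the last "timesheet" before each
Winsock rather than at the first one in the file.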


This one works, but I worry that the number of newlines might not stay
the same:
/FROM:<timesheet.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*Winsock
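
If the number of lines does vary, the fixed run of .*\n could be replaced
by a counted group. A sketch, assuming the Winsock line is never more than
12 lines below the FROM: line (12 is an arbitrary guess, not a figure taken
from the log):

```vim
" \%(.*\n\)    one whole line, including its end-of-line
" \{-,12}      between 0 and 12 of them, as few as possible
/FROM:<timesheet\%(.*\n\)\{-,12}.*Winsock
```

The upper bound also keeps the match from wandering far into unrelated
transactions when no Winsock line follows closely.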

Thank you again for all your help!


On Oct 30, 12:30 pm, Andrew Long <[EMAIL PROTECTED]> wrote:
> On 27 Oct 2008, at 14:48, SysAdm wrote:
>
>
>
> > Hi Andrew,
> > Sure!  Here is a clipping (with identifiable info changed) that
> > contains a valid, delivered email, some spam and a timesheet record.
>
> I'm afraid that I'm going to have to admit defeat on this one. I've
> been fighting with it for the last few nights, and I can't find a way to
> do it. At a couple of points I thought I had a complicated solution, but
> they all fell over under different test cases.
>
> My initial suggestion about non-greedy falls over because of a little
> gotcha documented in ':he non-greedy' (extracted below)
>
> *non-greedy*
> If a "-" appears immediately after the "{", then a shortest match
> first algorithm is used (see example below).  In particular, "\{-}" is
> the same as "*" but uses the shortest match first algorithm.  BUT: A
> match that starts earlier is preferred over a shorter match: "a\{-}b"
> matches "aaab" in "xaaab".
>
> This means that the match will always start at the earliest start point,
> and not stop until it finds the first end point. What we need for this
> solution to work is a 'bulimic' match operator that prefers the latest,
> rather than the earliest, start point before each stop point.
>
> My next thought was to use the zero-width match operators (':he
> zero-width'). My idea was to use a start pattern identifying the 'mail
> from' header being non-greedy up until the SMTP 354 message, then use the
> zero-width non-matching operator to locate transactions that don't
> output an SMTP 250 message. This falls down because you can usually find
> a point after the 354 where the 250 doesn't match, even if there's a
> match a line or so later.
>
> The complicated solution here would be to join the 354 match to the !250
> match with a repeated group of all the possible lines between the two
> messages. In the 'simple' case this involves lines detailing the message
> file created, and the number of bytes transferred. In the 'complicated'
> case you have to cater for the anti-virus and anti-spam scanning that might
> be going on as well.
>
> But even using * or \+ on the repeating group didn't work - they're not
> quite greedy enough, and the zero-width operator stops on the line
> before the 250 message, leaving us with yet more false positives.
>
> Here's my attempt at a simple pattern (this IS going to wrap, I'm
> afraid)
>
> /^.\{23}:\s\+<--\s\+mail\s\+from:\s*<timesheet\>\_.\{-}\n.\{23}:\s\+-->
> \s\+354.*\n\%(.\{23}:\s\+message\>.*\)*\n\%(.\{23}:\s\+-->\s\+250\)[EMAIL 
> PROTECTED]
>
> See what I mean about 'simple?' Not exactly a pattern that trips off the
> tongue (or even fingers!) and this is without making sure that those
> first 23 characters on each line are in fact a time stamp.
>
> My only other thought was to write a syntax file for the log, which
> would let you highlight things like the socket errors as Error, and then
> just look for the timesheet addresses which are followed by an Error.
>
> regards, Andy
>
> --
> Andrew Long
> andrew dot long at mac dot com
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
