RE: Search expression spanning multiple lines

Scott Wallace Thu, 30 Oct 2008 18:20:31 -0700

Hi Andy,
  I'm actually trying to find the portions of the mail log file where
"[EMAIL PROTECTED]" attempted to send mail but failed with a Winsock
error.  Each complete mail transaction shows up in the log, separated by
"-----" lines.  I'll have to look into the foldsearch.vim script.
Thanks!



-----Original Message-----
From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf
Of Andy Wokula
Sent: Thursday, October 30, 2008 3:28 PM
To: [email protected]
Subject: Re: Search expression spanning multiple lines


SysAdm wrote:
> Hi Andy,
>   Thank you for trying!  A slightly modified version of the first
> example you gave (/timesheet\_.\{-}Winsock) gets me right to the first
> occurrence, so that definitely helps.
> 
> This one works, but I worry that the number of newlines might not stay
> the same:
> /FROM:<timesheet.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*Winsock
> 
> Thank you again for all your help!
> 
> 
> On Oct 30, 12:30 pm, Andrew Long <[EMAIL PROTECTED]> wrote:
>> On 27 Oct 2008, at 14:48, SysAdm wrote:
>>
>>
>>
>>> Hi Andrew,
>>> Sure!  Here is a clipping (with identifiable info changed) that
>>> contains a valid, delivered email, some spam and a timesheet record.
>> I'm afraid that I'm going to have to admit defeat on this one. I've
>> been fighting with it for the last few nights, and I can't find a way to
>> do it. At a couple of points I thought I had a complicated solution, but
>> they all fell over under different test cases.
>>
>> My initial suggestion about non-greedy falls over because of a little
>> gotcha documented in ':he non-greedy' (extracted below)
>>
>> *non-greedy*
>> If a "-" appears immediately after the "{", then a shortest match
>> first algorithm is used (see example below).  In particular, "\{-}" is
>> the same as "*" but uses the shortest match first algorithm.  BUT: A
>> match that starts earlier is preferred over a shorter match: "a\{-}b"
>> matches "aaab" in "xaaab".
>>
>> This means that the match will always start at the earliest start point,
>> and not stop until it finds the first end point. What we need for this
>> solution to work is a 'bulimic' match operator that prefers the latest,
>> rather than the earliest, start point before each stop point.
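
The leftmost-start rule quoted above is not peculiar to Vim. As a rough illustration only, here is the same behaviour in Python's re module (a swapped-in engine, not Vim's), where "*?" plays the role of "\{-}". The two-transaction log is invented sample data, not taken from the real mail log.

```python
import re

# Vim's "a\{-}b" corresponds roughly to Python's "a*?b".
# The match starts at the first "a" it can, even though a shorter match exists.
leftmost = re.search(r"a*?b", "xaaab").group()

# Hypothetical log: only the second transaction contains the error.
log = (
    "MAIL FROM:<alice>\n"
    "250 ok\n"
    "----------\n"
    "MAIL FROM:<timesheet>\n"
    "Winsock error\n"
)

# Non-greedy, yet the match still begins at the earliest "MAIL FROM:",
# so it runs across the separator into the next transaction.
span = re.search(r"MAIL FROM:[\s\S]*?Winsock", log).group()
crosses_separator = "----------" in span

print(leftmost, crosses_separator)
```

The second search is the false positive being described: the shortest-match rule only shortens the end of the match, never the start.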



>> My next thoughts were to use the zero-width match operators (':he
>> zero-width'). My idea was to use a start pattern identifying the 'mail
>> from' header being non-greedy up until the SMTP 354 message, then use the
>> zero-width non-matching operator to locate transactions that don't
>> output an SMTP 250 message. This falls down because you can usually find
>> a point after the 354 where the 250 doesn't match, even if there's a
>> match a line or so later.
>>
>> The complicated solution here would be to join the 354 match to the !250
>> match with a repeated group of all the possible lines between the two
>> messages. In the 'simple' case this involves lines detailing the message
>> file created, and the number of bytes transferred. In the 'complicated'
>> case you have to cater for the anti-virus, anti-spam scanning that might
>> be going on as well.
>>
>> But even using * or \+ on the repeating group didn't work - they're not
>> quite greedy enough, and the zero-width operator stops on the line
>> before the 250 message, leaving us with yet more false positives.
>>
>> Here's my attempt at a simple pattern (this IS going to wrap, I'm
>> afraid)
>>
>> /^.\{23}:\s\+<--\s\+mail\s\+from:\s*<timesheet\>\_.\{-}\n.\{23}:\s\+-->
>> \s\+354.*\n\%(.\{23}:\s\+message\>.*\)*\n\%(.\{23}:\s\+-->\s\+250\)[EMAIL 
>> PROTECTED]
>>
>> See what I mean about 'simple?' Not exactly a pattern that trips off the
>> tongue (or even fingers!) and this is without making sure that those
>> first 23 characters on each line are in fact a time stamp.
>>
>> My only other thought was to write a syntax file for the log, which
>> would let you highlight things like the socket errors as Error, and then
>> just look for the timesheet addresses which are followed by an Error.
>>
>> regards, Andy
>>
>> --
>> Andrew Long
>> andrew dot long at mac dot com

What I('d) do:
get a script like foldsearch.vim (Vimscript #2302) (I use something similar)
search
    /MAIL FROM:\|Winsock\|----------
then do
    :Fs
(this is a foldsearch command: fold away lines not matching the pattern)
then search for
    /Winsock

each email address can now be found two screen lines above a match for
"Winsock".  The line in between is a fold.
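
For readers who want to try the same idea outside Vim, the fold-then-search steps above can be approximated in a few lines of Python. This is only a sketch: it does not use foldsearch.vim, the log lines are hypothetical, and it assumes the sender line begins with "MAIL FROM:".

```python
import re

# Hypothetical log excerpt; the real log layout is not shown in this thread.
log_lines = [
    "MAIL FROM:<alice>",
    "RCPT TO:<bob>",
    "250 ok",
    "----------",
    "MAIL FROM:<timesheet>",
    "RCPT TO:<carol>",
    "Winsock error 10060",
    "----------",
]

keep = re.compile(r"MAIL FROM:|Winsock|----------")

# Step 1: keep only the lines matching the pattern
# (the lines a fold would hide are simply dropped here).
kept = [line for line in log_lines if keep.search(line)]

# Step 2: for each "Winsock" line, the kept line directly before it is the
# sender, unless that line is a separator.
failed_senders = [
    prev
    for prev, cur in zip(kept, kept[1:])
    if "Winsock" in cur and prev.startswith("MAIL FROM:")
]

print(failed_senders)
```

Because the separator lines are kept as well, a "Winsock" line can never be paired with a sender from an earlier transaction.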


On the pattern:
IIUC, you want a match for
    /MAIL FROM:\_.\{-}Winsock

that has to fail if "----------" is contained in the match.

You could try these patterns:
    /MAIL FROM:\%(\%(----------\)[EMAIL PROTECTED])\{-}Winsock
    " at each position, make sure there is no match for ----------

or
    /MAIL FROM:.*\%(\%(----------\)\@<!\n.*\)*.*Winsock
    " make sure there is no match for ---------- before a line break
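
The idea in the comments (never step over a separator on the way to "Winsock") can also be tried in Python's re. The sketch below is an analogue, not a transcription of the Vim patterns: it puts a negative lookahead in front of every consumed character. The log text and the SEP name are made up for the example.

```python
import re

# Hypothetical log with two transactions; only the second one fails.
log = (
    "MAIL FROM:<alice>\n"
    "250 ok\n"
    "----------\n"
    "MAIL FROM:<timesheet>\n"
    "RCPT TO:<carol>\n"
    "Winsock error 10060\n"
    "----------\n"
)

SEP = "----------"

# At each position, refuse to advance if the separator starts there.
pattern = re.compile(
    r"MAIL FROM:(?:(?!" + re.escape(SEP) + r")[\s\S])*?Winsock"
)

matches = [m.group() for m in pattern.finditer(log)]
print(matches)
```

The attempt starting at the first "MAIL FROM:" fails when it reaches the separator, so the engine moves on and the only match lies inside the second transaction.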


Another (IMHO new!?) idea: check out the following pattern on the
next sentence:
    /A.\{-}\%(\zsA.*\)\@<=N

As AlwAys these Are oNly A few suggestioNs.

Matches are "Are oN" and "A few suggestioN" (esp. the first match is
interesting).
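
The same two matches can be reproduced in Python's re, though by a different and simpler route than the \zs trick above: Python has no \zs, so this sketch just forbids a further "A" inside the match, which forces the start onto the last "A" before each "N". It is offered as a comparison, not as an equivalent pattern.

```python
import re

sentence = "As AlwAys these Are oNly A few suggestioNs."

# Plain non-greedy: each match starts at the earliest available "A".
earliest = re.findall(r"A.*?N", sentence)

# Excluding "A" from the middle: each match starts at the last "A" before "N".
latest = re.findall(r"A[^A]*?N", sentence)

print(earliest)
print(latest)
```

The character class only works because the start marker is a single character; for a multi-character start string a negative lookahead would be needed in its place.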

-- 
Andy






--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
