Re: Regular Expression Question

Sean Hubbell Fri, 26 Jan 2007 06:16:44 -0800

Tim Chase wrote:

    print ("small string");
    print (
      "This is a very long string");
and I need to format it as so:

    print ("small string\n");
    print (
      "This is a very long string\n");
Ideally, I would like to do this in one command and I would also like to understand the regex itself. So, given the above, here is what I understand of the regex pattern:
    %s/print\s*(\s*"[^"]*\(\\n\)\@<!\ze"/&\\n/g
%             - globally (every line in the file)
s             - substitute
/             - delimiter
print\s*(\s*" - my phrase to match: "print" followed by zero or more whitespace characters, then a literal paren, then zero or more whitespace characters up to the quote
[^"]*         - then everything that is not a quote (zero or more)
Doing well up through here...
(             - The beginning of the group ???
\\n          - literal \n
)             - End group ????
\@<!          - Nothing, requires no match behind ???
You've got the understanding right (though those parens are "\(" and "\)" with backslashes). Those four lines in concert assert that a literal "\n" doesn't come before the current point. Without the grouping, it would only assert that the previous atom (in this case, the "n") didn't appear here, so you'd have problems with things like
    print("terminal n")
because it sees the terminal "n" and so doesn't do the substitution. By grouping them, you assert "when you get to this point [before the closing quote] and there isn't a literal backslash-en here, then we match".
In here, you're missing the "\ze", which means "when doing the replacement, treat it as though the thing we're substituting ended here, even though there's more stuff we're looking for (namely, the double-quote that's next)".
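
The grouping point can be checked outside Vim. What follows is only a rough sketch in Python, not the Vim command itself: it assumes that Python's (?<!\\n) is a fair stand-in for Vim's \(\\n\)\@<!, that (?=") stands in for \ze", and that (?<!n) models what happens when the grouping is left out. The names GROUPED, UNGROUPED and matches are made up for the illustration.

```python
import re

# Assumed Python spelling of the grouped Vim lookbehind \(\\n\)\@<! :
# "the two characters before this point are not backslash + n".
# (?=") stands in for Vim's \ze" (a quote must follow, but is not matched).
GROUPED = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')

# Without the grouping, \@<! would apply only to the previous atom, the "n".
# (?<!n) models that mistake: "the character before this point is not n".
UNGROUPED = re.compile(r'print\s*\(\s*"[^"]*(?<!n)(?=")')


def matches(pattern, line):
    """Return True when the pattern finds a print(...) string to extend."""
    return pattern.search(line) is not None


samples = [
    'print("terminal n")',        # ends in a plain "n"
    'print("small string")',      # ends in some other letter
    r'print("already done\n")',   # already ends in backslash-n
]

for line in samples:
    print(line, matches(GROUPED, line), matches(UNGROUPED, line))
```

Only the grouped form accepts the "terminal n" line, and both forms leave the line that already ends in backslash-n alone.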
"             - my ending quote to match in the pattern print ("")
correct
/&          - ???
This is standard substitution: the slash is the break between the search and its replacement. The ampersand is "the whole previous match". In this case, it's slightly tweaked because of the "\ze" that we used: the thing we replace goes up through (but not including) the second double-quote. So it drops in everything from "print" through the end of the internal string (sans closing quote).
\\n          - literal \n
correct...appending the literal \n you want.
/             - delimiter
g             - each occurrence on the line
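
To see what "&" ends up holding once "\ze" has cut the match short, here is a small Python sketch of the same substitution. It is an approximation under stated assumptions: m.group(0) plays the role of Vim's "&", the lookahead (?=") plays the role of \ze", and PATTERN and add_newline are names invented for this example.

```python
import re

# Assumed Python rendering of the Vim pattern; the closing quote sits in a
# lookahead, so it is required but is not part of the matched text.
PATTERN = re.compile(r'print\s*\(\s*"[^"]*(?<!\\n)(?=")')


def add_newline(text):
    """Append a literal backslash-n inside each print string that lacks one."""
    # m.group(0) is the whole match, like "&" in the Vim replacement.
    # r'\n' is two characters (backslash, n), not a line break.
    return PATTERN.sub(lambda m: m.group(0) + r'\n', text)


line = 'print ("small string");'
whole_match = PATTERN.search(line).group(0)

print(repr(whole_match))    # the text that "&" would stand for
print(add_newline(line))
```

Because the match stops before the closing quote, the appended backslash-n lands inside the string, and running add_newline a second time changes nothing.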

Then we have the spanning multiple lines option:

\_ [^"]*
that's

    \_[

not

    \_ [
\_ - match text over multiple lines (Is this like another regex engine, like the one sed uses?)
It's a vim thing:

    :help /\_
should drop you in the fray. It prefixes (infixes?) a number of atoms that could include whitespace, so for your change, you'd likely want to do something like change the \s atoms to \_s to include newlines.
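
The effect of swapping \s for \_s can be sketched in Python as well. This is an illustration under assumptions, not Vim's own behaviour reproduced exactly: [ \t] is used to model Vim's \s (space or tab only), [ \t\n] to model \_s, and [^"\n] to model Vim's [^"], which does not cross a line break. SINGLE_LINE, MULTI_LINE and fix are hypothetical names.

```python
import re

# Models the pattern with \s: whitespace may not include a line break.
SINGLE_LINE = re.compile(r'print[ \t]*\([ \t]*"[^"\n]*(?<!\\n)(?=")')

# Models the pattern with \_s: whitespace may also be a line break.
MULTI_LINE = re.compile(r'print[ \t\n]*\([ \t\n]*"[^"\n]*(?<!\\n)(?=")')

text = (
    'print ("small string");\n'
    'print (\n'
    '  "This is a very long string");\n'
)


def fix(pattern, source):
    """Append a literal backslash-n to every print string the pattern finds."""
    return pattern.sub(lambda m: m.group(0) + r'\n', source)


print(fix(SINGLE_LINE, text))   # only the one-line print is changed
print(fix(MULTI_LINE, text))    # both prints are changed
```

The first pattern stops at the line break after the opening paren, so the second print is left untouched; the second pattern steps over the break and fixes both.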
Does this make sense? The area I am having difficulty with is /& and how the grouping is working.
Hopefully this sheds some light on matters and helps you tweak your own regexps in the future. If you have any questions, feel free to ask.
-tim


Yes, this helps greatly. Thanks again Tim.

Sean
