Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 99 by [email protected]: Feature: Extended regular expressions
http://code.google.com/p/vim/issues/detail?id=99
(See https://code.google.com/r/mike-vim-extended-regex/ for the source code
of this feature. Diffs are here:
https://code.google.com/r/mike-vim-extended-regex/source/list )
I've implemented support for extended regular expressions in Vim, somewhat
similar to Perl's extended regex feature, which allows you to make
complicated regexes (especially in Vimscript files) easier to read, by
including whitespace and comments in them. (Vim already allows multiline
regexes.) I'm hoping that, after any changes suggested on this forum, this
will be a useful addition to Vim.
One of the trickiest, and perhaps most contentious, parts is choosing the
syntax to use -- how to turn on extended mode, and what comments should
look like. I am very open to feedback and changes on this. Below, I
present the reasoning behind my initial choices.
This is what I have implemented:
- To turn on extended mode, put \# at the beginning of your regex.
- A comment is enclosed in double-braces, like {{ this }}.
- To match a space rather than having it be ignored, use "\ ".
Here is a simple example. syntax/c.vim includes this, for syntax
highlighting of backslash-escaped sequences inside strings in C:
    " String and Character constants
    " Highlight special characters (those which have a backslash) differently
    syn match cSpecial display contained "\\\(x\x\+\|\o\{1,3}\|.\|$\)"
With extended regular expressions, the above could be written with
whitespace and comments:
    " String and Character constants
    " Highlight special characters (those which have a backslash) differently
    syn match cSpecial display contained
        \ "\#
        \   \\            {{ literal backslash, followed by one of... }}
        \   \(
        \       x \x\+    {{ hex, e.g. '\x2c' }}
        \     \|
        \       \o\{1,3}  {{ octal, e.g. '\755' }}
        \     \|
        \       .         {{ e.g. '\n' or '\t' etc. }}
        \     \|
        \       $         {{ end of line }}
        \   \)"
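To make the three rules concrete, here is a rough model of them in Python. It is only a sketch of the behavior described above, not the code from the patch (which lives in Vim's C regex engine); the function name strip_extended is made up for this illustration, and it does not handle comment nesting or collections.

```python
def strip_extended(pattern: str) -> str:
    """Model of the proposed extended mode (hypothetical helper, not Vim code).

    If the pattern starts with \\#, drop unescaped whitespace and
    {{ ... }} comments, and turn "\\ " into a literal space.
    Patterns without the \\# prefix are returned unchanged.
    """
    prefix = "\\#"
    if not pattern.startswith(prefix):
        return pattern

    out = []
    i = len(prefix)
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Keep backslash sequences intact; "\ " becomes a plain space.
            nxt = pattern[i + 1]
            out.append(" " if nxt == " " else ch + nxt)
            i += 2
        elif pattern.startswith("{{", i):
            # Skip a (non-nested) comment up to the closing "}}".
            end = pattern.find("}}", i + 2)
            if end == -1:
                raise ValueError("unterminated {{ comment")
            i = end + 2
        elif ch.isspace():
            # Unescaped whitespace is ignored in extended mode.
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The cSpecial pattern from above, as it looks once Vim has joined
# the continuation lines into a single string.
extended = (
    r"\#"
    r" \\ {{ literal backslash, followed by one of... }}"
    r" \("
    r" x \x\+ {{ hex, e.g. '\x2c' }}"
    r" \|"
    r" \o\{1,3} {{ octal, e.g. '\755' }}"
    r" \|"
    r" . {{ e.g. '\n' or '\t' etc. }}"
    r" \|"
    r" $ {{ end of line }}"
    r" \)"
)
print(strip_extended(extended))
```

Under this model the extended form reduces to exactly the compact pattern that syntax/c.vim uses today.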
I have not yet written tests or docs. If you want, I would be happy to do
so.
As for the syntax: Obviously it is best not to invent a brand new syntax
unless there is a good reason to do so. I would have preferred to use
Perl's syntax, which is:
- To turn on extended mode, use "x" in the flags area after the regex, e.g.
/foo/x
- A comment begins with (?# and ends with )
Unfortunately, neither one of those worked out especially well in Vim. For
turning on extended mode, Vim makes only very light use of "flags" after
regular expressions. In fact, although it allows a few flags after the :s
(substitute) command, in general it doesn't use flags after regular
expressions. In Vim, usually the same effect is achieved by putting
special codes at the beginning of a regex, such as \c to ignore case.
And for comments, using (?# ... ) would work, but would be somewhat
awkward. In Perl, both the () operator and the ? operator are "magic" by
default (do not need to be escaped with a backslash to give them special
meaning). But in Vim, the opposite is true: By default, () just matches
parentheses, and ? just matches a question mark. So in a Vim regex, a
comment would look like \(\?# this \), which is just too ugly and too
tricky for people to remember.
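For comparison, this is roughly what the Perl-style conventions look like in a language that has them. Python's re module borrows both: the re.VERBOSE flag plays the role of Perl's x flag, and (?# ... ) is an inline comment. This uses Python's regex dialect rather than Vim's, and the names c_special and inline are chosen here just for the example.

```python
import re

# Flag-based extended mode: whitespace is ignored and "#" starts a comment
# that runs to the end of the line.
c_special = re.compile(
    r"""
    \\                  # literal backslash, followed by one of...
    (?:
        x[0-9A-Fa-f]+   # hex, e.g. '\x2c'
      |
        [0-7]{1,3}      # octal, e.g. '\755'
      |
        .               # e.g. '\n' or '\t' etc.
      |
        $               # end of line
    )
    """,
    re.VERBOSE,
)

# Inline (?# ... ) comments work without any flag.
inline = re.compile(r"\\(?# literal backslash )x[0-9A-Fa-f]+(?# hex digits )")

for text in (r"\x2c", r"\755", r"\n"):
    print(text, bool(c_special.match(text)))
```

Both forms rely on ( ) and ? being special without a backslash, which is what makes them awkward to carry over to Vim's default "magic" mode.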
So I played around with a number of alternative syntax options.
-----
1. Syntax for turning on extended mode:
Consistent with other regex syntax in Vim, it seemed to me that the best
way to let the user turn on extended mode would be the presence of some
special sequence at the beginning of the regex, similar to Vim's current
use of \c or \C for case sensitivity, \m \M \v \V to choose a "magic" mode,
and so on. Here is a list of all available one-character backslash
sequences:
\! \" \# \$ \' \, \- \: \; \g \j \q \y \^ \`
I would have liked to use \x or \e to indicate extended mode, but both of
those are already used. (\x matches any hex digit; \e matches the <Esc>
character.)
Given those choices, my favorite was:
\#
... mainly because "#" is used in many programming languages to begin a
comment.
Other possibilities: Vim already uses \% and \z as prefixes for a number of
other pattern items, so two options that seem pretty good to me are:
\%e
or
\zx
I sort of like \%e. It has the advantage of being somewhat mnemonic (e for
extended), and also it avoids using up a punctuation character (#) that
might be better saved for other future enhancements.
-----
2. Syntax for comments:
One issue is: Should turning comments on/off require "magic" characters or
not? At first I thought, of course it would have to include magic
characters; but then it occurred to me that we could just use a character
sequence that is somewhat unlikely to appear in regexes, and that is easy
to represent as regular characters (rather than comment delimiters) in a
regex if necessary.
I like {{ double braces }} because:
- They look nice and are easy to type.
- They don't conflict with any other regex syntax patterns. Yes, braces
are used to indicate a count, e.g. x\{1,3} for one to three x's, but that
uses a single brace.
- It is easy to represent a match for the actual characters "{{" in an
extended regex: Just put a space between them, "{ {".
Other options:
If we use \# to turn on extended mode, I thought it might be nice to use
some sort of comment delimiter that includes the "#" character, but I
couldn't come up with anything that good. The best I could come up with is
## to begin a comment and ## again to end a comment, but that could lead to
trouble if the user tries to mark off a comment with "#############".
Other possibilities:
#( )
{# #}
We can't use "#" by itself for comments, with end-of-line indicating the
end of the comment, because of the way Vim multiline strings work. In Vim,
when you write
    let x = "this is
        \ a string"
what you get is "this is a string". There is no embedded newline in the
result.
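The joining behavior can be modeled in a few lines of Python. This is a simplified picture of Vim's line continuation (a line whose first non-blank character is a backslash is appended to the previous line, minus the backslash), and join_continuations is a name invented for this sketch.

```python
def join_continuations(lines):
    """Simplified model of Vim script line continuation (hypothetical helper).

    A line whose first non-blank character is a backslash is glued onto
    the previous line, with the leading whitespace and backslash removed.
    """
    joined = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("\\") and joined:
            joined[-1] += stripped[1:]
        else:
            joined.append(line)
    return joined


script = [
    'let x = "this is',
    '      \\ a string"',
]
print(join_continuations(script))

# A regex spread over several lines ends up as one line too, so a
# "#"-to-end-of-line comment would have no newline left to stop at.
regex_lines = [
    'syn match Foo "\\#',
    '      \\ foo # a comment',
    '      \\ bar"',
]
print(join_continuations(regex_lines))
```

In the second case everything after the "#", including "bar", would be swallowed by the comment, which is why an explicit closing delimiter is needed.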
I also thought it might be nice to somehow use the " double-quote character
to indicate comments, since that is Vimscript's comment character; but the
double-quote character would be a bad choice because often, in Vimscript,
the regex itself is double-quoted, so you would have to backslash-escape
all the embedded double-quote characters, which would get a bit messy.
-----
A few more details about the syntax:
- Comments support nesting. This is mainly useful while debugging your
regex, to "comment out" part of it.
- Comments and extra whitespace are not allowed inside certain constructs:
collections such as [a-z], repetition counts such as \{1,3}, the middle of
special sequences such as "\%$", and so on.
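These two details could be modeled roughly as follows. Again this is a Python sketch rather than the patch's C code: strip_nested is a hypothetical name, it takes the regex body after the \# prefix, and treating everything up to the next "]" as the collection is a simplification of Vim's real collection parsing. It also assumes that comment and whitespace handling is simply switched off inside a collection. Repetition counts and multi-character special sequences are not modeled.

```python
def strip_nested(body: str) -> str:
    """Strip nested {{ }} comments and whitespace (hypothetical helper).

    Collections such as [a-z] are copied through untouched, so spaces and
    "{{" inside them stay literal. This is an assumption of the sketch.
    """
    out = []
    depth = 0  # current comment nesting level
    i = 0
    n = len(body)
    while i < n:
        if depth > 0:
            # Inside a comment: only track nesting, discard everything else.
            if body.startswith("{{", i):
                depth += 1
                i += 2
            elif body.startswith("}}", i):
                depth -= 1
                i += 2
            else:
                i += 1
        elif body.startswith("{{", i):
            depth = 1
            i += 2
        elif body[i] == "\\" and i + 1 < n:
            # Keep backslash sequences intact; "\ " becomes a plain space.
            nxt = body[i + 1]
            out.append(" " if nxt == " " else body[i] + nxt)
            i += 2
        elif body[i] == "[":
            end = body.find("]", i + 1)
            if end == -1:
                out.append("[")  # no closing bracket: treat "[" literally
                i += 1
            else:
                out.append(body[i:end + 1])  # copy the collection verbatim
                i = end + 1
        elif body[i].isspace():
            i += 1
        else:
            out.append(body[i])
            i += 1
    if depth:
        raise ValueError("unterminated {{ comment")
    return "".join(out)


# "Commenting out" part of a regex that already contains a comment.
print(strip_nested(r"a {{ disabled: b {{ inner note }} c }} d"))
# Whitespace and braces inside a collection are left alone.
print(strip_nested(r"[a {{x}}] b"))
```

The depth counter is what lets an outer {{ }} pair disable a stretch of regex that already contains its own comments.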