Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 99 by [email protected]: Feature: Extended regular expressions
http://code.google.com/p/vim/issues/detail?id=99

(See https://code.google.com/r/mike-vim-extended-regex/ for the source code of this feature. Diffs are here: https://code.google.com/r/mike-vim-extended-regex/source/list )

I've implemented support for extended regular expressions in Vim, somewhat similar to Perl's extended regex feature, which allows you to make complicated regexes (especially in Vimscript files) easier to read, by including whitespace and comments in them. (Vim already allows multiline regexes.) I'm hoping that, after any changes suggested on this forum, this will be a useful addition to Vim.

One of the trickiest, and perhaps most contentious, parts is choosing the syntax to use -- how to turn on extended mode, and what comments should look like. I am very open to feedback and changes on this. Below, I present the reasoning behind my initial choices.

This is what I have implemented:

- To turn on extended mode, put \# at the beginning of your regex.
- A comment is enclosed in double-braces, like {{ this }}.
- To match a space rather than having it be ignored, use "\ ".

Here is a simple example. syntax/c.vim includes this, for syntax highlighting of backslash-escaped sequences inside strings in C:

    " String and Character constants
    " Highlight special characters (those which have a backslash) differently
    syn match   cSpecial        display contained "\\\(x\x\+\|\o\{1,3}\|.\|$\)"

With extended regular expressions, the above could be written with whitespace and comments:

    " String and Character constants
    " Highlight special characters (those which have a backslash) differently
    syn match   cSpecial        display contained
                \ "\#
                \ \\                {{ literal backslash, followed by one of... }}
                \    \(
                \        x \x\+     {{ hex, e.g. '\x2c' }}
                \      \|
                \        \o\{1,3}   {{ octal, e.g. '\755' }}
                \      \|
                \        .          {{ e.g. '\n' or '\t' etc. }}
                \      \|
                \        $          {{ end of line }}
                \    \)"
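To make the three rules concrete, here is a rough sketch in Python (not the actual C patch) of how an extended pattern could be collapsed back into an ordinary one. The function name strip_extended is hypothetical, comments are treated as flat (no nesting), collections are not protected, and I am assuming that "\ " should come out as a plain space.

```python
def strip_extended(pattern: str) -> str:
    """Collapse a \\#-prefixed extended pattern into an ordinary one.

    Hypothetical helper for illustration only, not Vim's implementation.
    Handles flat {{ ... }} comments and does not protect collections.
    """
    if not pattern.startswith("\\#"):
        return pattern                      # not extended: leave untouched
    out = []
    i = 2                                   # skip the leading \#
    while i < len(pattern):
        if pattern.startswith("{{", i):
            end = pattern.find("}}", i + 2)
            if end == -1:
                raise ValueError("unterminated {{ comment")
            i = end + 2                     # drop the whole comment
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            pair = pattern[i:i + 2]
            # Assumption: "\ " stands for one literal space; every other
            # backslash sequence is copied through unchanged.
            out.append(" " if pair == "\\ " else pair)
            i += 2
        elif pattern[i].isspace():
            i += 1                          # unescaped whitespace is ignored
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


# The cSpecial pattern from above, as it looks once Vim has joined the
# continuation lines into one string.
extended = "".join([
    r"\#",
    r" \\                {{ literal backslash, followed by one of... }}",
    r"    \(",
    r"        x \x\+     {{ hex, e.g. '\x2c' }}",
    r"      \|",
    r"        \o\{1,3}   {{ octal, e.g. '\755' }}",
    r"      \|",
    r"        .          {{ e.g. '\n' or '\t' etc. }}",
    r"      \|",
    r"        $          {{ end of line }}",
    r"    \)",
])

# Stripping the extended form gives back the original one-line pattern.
assert strip_extended(extended) == r"\\\(x\x\+\|\o\{1,3}\|.\|$\)"
```

The point of the check at the end is that the extended form is pure presentation: once comments and whitespace are removed, the regex engine sees exactly the pattern that syntax/c.vim uses today.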

I have not yet written tests or docs. If you want, I would be happy to do so.

As for the syntax: Obviously it is best not to invent a brand new syntax unless there is a good reason to do so. I would have preferred to use Perl's syntax, which is:

- To turn on extended mode, use "x" in the flags area after the regex, e.g. /foo/x
- A comment begins with (?# and ends with )
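
For comparison, Python's re module follows these same Perl conventions, so it gives a quick way to see them in action. This is only an illustration of the Perl-style syntax, not of anything in Vim: the flag is spelled re.VERBOSE (rather than a trailing "x"), "#" starts an end-of-line comment in verbose mode, and (?# ... ) is an inline comment. The pattern is my own translation of the cSpecial regex into Python syntax.

```python
import re

# Perl-style extended mode: whitespace is ignored and "#" starts a comment.
C_SPECIAL = re.compile(r"""
    \\                  # literal backslash, followed by one of...
    (?:
        x [0-9A-Fa-f]+  # hex, e.g. '\x2c'
      |
        [0-7]{1,3}      # octal, e.g. '\755'
      |
        .               # e.g. '\n' or '\t' etc.
      |
        $               # end of line
    )
""", re.VERBOSE)

# Perl-style inline comment, which works even without the verbose flag.
INLINE = re.compile(r"foo(?# this is a comment)bar")

assert C_SPECIAL.match(r"\x2c").group(0) == r"\x2c"
assert C_SPECIAL.match(r"\755").group(0) == r"\755"
assert INLINE.fullmatch("foobar") is not None
```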

Unfortunately, neither of those works especially well in Vim. For turning on extended mode, Vim makes only very light use of "flags" after regular expressions: it allows a few flags after the :s (substitute) command, but in general it doesn't use them. In Vim, the same effect is usually achieved by putting special codes at the beginning of a regex, such as \c to ignore case.

And for comments, using (?# ... ) would work, but would be somewhat awkward. In Perl, both the () operator and the ? operator are "magic" by default (they do not need to be escaped with a backslash to give them special meaning). But in Vim, the opposite is true: by default, () just matches parentheses, and ? just matches a question mark. So in a Vim regex, a comment would look like \(\?# this \), which is just too ugly and too tricky for people to remember.

So I played around with a number of alternative syntax options.

-----

1. Syntax for turning on extended mode:

Consistent with other regex syntax in Vim, it seemed to me that the best way to let the user turn on extended mode would be the presence of some special sequence at the beginning of the regex, similar to Vim's current use of \c or \C for case sensitivity, \m \M \v \V to choose a "magic" mode, and so on. Here is a list of all the one-character backslash sequences that are still available (that is, not already used by Vim):

    \!   \"   \#   \$   \'   \,   \-   \:   \;   \g   \j   \q   \y   \^   \`

I would have liked to use \x or \e to indicate extended mode, but both of those are already used. (\x matches any hex digit; \e matches the escape character.)

Given those choices, my favorite was:

    \#

... mainly because "#" is used in many programming languages to begin a comment.

Other possibilities: Vim already uses \% and \z as prefixes for a number of other pattern items, so two options that seem pretty good to me are:

    \%e
or
    \zx

I sort of like \%e. It has the advantage of being somewhat mnemonic (e for extended), and also it avoids using up a punctuation character (#) that might be better saved for other future enhancements.

-----

2. Syntax for comments:

One issue is: Should turning comments on/off require "magic" characters or not? At first I thought, of course it would have to include magic characters; but then it occurred to me that we could just use a character sequence that is somewhat unlikely to appear in regexes, and that is easy to represent as regular characters (rather than comment delimiters) in a regex if necessary.

I like {{ double braces }} because:

- They look nice and are easy to type.
- They don't conflict with any other regex syntax patterns. Yes, braces are used to indicate a count, e.g. x{1,3} for one to three x's, but that uses single braces.
- It is easy to represent a match for the actual characters "{{" in an extended regex: Just put a space between them, "{ {".

Other options:

If we use \# to turn on extended mode, I thought it might be nice to use some sort of comment delimiter that includes the "#" character, but I couldn't come up with anything that good. The best I could come up with is ## to begin a comment and ## again to end a comment, but that could lead to trouble if the user tries to mark off a comment with "#############". Other possibilities:
    #( )
    {# #}

We can't use "#" by itself for comments, with end-of-line indicating the end of the comment, because of the way Vim multiline strings work. In Vim, when you write

    let x = "this is
                \ a string"

what you get is "this is a string". There is no embedded newline in the result, so the regex engine would have no way to tell where an end-of-line comment stops.
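
A small Python model of the joining step shows the problem. This is a sketch of Vim's line-continuation behaviour as I understand it, not Vim's code: join_continuations is a hypothetical name, and the "#" end-of-line comment in the second example is exactly the syntax being ruled out here.

```python
def join_continuations(lines):
    """Join Vim-style continuation lines.

    A line whose first non-blank character is a backslash is appended to
    the previous line, minus its indentation and the backslash itself.
    """
    joined = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("\\") and joined:
            joined[-1] += stripped[1:]
        else:
            joined.append(line)
    return joined


script = [
    'let x = "this is',
    '            \\ a string"',
]
result = join_continuations(script)
assert result == ['let x = "this is a string"']
assert "\n" not in result[0]

# With a hypothetical "#" end-of-line comment, nothing marks the end of
# the comment after joining, so "bar" would be swallowed along with it.
pattern_lines = [
    r'let pat = "\#foo   # match foo',
    r'            \ bar"',
]
assert join_continuations(pattern_lines) == [
    r'let pat = "\#foo   # match foo bar"'
]
```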

I also thought it might be nice to somehow use the " double-quote character to indicate comments, since that is Vimscript's comment character; but the double-quote character would be a bad choice because often, in Vimscript, the regex itself is double-quoted, so you would have to backslash-escape all the embedded double-quote characters, which would get a bit messy.

-----

A few more details about the syntax:

- Comments support nesting. This is mainly useful while debugging your regex, to "comment out" part of it.

- Comments and extra whitespace are not allowed inside collections such as [a-z], inside repetition indicators such as {1,3}, in the middle of special sequences such as "\%$", and so on.
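
These two details can be sketched in Python as follows. Again this is an illustration, not the patch itself. The name strip_extended_nested is hypothetical. I am assuming that collections and \{...} counts are simply copied through unchanged (the real code might report an error instead), the collection handling is deliberately naive (it stops at the first "]"), and multi-character sequences such as "\%$" are not modelled.

```python
def strip_extended_nested(pattern: str) -> str:
    """Strip whitespace and nestable {{ }} comments from a \\# pattern.

    Illustration only. Collections [..] and counts \\{..} are copied
    verbatim, so whitespace inside them is kept as written.
    """
    if not pattern.startswith("\\#"):
        return pattern
    out = []
    depth = 0                               # current {{ }} nesting level
    i = 2                                   # skip the leading \#
    n = len(pattern)
    while i < n:
        if pattern.startswith("{{", i):
            depth += 1                      # open a (possibly nested) comment
            i += 2
        elif depth and pattern.startswith("}}", i):
            depth -= 1
            i += 2
        elif depth:
            i += 1                          # inside a comment: discard
        elif pattern.startswith("\\{", i) and "}" in pattern[i:]:
            end = pattern.index("}", i) + 1
            out.append(pattern[i:end])      # repetition count, verbatim
            i = end
        elif pattern[i] == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1) + 1
            out.append(pattern[i:end])      # collection, verbatim
            i = end
        elif pattern[i] == "\\" and i + 1 < n:
            pair = pattern[i:i + 2]
            out.append(" " if pair == "\\ " else pair)
            i += 2
        elif pattern[i].isspace():
            i += 1                          # unescaped whitespace is ignored
        else:
            out.append(pattern[i])
            i += 1
    if depth:
        raise ValueError("unterminated {{ comment")
    return "".join(out)


# Nesting lets you comment out a region that already contains a comment.
assert strip_extended_nested(r"\#a {{ x {{ y }} z }} b") == "ab"
# The space inside the collection survives; the others are dropped.
assert strip_extended_nested(r"\#[ a-z] x \{1,3}") == r"[ a-z]x\{1,3}"
# "{ {" yields a literal "{{" because the space keeps them apart.
assert strip_extended_nested(r"\#a{ {b") == "a{{b"
```

Counting depth rather than searching for the first "}}" is what makes "commenting out" safe: the outer comment ends only when every inner one has closed.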

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
