Reply to message «How to update perl.vim for UTF-8 chars in identifiers? (RFE?)»,
sent 23:35:42 14 May 2011, Saturday
by Linda W:

I heard that Vim's regexp code needs a rewrite, or is being rewritten (Bram
requested more regex tests for a new regex engine, but I do not know where that
work is, whether anyone is actually working on it, or whether it is just a
plan). In the current engine, Unicode character classes are not implemented, and
collections have a peculiar implementation that prevents you from writing a
long collection: according to the help, [0-9] is slower than \d, and you can't
use [\uN1-\uN2] if N2-N1 > 255 (so, I guess, the regex compiler generates code
that checks each symbol individually instead of checking whether the symbol is
in the given range).

So, no, it is not possible without patching vim.
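The guess about collections (checking each symbol versus checking a range) can be illustrated with a small Python sketch. This is a hypothetical model of two ways an engine could represent a bracket collection, not Vim's actual regex code; the names expand_collection and RangeCollection are made up, and the ranges are assumed not to overlap.

```python
import bisect

# Hypothetical illustration only: this is NOT how Vim's regex engine is written.

def expand_collection(ranges):
    """Expand each (lo, hi) code-point range into an explicit set of characters.
    Build time and memory grow with the width of every range."""
    chars = set()
    for lo, hi in ranges:
        chars.update(chr(cp) for cp in range(lo, hi + 1))
    return chars


class RangeCollection:
    """Keep only the (lo, hi) pairs and binary-search them.
    Assumes the ranges do not overlap."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]

    def __contains__(self, ch):
        cp = ord(ch)
        # Index of the last range whose start is <= cp (or -1 if none).
        i = bisect.bisect_right(self.starts, cp) - 1
        return i >= 0 and cp <= self.ranges[i][1]


# Greek and Coptic block, plus Greek Extended block.
greek = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]
print("\u0394" in RangeCollection(greek))   # True
print(len(expand_collection(greek[:1])))     # 144 code points enumerated
```

Both forms answer the same membership question, but the range form stays the same size however wide the range is, so it has no reason to impose a width limit like the one described above.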

Original message:
> I'm using a perl script that has
> "use utf8;" at the top that enables use of UTF-8 in variable names.
> 
> So in one place, instead of
> 
>    my @deltaio;
>    ...
>    $total_io += $deltaio[$Tbytes];
> 
> I have:
> 
>    my @Δio;
>    ...
>    $total_io += $Δio[$Tbytes];
> 
> (note: 'Δ' = U+0394: GREEK CAPITAL LETTER DELTA)
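The code point, name, and general category in the note can be double-checked with Python's standard unicodedata module (Python is used here only as a convenient Unicode-aware tool; it is not part of Vim or Perl):

```python
import unicodedata

ch = "\u0394"  # 'Δ'
print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
# prints: U+0394 GREEK CAPITAL LETTER DELTA Lu
```

The category Lu (UppercaseLetter) is one of the letter properties listed further down in the message.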
> 
> ---
> My question/problem is how to update the syntax coloring file,
> "perl.vim" to correctly accommodate UTF-8 variable names?
> 
> Is this even possible in vim?
> 
> I see:
> 
> syn match  perlVarPlain                "$^[ACDEFHILMNOPRSTVWX]\="
> 
> in the perl.vim file for matching alphabetic characters in a variable name.
> 
> How would I specify that it match alphabet characters of any language?
> 
> Unicode has 'properties' assigned to characters, that allow one to tell
> what class a codepoint (or character) falls into.
> 
> Perl allows specifying the base Unicode properties as well as some
> derived properties to aid in classifying.  For Alphabetic characters,
> Unicode has the properties (with Short and Long versions):
> 
>                Short       Long
> 
>                L           Letter
>                LC          CasedLetter
>                Lu          UppercaseLetter
>                Ll          LowercaseLetter
>                Lt          TitlecaseLetter
>                Lm          ModifierLetter
>                Lo          OtherLetter
>                Nl          LetterNumber
> 
> In Perl regular expressions, these can be specified with \p{PROPNAME}.
> One can use the short or long form, so "\p{Lu}" is equivalent to
> "\p{UppercaseLetter}".
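To make the short/long equivalence concrete, here is a minimal Python sketch of a \p{NAME}-style test for one character, built on unicodedata.category(). The helper has_prop and the table LONG_TO_SHORT are hypothetical; unlike Perl, the names must match exactly (no loose matching of case, spaces, or underscores), and only the letter-related categories are covered.

```python
import unicodedata

# Hypothetical helper, a rough analogue of Perl's \p{NAME} for one character.
# Long names for the letter-related general categories (Unicode itself spells
# them with underscores, e.g. Uppercase_Letter, Letter_Number).
LONG_TO_SHORT = {
    "Letter": "L",
    "CasedLetter": "LC",
    "UppercaseLetter": "Lu",
    "LowercaseLetter": "Ll",
    "TitlecaseLetter": "Lt",
    "ModifierLetter": "Lm",
    "OtherLetter": "Lo",
    "LetterNumber": "Nl",
}


def has_prop(ch: str, name: str) -> bool:
    """True if ch has general category `name` (short or long form)."""
    short = LONG_TO_SHORT.get(name, name)
    cat = unicodedata.category(ch)   # two-letter code such as "Lu"
    if short == "L":                 # any letter: Lu, Ll, Lt, Lm, Lo
        return cat.startswith("L")
    if short == "LC":                # cased letter: Lu, Ll, Lt
        return cat in ("Lu", "Ll", "Lt")
    return cat == short


print(has_prop("\u0394", "Lu"), has_prop("\u0394", "UppercaseLetter"))
```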
> 
> Unicode also has names for the various 'scripts', like Arabic, Braille,
> Latin, etc., that can also be specified in regexes, e.g. \p{Arabic}.
> 
> There are also 'extended property classes' in Unicode, defined in the
> 'proplist' Unicode database, including things like (abbreviated list):
> 
>                ASCIIHexDigit
>                HexDigit
>                OtherAlphabetic
>                OtherLowercase
>                OtherMath
>                OtherUppercase
>                PatternSyntax
>                PatternWhiteSpace
>                QuotationMark
>                TerminalPunctuation
>                WhiteSpace
> 
> 
> Some additional 'derived' properties are available in Perl as well:
> Alphabetic (= Lu + Ll + Lt + Lm + Lo + Nl + OtherAlphabetic),
> Lowercase, Uppercase, Math, ASCII, etc.
> 
> The proplist and derived properties are specified in REs with the same
> syntax, e.g. \p{Alphabetic}.
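The derived Alphabetic definition above can be approximated from general categories alone. In this Python sketch, approx_alphabetic is a made-up name, and OtherAlphabetic is deliberately left out because Python's standard unicodedata module does not expose PropList.txt; characters that are Alphabetic only through OtherAlphabetic (for example many combining vowel signs) are therefore missed.

```python
import unicodedata

# Approximation of the derived Unicode "Alphabetic" property.
# Assumption: OtherAlphabetic is NOT covered (not available in unicodedata).
ALPHABETIC_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def approx_alphabetic(ch: str) -> bool:
    return unicodedata.category(ch) in ALPHABETIC_CATEGORIES


# Columns: character, approx_alphabetic(), Python's built-in str.isalpha()
for ch in ["\u0394", "\u2163", "4", "_"]:
    print(repr(ch), approx_alphabetic(ch), ch.isalpha())
```

The last two columns differ for U+2163 ROMAN NUMERAL FOUR: Python's str.isalpha() counts only Lu, Ll, Lt, Lm and Lo, while Alphabetic also includes Nl.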
> 
> Does Vim have anything similar to this?
> 
> I didn't know this until recently, but JavaScript's regex syntax follows,
> or was adopted from, Perl as well, though I don't know how complete the
> implementation is (i.e. whether it handles UTF-8).
> 
> To implement the syntax highlighting changes I want, would this require
> an enhancement to Vim? Or, alternatively, how would I go about adding
> syntax highlighting for Perl that allows UTF-8 variable names?
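Vim's engine can't express this, as the reply at the top explains, but for comparison, here is what the identifier rule looks like in an engine whose \w is Unicode-aware. This Python sketch is not a perl.vim fix: PERL_VAR is a hypothetical, simplified pattern (no package names such as $Foo::bar, no punctuation variables), and [^\W\d] is the common Python idiom for "a word character that is not a decimal digit", roughly a letter in any script or an underscore.

```python
import re

# Hypothetical, simplified pattern for Perl scalar/array/hash names.
# In Python 3, \w in a str pattern matches Unicode word characters,
# so 'Δ' counts as a word character.
PERL_VAR = re.compile(r"[$@%][^\W\d]\w*")

line = "$total_io += $\u0394io[$Tbytes];"
print(PERL_VAR.findall(line))
```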
> 
> 
> Thanks!
> Linda
