Bug#441785: Umlauts break syntax highlighting

2007-09-12 Thread martin f krafft
I boiled down the problem to the vimoutliner syntax definition:

  :syntax region OL2 start=+^\t[^:\t]+ end=+^\t[^:\t]+me=e-2
        \ contains=outlTags,BT2,BT3,PT2,PT3,TA2,TA3,UT2,UT3,UB2,UB3,spellErr,SpellErrors,BadWord,OL3
        \ keepend

The problem here is the offset on the end pattern: me=e-2. At level 2
(one leading tab), the end pattern matches the leading tab and first
character of the next level-2 heading (unless it first encounters
a match region not in the set specified by contains). The offset
me=e-2 then moves the end of the match back by 2, over that character
and its leading tab.

Vim seems to count bytes instead of characters here, though: the
syntax highlighting only breaks when a multibyte UTF-8 character is
the first character of the heading, in which case the me=e-2 offset
somehow gets lost and the OL2 region is extended to the *next*
level 2 heading.
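
To make the byte-versus-character distinction concrete, here is a small
shell sketch. It is not taken from vim or vimoutliner, and the variable
names are made up for illustration. It counts the bytes covered by the
end pattern ^\t[^:\t] when the heading starts with "a" and when it
starts with "ä". wc -c always counts bytes, so the result should not
depend on the locale.

```shell
#!/bin/sh
# Illustration only: byte length of "tab + first heading character".
# \303\244 is the UTF-8 encoding of a-umlaut (U+00E4), written as octal
# escapes so the result does not depend on this script's own encoding.
ascii_len=$(printf '\ta' | wc -c | tr -d ' ')
umlaut_len=$(printf '\t\303\244' | wc -c | tr -d ' ')

echo "tab + a:        $ascii_len bytes"
echo "tab + a-umlaut: $umlaut_len bytes"
```

Both strings are two characters long, but the second one occupies three
bytes. If the offset were applied in bytes, e-2 would stop one byte
short of the tab for the umlaut heading. Whether that is what vim
actually does is an assumption here, not something this sketch proves.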

Both me=e-3 and me=e-1 work, which really does not make sense to me.

Sven, this remains a bug in vim, I think, and I don't see a way to
work around it in vimoutliner. If you want to help fix it, bring up
the issue on the vim mailing list (and CC this bug report).

-- 
 .''`.   martin f. krafft [EMAIL PROTECTED]
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems




Bug#441785: Umlauts break syntax highlighting

2007-09-11 Thread Sven Bischof
Package: vim-vimoutliner
Version: 0.3.4-8

Umlauts break syntax highlighting.

$ locale
LANG=en_US
LC_CTYPE=zh_CN.UTF8
LC_NUMERIC=en_US
LC_TIME=en_US
LC_COLLATE=en_US
LC_MONETARY=en_US
LC_MESSAGES=en_US
LC_PAPER=en_US
LC_NAME=en_US
LC_ADDRESS=en_US
LC_TELEPHONE=en_US
LC_MEASUREMENT=en_US
LC_IDENTIFICATION=en_US
LC_ALL=

$ locale -a
C
de_CH
de_CH.iso88591
de_CH.utf8
[EMAIL PROTECTED]
[EMAIL PROTECTED]
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
POSIX
ru_RU.koi8r
ru_RU.utf8
russian
zh_CN
zh_CN.gb18030
zh_CN.gb2312
zh_CN.gbk
zh_CN.utf8
zh_TW
zh_TW.big5
zh_TW.utf8






Bug#441785: Umlauts break syntax highlighting

2007-09-11 Thread martin f krafft
reassign 441785 vim
retitle 441785 vim's POSIX regexp classes don't honour LC_CTYPE properly
thanks

Thus spoke Sven Bischof [EMAIL PROTECTED] [2007.09.11.1010 +0200]:
> Umlauts break syntax highlighting.
>
> $ locale
> LANG=en_US
> LC_CTYPE=zh_CN.UTF8

It appears to me that this is a bug in vim, which does not include
a character such as ä in the class [[:alpha:]]. With a Unicode
charset, however, [[:alpha:]] seems to be defined to include any
kind of letter from any language:

  http://www.regular-expressions.info/posixbrackets.html
  http://www.regular-expressions.info/unicode.html

See this:

  $ export LC_CTYPE=zh_CN.UTF8
  $ echo a > a
  $ echo ä > ä
  $ file a ä
  a: ASCII text
  ä: UTF-8 Unicode text
  $ grep '[[:alpha:]]' a ä
  a:a
  ä:ä
  $ vim -es +'argdo g/[[:alpha:]]' +':q!' a ä
  a

The problem is the same if I use the de_CH.UTF8 locale.
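
Before blaming grep or vim, it may help to confirm that the test file
really contains the UTF-8 encoding of ä. The following is a sketch of
that check. The scratch directory from mktemp -d and the file name
"umlaut" are assumptions made for this example and are not part of the
session above.

```shell
#!/bin/sh
# Sketch: recreate the two test files and dump their bytes as hex.
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/a"
printf '\303\244\n' > "$workdir/umlaut"   # UTF-8 bytes of a-umlaut (U+00E4)

# od -An -tx1 prints one hex pair per byte; tr strips spaces and newlines.
a_hex=$(od -An -tx1 "$workdir/a" | tr -d ' \n')
umlaut_hex=$(od -An -tx1 "$workdir/umlaut" | tr -d ' \n')

echo "a:      $a_hex"
echo "umlaut: $umlaut_hex"
rm -rf "$workdir"
```

If the second file shows c3 a4 followed by the newline 0a, the input is
valid UTF-8, and any difference between grep and vim comes from how each
tool interprets [[:alpha:]], not from the file itself.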

Thanks,
