[Bug 754272] Re: Range matching incorrect in UTF-8

Marcel Stimberg Sat, 18 Jun 2011 16:56:52 -0700

Well, it is a difficult problem and there is no easy solution. For scripts
etc. you just have to use the POSIX locale; only then is the behaviour well
defined. For your first example, yes, LC_COLLATE=C should be used. But your
"[á-ú]" example is a good one for showing the difficulty: with the current
behaviour one at least has an idea of what it will match, whereas with Unicode
codepoint ordering it would also match '÷' and 'ø'...
IMHO, those examples are not that realistic anyway: scripts often set LC_ALL=C
for parsing the output of other programs, and in situations where Unicode is
really needed, things like [[:upper:]] mostly suffice.
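
To make the codepoint-ordering point concrete, here is a minimal sketch. It
uses Python's re module only as a stand-in for a matcher that orders ranges by
Unicode codepoint; it says nothing about what grep itself does in any
particular locale. The helper names matched_chars and non_letters are made up
for this illustration.

```python
import re
import unicodedata

# Python's re module interprets "[á-ú]" as the code-point range
# U+00E1 (á) .. U+00FA (ú). This is an illustration of code-point
# ordering, not of grep's locale-dependent behaviour.
RANGE = re.compile("[á-ú]")


def matched_chars():
    """Return the characters in U+00E0..U+00FF that the range matches."""
    return [chr(cp) for cp in range(0x00E0, 0x0100) if RANGE.fullmatch(chr(cp))]


def non_letters(chars):
    """Return the characters whose Unicode general category is not a letter."""
    return [c for c in chars if not unicodedata.category(c).startswith("L")]


if __name__ == "__main__":
    chars = matched_chars()
    # Both the division sign and o-with-stroke fall inside the range.
    print("matches U+00F7:", RANGE.fullmatch("\u00f7") is not None)
    print("matches U+00F8:", RANGE.fullmatch("\u00f8") is not None)
    print("characters matched:", len(chars))
    print("non-letters matched:", [ascii(c) for c in non_letters(chars)])
```

So under pure codepoint ordering the range quietly picks up U+00F7 (÷), which
is not a letter at all, and U+00F8 (ø), which most people would not expect
between á and ú.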


But please also see bug 759849 and its comments; there seem to be some
upstream changes regarding the issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/754272

