Well, it is a difficult problem and there is no easy solution. For scripts and the like, you just have to use the POSIX locale; only then is the behaviour well defined. For your first example, yes, LC_COLLATE=C should be used. But your "[á-ú]" example is a good one for showing the difficulty: with the current behaviour one at least has an idea of what it will match, whereas with Unicode code point ordering it would also match '÷' and 'ø'. IMHO, those examples are not that realistic anyway: scripts often set LC_ALL=C when parsing the output of other programs, and in situations where Unicode is really needed, classes like [[:upper:]] mostly suffice.
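To make the code point ordering remark concrete, here is a small Python sketch. It is only a stand-in: it assumes Python's `re` module, which compares bracket ranges by code point, and says nothing about what any particular grep version does. The helper name `in_range` and the list of test characters are made up for this illustration.

```python
import re

# The range from the example, written with escapes so the precomposed
# characters are unambiguous: U+00E1 (a-acute) through U+00FA (u-acute).
# Assumption: Python's re module orders bracket ranges by code point.
pattern = re.compile("[\u00e1-\u00fa]")


def in_range(ch):
    """Return True if the single character ch falls inside the range."""
    return pattern.fullmatch(ch) is not None


# Hypothetical sample: three accented vowels, the division sign (U+00F7),
# o with stroke (U+00F8), u-circumflex (U+00FB) and plain ASCII 'a'.
candidates = ["\u00e1", "\u00e9", "\u00fa", "\u00f7", "\u00f8", "\u00fb", "a"]

for ch in candidates:
    verdict = "match" if in_range(ch) else "no match"
    print(f"U+{ord(ch):04X} {ch}: {verdict}")
```

Under that assumption, U+00F7 and U+00F8 are reported as matches because they sit between U+00E1 and U+00FA numerically, while U+00FB and ASCII 'a' are not.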
But please also see bug 759849 and its comments; there seem to be some upstream changes regarding this issue.

https://bugs.launchpad.net/bugs/754272
Title: Range matching incorrect in UTF-8
