On Sat, Jun 18, 2011 at 6:31 PM, Marcel Stimberg <[email protected]> wrote:
> Thank you for your bug report and taking the time to make Ubuntu better.
>
> The behaviour you are seeing is correct; the man page explicitly mentions
> that ranges take the locale collation settings into account:
> "Within a bracket expression, a range expression consists of two characters
> separated by a hyphen. It matches any single character that sorts between
> the two characters, inclusive, using the locale's collating sequence and
> character set. For example, in the default C locale, [a-d] is equivalent to
> [abcd]. Many locales sort characters in dictionary order, and in these
> locales [a-d] is typically not equivalent to [abcd]; it might be equivalent
> to [aBbCcDd], for example. To obtain the traditional interpretation of
> bracket expressions, you can use the C locale by setting the LC_ALL
> environment variable to the value C."
>

Documenting a bug does not make it correct behavior. This renders ranges
essentially useless, and obscure, brittle workarounds shouldn't be required
to get usable ranges. In a Unicode locale, grep should order ranges by
Unicode codepoint by default, and use LC_COLLATE only when explicitly told to.

The LC_ALL=C workaround isn't correct, either. It breaks '[A-Z][ÀÁÂÃÄÅ]', for
example, because in the C locale each accented letter is matched as separate
bytes. Setting LC_COLLATE instead of LC_ALL avoids disabling Unicode
entirely, but still results in broken ranges; [á-ú] will fail outright.

--
Glenn Maynard

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/754272

Title:
  Range matching incorrect in UTF-8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grep/+bug/754272/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
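To make the difference between codepoint ranges and byte-wise matching concrete, here is a small sketch in Python rather than grep itself. It assumes Python's `re` module is a fair stand-in: a `str` pattern orders bracket ranges by Unicode code point (no locale collation), which approximates the default argued for above, while a pattern encoded to UTF-8 `bytes` matches one byte at a time, roughly what a byte-oriented matcher does once LC_ALL=C is set. The helpers `codepoint_matches` and `byte_matches` are hypothetical names used only for illustration.

```python
import re


# Sketch only: Python's re stands in for grep. A str pattern treats [x-y]
# as a Unicode code-point range, with no locale collation involved.
def codepoint_matches(pattern: str, text: str) -> list[str]:
    return re.findall(pattern, text)


# Assumption: encoding pattern and text to UTF-8 bytes approximates a
# byte-oriented matcher running under LC_ALL=C (this is not grep itself).
def byte_matches(pattern: str, text: str) -> list[bytes]:
    return re.findall(pattern.encode("utf-8"), text.encode("utf-8"))


if __name__ == "__main__":
    # By code point, [a-d] is exactly [abcd]; uppercase letters are excluded.
    print(codepoint_matches("[a-d]", "aBbCcDd"))  # ['a', 'b', 'c', 'd']
    # [á-ú] is U+00E1..U+00FA, a valid range that also covers é, í and ó.
    print(codepoint_matches("[á-ú]", "aáéíóúzA"))  # ['á', 'é', 'í', 'ó', 'ú']
    # Byte-wise, each accented letter is two bytes, so the class matches
    # fragments: 'Ä' (C3 84) produces two separate one-byte hits...
    print(byte_matches("[ÀÁÂÃÄÅ]", "Ä"))  # [b'\xc3', b'\x84']
    # ...and the lead byte of an unrelated letter such as 'é' matches too.
    print(byte_matches("[ÀÁÂÃÄÅ]", "é"))  # [b'\xc3']
    # '[A-Z][ÀÁÂÃÄÅ]' consumes only the first byte of 'Ä'.
    print(byte_matches("[A-Z][ÀÁÂÃÄÅ]", "XÄ"))  # [b'X\xc3']
```

In the codepoint version, [a-d] keeps its traditional [abcd] meaning and [á-ú] is an ordinary range; the byte version shows how forcing the C locale turns a class of accented letters into a set of individual bytes.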
