On Sat, Jun 18, 2011 at 6:31 PM, Marcel Stimberg
<[email protected]> wrote:

> Thank you for your bug report and taking the time to make Ubuntu better.
>
> The behaviour you are seeing is correct; the man page explicitly mentions
> that ranges take the locale collation settings into account:
> "Within a bracket expression, a range expression consists of two characters
> separated by a hyphen.  It matches any single character that sorts between
> the two characters, inclusive, using the locale's collating sequence and
> character set.  For example, in the default C locale, [a-d] is equivalent to
> [abcd].  Many locales sort characters in dictionary order, and in these
> locales [a-d] is typically not equivalent to [abcd]; it might be equivalent
> to [aBbCcDd], for example.  To obtain the traditional interpretation of
> bracket expressions, you can use the C locale by setting the LC_ALL
> environment variable to the value C."
>

Documenting a bug does not make it correct behavior.  This renders ranges
essentially useless, and obscure, brittle workarounds shouldn't be required
to get usable ranges.  When in a Unicode locale, it should by default use
Unicode codepoint ordering for ranges, and only use LC_COLLATE when
explicitly told to.
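
To make the difference concrete, here is a rough Python sketch of the two
interpretations of [a-d].  It is only an illustration, not grep's
implementation: the DICT_ORDER table is a hypothetical collating sequence
chosen to reproduce the man page's [aBbCcDd] example, not any real locale's
table, and the helper names are made up.

```python
# Two ways to expand the range a-d.

def codepoint_range(lo, hi):
    """Every character whose code point lies between lo and hi, inclusive."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}

# Hypothetical dictionary-order collating sequence: AaBbCc...Zz.
# An assumption for illustration only; real locale tables are far larger.
DICT_ORDER = "".join(
    letter.upper() + letter for letter in "abcdefghijklmnopqrstuvwxyz"
)

def collation_range(lo, hi, order=DICT_ORDER):
    """Every character that sorts between lo and hi in the given sequence."""
    start, end = order.index(lo), order.index(hi)
    return set(order[start:end + 1])

by_codepoint = codepoint_range("a", "d")
by_collation = collation_range("a", "d")

print("codepoint order:", "".join(sorted(by_codepoint)))
print("collation order:", "".join(sorted(by_collation)))
```

Under codepoint ordering the range contains only the four lowercase letters;
under the assumed collating sequence it also picks up B, C and D, which is
the surprise described above.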

That workaround isn't correct, either.  It breaks '[A-Z][ÀÁÂÃÄÅ]', for
example.  Using LC_COLLATE instead of LC_ALL will avoid disabling Unicode
entirely, but still results in broken ranges; [á-ú] will fail outright.
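
As a rough model of why LC_ALL=C breaks that pattern: in the C locale every
byte is treated as a character, so a bracket expression holding multi-byte
UTF-8 characters becomes a set of individual bytes.  The sketch below uses
Python's re module with bytes patterns as a stand-in for that byte-wise
matching.  That stand-in is an assumption on my part; it is not grep itself,
and the helper names are invented for the example.

```python
import re

PATTERN = "[A-Z][ÀÁÂÃÄÅ]"

# Character-level matching, as one would expect in a UTF-8 locale.
char_re = re.compile(PATTERN)

# Byte-level matching: the second bracket expression now holds the bytes
# 0xC3 and 0x80..0x85 rather than six characters.
byte_re = re.compile(PATTERN.encode("utf-8"))

def char_match(s):
    """True if the whole string matches character by character."""
    return char_re.fullmatch(s) is not None

def byte_match(s):
    """True if the whole UTF-8 encoding matches byte by byte."""
    return byte_re.fullmatch(s.encode("utf-8")) is not None

def byte_prefix(s):
    """The bytes matched at the start of the encoded string, if any."""
    m = byte_re.match(s.encode("utf-8"))
    return m.group() if m else None

for sample in ("ZÅ", "Zñ"):
    print(sample, char_match(sample), byte_match(sample), byte_prefix(sample))
```

In this model the byte-wise pattern never consumes a whole accented
character: it stops after the lead byte 0xC3, so it also accepts the start
of "Zñ", which the character-level pattern rejects.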

-- 
Glenn Maynard

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/754272

Title:
  Range matching incorrect in UTF-8
