Re: [Bug 754272] Re: Range matching incorrect in UTF-8

Marcel Stimberg Sun, 19 Jun 2011 02:52:15 -0700

I appreciate you have a much stronger opinion on this than I do :-)

> However, I doubt anyone who hasn't
> already been bitten by this issue would ever expect this:
Anyone who uses regex ranges in a non-POSIX locale should be aware
of this. It has been like this for quite a while now and is mostly
consistent across tools like bash and grep.


> The problem is that collation is meant for collation; it's unsuitable for
> range matching.
That is your opinion, but you'd have to change the POSIX standard for
it to become the official view[1]:
"The LC_COLLATE category provides a collation sequence definition for
[...] regular expression matching"
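
To make the difference concrete, here is a minimal Python sketch that
compares a range test by codepoint with a range test by position in a
collation sequence. The sequence "aAbB...zZ" and the helper names are my
own assumptions for illustration; this is not the data of any real locale
and not how grep or libc implement it.

```python
import string

# Toy collation sequence "aAbB...zZ". This is an assumption made for
# illustration only, not the actual LC_COLLATE data of any locale.
TOY_COLLATION = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_range_by_codepoint(ch, start, end):
    """Range test as in the POSIX/C locale: compare codepoints."""
    return ord(start) <= ord(ch) <= ord(end)


def in_range_by_collation(ch, start, end, sequence=TOY_COLLATION):
    """Range test by position in a collation sequence (hypothetical helper)."""
    if ch not in sequence:
        return False
    return sequence.index(start) <= sequence.index(ch) <= sequence.index(end)


# Which ASCII letters fall inside the range a-z under each interpretation?
codepoint_hits = [c for c in string.ascii_letters
                  if in_range_by_codepoint(c, "a", "z")]
collation_hits = [c for c in string.ascii_letters
                  if in_range_by_collation(c, "a", "z")]

print("codepoint:", "".join(codepoint_hits))
print("collation:", "".join(collation_hits))
```

Under the toy sequence the range a-z also covers the uppercase letters
that sort between "a" and "z", which is the kind of surprise being
discussed here.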

> Other examples where you want ranges by codepoint order, just off the top of
> my head: '[ぁ-ヾｦ-ﾟ]' (incomplete) to match Japanese hiragana and katakana;
I think these are good examples for *not* using codepoint
ranges -- in these cases you'd rather use Unicode character
classes like \p{InHiragana} etc. These are supported by Perl regexes at
least; I'm not sure of the current status in grep.
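
As a rough illustration of matching by character property instead of by
codepoint range, here is a small Python sketch. Python's built-in re
module does not support \p{...}, so this uses the character names from
the standard unicodedata module as a stand-in. That is my own
approximation and is not equivalent to the \p{InHiragana} block property;
the function names are made up for the example.

```python
import unicodedata


def is_kana(ch):
    """Return True if the Unicode name of ch mentions hiragana or katakana.

    Name-based approximation (an assumption for this sketch), not the
    Unicode block property that \\p{InHiragana} tests in Perl.
    """
    name = unicodedata.name(ch, "")  # "" for characters without a name
    return "HIRAGANA" in name or "KATAKANA" in name


def kana_only(text):
    """Keep only the characters that is_kana() accepts."""
    return "".join(ch for ch in text if is_kana(ch))


sample = "abc ひらがな カタカナ ｦ 漢字"
print(kana_only(sample))
```

The point is only that the test is expressed in terms of what the
characters are, so it does not depend on either codepoint order or the
collation order of the current locale.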

Either way, this is certainly a fundamental issue that is not
appropriate for Ubuntu-specific changes. I'd therefore
recommend raising your concerns e.g. on the grep or libc mailing
lists.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/ -- 7.3.2 LC_COLLATE

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/754272

Title:
  Range matching incorrect in UTF-8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grep/+bug/754272/+subscriptions

