On Sun, Jul 26, 2009 at 11:29 PM, Marcin Hanclik<[email protected]> wrote:
> Hi,
>
> Given the fact that
>
> "Rule names are case insensitive."
> http://tools.ietf.org/html/rfc5234#section-2.1
>
> it could potentially be better to rename the rule from "utf8-char" to
> something else, since it may get confused with "UTF8-char" rule from
> http://tools.ietf.org/html/rfc3629#section-4.
>
> Taken into account my comments in the mail below, we could have new rule
> replacing utf8-char:
> zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4
> UTF8-2 = %xC2-DF UTF8-tail
> UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
>          %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
> UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
>          %xF4 %x80-8F 2( UTF8-tail )
> UTF8-tail = %x80-BF
>
> The problem may be with the allowed ranges of the Unicode characters.
> The above grammar seems to allow 0080-10FFFF (the UTF-16 accessible range
> minus characters < 0080)
> http://tools.ietf.org/html/rfc3629#section-3
> whereas the current utf8-char rule is more selective.
>
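For reference, the proposed rule can be sketched as a byte-level check, which may help when comparing it against the current utf8-char rule. This is only an illustration under stated assumptions, not part of the spec: the byte ranges are copied from the ABNF quoted above, while the Python names (TAIL, ZIP_UTF8_CHAR, is_zip_utf8_char) are hypothetical.

```python
import re

# UTF8-tail = %x80-BF
TAIL = rb"[\x80-\xBF]"

# UTF8-2 = %xC2-DF UTF8-tail
UTF8_2 = rb"[\xC2-\xDF]" + TAIL

# UTF8-3: four alternatives; the %xED %x80-9F branch is what keeps
# the surrogate range (U+D800..U+DFFF) out.
UTF8_3 = rb"|".join([
    rb"\xE0[\xA0-\xBF]" + TAIL,
    rb"[\xE1-\xEC]" + TAIL + rb"{2}",
    rb"\xED[\x80-\x9F]" + TAIL,
    rb"[\xEE-\xEF]" + TAIL + rb"{2}",
])

# UTF8-4: three alternatives, capped at U+10FFFF by the %xF4 %x80-8F branch.
UTF8_4 = rb"|".join([
    rb"\xF0[\x90-\xBF]" + TAIL + rb"{2}",
    rb"[\xF1-\xF3]" + TAIL + rb"{3}",
    rb"\xF4[\x80-\x8F]" + TAIL + rb"{2}",
])

# zip-UTF8-char = UTF8-2 / UTF8-3 / UTF8-4 (no single-byte ASCII branch)
ZIP_UTF8_CHAR = re.compile(rb"(?:" + rb"|".join([UTF8_2, UTF8_3, UTF8_4]) + rb")")


def is_zip_utf8_char(data: bytes) -> bool:
    """Return True if `data` is exactly one character matching the proposed rule."""
    return ZIP_UTF8_CHAR.fullmatch(data) is not None
```

With this sketch, single ASCII bytes, overlong forms such as C0 80, and surrogate encodings such as ED A0 80 are rejected, while C2 80 (U+0080) and F4 8F BF BF (U+10FFFF) are accepted.
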
Unless it's broken (?), I would prefer to leave it as is.

-- 
Marcos Caceres
http://datadriven.com.au
