Richard,
On Monday, March 18, 2013, Richard Wordingham wrote:
> On Mon, 18 Mar 2013 21:07:27 +0000
> "Whistler, Ken" <[email protected]> wrote:
>
> > It seems to me that the more
> > significant issue here would be whether the enclosing combining marks
> > are present, whether or not any variation selectors are present. So:
> >
> > <U+0031, U+20E3, U+0032, U+20E3>
> >
> > Isn't much different, for this purpose, than:
> >
> > <U+0031, U+FE0F, U+20E3, U+0032, U+FE0F, U+20E3>
> >
> > I wouldn't really expect most processes to recognize either of those
> > sequences as "a number" for parsing purposes.
>
> Nor I, as they're not much closer than <U+2460 CIRCLED DIGIT ONE,
> U+2461 CIRCLED DIGIT TWO>. That is why I wondered if one could argue
> that their use in multi-digit numbers was not playing the game, and
> therefore one should not be surprised if things went wrong.
>

It's not much different from <U+0031, U+26C4, U+0032> or <U+0031, U+0620, U+0032> either. Parsed as a number, only <U+0031> is recognized; parsing would stop after that code point.

> The issue is rather with emphatically plain text <U+0031, U+FE0E,
> U+0032, U+FE0E>.
>

The situation is the same for something like an implementation of LDML number parsing: U+FE0E is not part of a number, so parsing stops there.

> > 123456 versus 123<ZWJ>456 versus 123<LRM>456
>
> LRM is misplaced if not totally pointless, but in general ZWJ is a fair
> point. So, numeric tailoring (one of the standard UCA parametric
> tailoring options, remember) was already potentially broken.
> 10<ZWJ>0<ZWJ>0 would be perfectly reasonable for text likely to be
> rendered by a cursive Latin font.
>

Identifying such an edge case does not prove that numeric tailoring is broken.

-s
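To make the parsing point concrete, here is a minimal Python sketch. The helpers parse_leading_number and digit_runs are hypothetical illustrations, not the number-parsing or collation code of LDML, ICU, or any other library; the sketch simply assumes a naive parser that accepts only ASCII digits U+0030..U+0039 and stops at the first other code point.

```python
import re
from typing import List, Optional

# Assumption: only ASCII digits U+0030..U+0039 count as digits here.
# (Python's "\d" would also match other Unicode Nd digits, so it is avoided.)
ASCII_DIGIT_RUN = re.compile(r"[0-9]+")


def parse_leading_number(s: str) -> Optional[int]:
    """Hypothetical naive parser: value of the leading ASCII digit run, or None.

    Parsing stops at the first code point that is not U+0030..U+0039.
    """
    m = ASCII_DIGIT_RUN.match(s)
    return int(m.group()) if m else None


def digit_runs(s: str) -> List[str]:
    """Hypothetical splitter: maximal ASCII digit runs, as a naive
    numeric-collation tailoring might extract them before comparing values."""
    return ASCII_DIGIT_RUN.findall(s)


samples = {
    "keycap": "1\u20e32\u20e3",                # <U+0031, U+20E3, U+0032, U+20E3>
    "keycap + VS16": "1\ufe0f\u20e32\ufe0f\u20e3",
    "VS15": "1\ufe0e2\ufe0e",                  # <U+0031, U+FE0E, U+0032, U+FE0E>
    "snowman": "1\u26c42",                     # <U+0031, U+26C4, U+0032>
    "U+0620": "1\u06202",                      # <U+0031, U+0620, U+0032>
    "ZWJ": "123\u200d456",
    "plain": "123456",
}

for label, text in samples.items():
    print(label, parse_leading_number(text))

# The ZWJ edge case for numeric tailoring: one run versus three.
print(digit_runs("100"), digit_runs("10\u200d0\u200d0"))
```

Under that assumption, every non-plain sample yields only its first digit run (1, or 123 for the ZWJ case), and 10<ZWJ>0<ZWJ>0 splits into three runs where 100 gives one. Whether a real numeric tailoring ought to skip ZWJ there is exactly the edge case in question.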

