On Oct 21, 2013, at 2:13 , Igor Stasenko <[email protected]> wrote:
>
> On 21 October 2013 13:41, Henrik Johansen <[email protected]> wrote:
>
> On Oct 21, 2013, at 11:56 , Henrik Johansen <[email protected]> wrote:
>
>> My guess is most if not all systems which do support that case, implement a
>> higher level abstraction than "String" to take care of it.
>
> … or not :)
> http://www.unicode.org/faq/vs.html
> http://www.unicode.org/reports/tr37/
>
> Which means, you can solve the problems caused by Han Unification using
> standard Unicode.
> Seems like a lot of fun to implement support for :)
>
>
> Why we should care?
>
> We define a string as a sequence of Characters.
> We say that every Character can be uniquely identified by its unicode value.
>
> and we say nothing about things like locale, language etc.. because it is
> higher level concepts, e.g.
> things like mapping unicode value (or sequence of them) into sequence of
> glyphs to display on screen, using whatever font, is outside of scope of
> 'String' definition.
>
> Cheers,
> Henry

The proposition was that leadingChar might be valuable on the grounds that Unicode doesn't allow you to differentiate Korean/Japanese characters in the same document.
Variation sequences show Unicode has acquired a built-in mechanism for doing just that, so the proposition is false.

The work that would be involved in implementing support for them in paths from User input -> String instance and String -> Glyph display is outside the definition of a String, sure, but work nonetheless.
Maybe I should've put quotes around "fun". :P

One would probably also need to deal with them if implementing Unicode functionality that arguably *is* within String scope btw, such as equality, collation and normalization.
Just because Strings currently treat code point = character, doesn't make it 100% correct :)

Cheers,
Henry
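
To make the "code point = character" caveat concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Pharo code). It assumes U+845B followed by U+E0100 as an example ideographic variation sequence. The helper names are hypothetical, and the selector ranges cover only VS1-VS16 and VS17-VS256.

```python
import unicodedata

# Illustrative assumption: a Han ideograph plus a variation selector (VS17).
BASE = "\u845b"
VS17 = "\U000e0100"


def is_variation_selector(ch):
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(s):
    """Hypothetical helper: drop selectors so only base code points remain."""
    return "".join(ch for ch in s if not is_variation_selector(ch))


def equal_ignoring_variants(a, b):
    """Hypothetical comparison that treats glyph variants as the same text."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)


plain = BASE           # one code point
varied = BASE + VS17   # two code points, still one user-perceived character

# Code-point-wise, the two strings differ in both length and equality.
lengths = (len(plain), len(varied))
codepoint_equal = plain == varied

# Normalization does not fold the selector away; it has no decomposition.
nfc_keeps_selector = unicodedata.normalize("NFC", varied) == varied

# A variation-aware comparison has to be chosen explicitly.
variant_equal = equal_ignoring_variants(plain, varied)
```

Whether equality should ignore the selector (as `equal_ignoring_variants` does) or respect it (as plain `==` does) is exactly the kind of decision a String implementation would have to make once variation sequences are supported.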
