As an added bonus, asInteger / asUnicode / codePoint / charCode / asciiValue would all share the same definition; ^value :)

Cheers,
Henry

P.S. codePoint is currently bugged, it should be ^self asUnicode
I'd hardly say the leadingChar-tagged value in potentially different character sets it currently returns meets the ANSI definition of:
"Return the encoding value of the receiver in the implementation defined execution character set."

On Oct 21, 2013, at 11:18 , Henrik Johansen <[email protected]> wrote:

>
> On Oct 18, 2013, at 6:34 , Sven Van Caekenberghe <[email protected]> wrote:
>
>> Hi,
>>
>> So once again we have an issue with Character>>#leadingChar, see
>>
>> https://pharo.fogbugz.com/f/cases/6368
>>
>> Do we really need this ?
>> Any Japanese, Chinese or Korean users willing to comment ?
>>
>> Thx,
>>
>> Sven
>>
>
> I'm not any of those, but my short answer would be no.
>
> As for the long answer:
> LeadingChar has too many responsibilities:
> - Character set of string
> - Font selection (see StrikeFontSet)
> - Han unification disambiguation (through the above font selection)
>
> The conflation of these, and confusion of which leadingChar actually implies,
> easily leads to bugs, and has done so already. (see Character >> asUnicode as
> opposed to JapaneseEnvironment >> fromJISX0208String: for example).
> I would bet 100€ StrikeFontSet no longer works as intended either, that is,
> being able to display
> latin1 glyphs using StrikeFonts.
>
> Now, here's why I feel those areas are not worth keeping, especially in their
> current, buggy state:
> - Non-unicode character sets
> The main reasons for supporting this would be
> 1) Size reduction. All Widestrings are 32bits per character, so that's moot.
> 2) No need for converting codepoints when using Fonts stored with JISX0208
> etc. codePoints . I've yet to see a free/truetype font using anything but
> Unicode, and since we'd be the creators of any theoretical StrikeFontSet
> covering other languages, we'd be able to avoid it anyways.
>
> If, in the future, it'd be desirable to support encodings other than Unicode
> for internal strings, I feel separate subclasses are a cleaner solution.
>
> - Font selection / Han unification disambiguation
> IMHO, obsoleted by the use of standard TrueType fonts. As long as one does
> not use StrikeFontSets to display a string, it currently has no benefits.
> Yes, one could potentially select different FreeTypeFonts based on it when a
> run is encountered as well, but the fonts themselves do not contain metadata
> pertaining to which variant of the glyphs they include, afaik (if they even
> support them; automatic fallback to another font when current font doesn't
> cover a glyph would be a higher area of priority)
> Even in that case, it could be a property of the current locale instead,
> while it means you can't display both korean/japanese text in the same image
> correctly, it'd be a (imho) acceptable tradeoff.
>
> Cheers,
> Henry
>
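
A minimal sketch of why a leadingChar-tagged value is not a code point, and of what dropping the tag would simplify. The bit layout used here (tag stored above the low 22 bits) is an assumption made for illustration, and the names LEADING_SHIFT, CODE_MASK, tagged_value, leading_char and char_code are hypothetical; none of this is taken from the Pharo sources discussed in this thread. Python is used only to keep the example self-contained.

```python
# Illustrative model only: the bit layout below is an assumption,
# not a statement about the actual Pharo implementation.
LEADING_SHIFT = 22                      # assumed: tag sits above the low 22 bits
CODE_MASK = (1 << LEADING_SHIFT) - 1    # mask selecting the untagged code


def tagged_value(leading_char: int, code: int) -> int:
    """Pack a leadingChar tag and a character code into one integer."""
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code does not fit below the tag bits")
    if leading_char < 0:
        raise ValueError("leading_char must be non-negative")
    return (leading_char << LEADING_SHIFT) | code


def leading_char(value: int) -> int:
    """Recover the tag from a packed value."""
    return value >> LEADING_SHIFT


def char_code(value: int) -> int:
    """Recover the untagged code from a packed value."""
    return value & CODE_MASK


# U+3042 HIRAGANA LETTER A, once untagged and once with a hypothetical tag of 5.
plain = tagged_value(0, 0x3042)
tagged = tagged_value(5, 0x3042)

# With a tag, the raw value is no longer the Unicode code point ...
assert tagged != 0x3042
# ... and the same code under different tags yields different raw values.
assert tagged != plain
# Without a tag, the raw value and the code coincide.
assert plain == char_code(plain) == 0x3042
```

In this model, once the tag is always zero, every accessor that is meant to answer the code point can return the stored value directly, which is the "^value" simplification mentioned at the top of this message.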
