Hi guys,

I would love to see some effort go into cleaning up this part (removing the leading char) and using Unicode throughout. From what I understand, it would simplify a lot. Who would like to think about a roadmap and share the effort?
Stef

On Oct 21, 2013, at 11:37 AM, Henrik Johansen <[email protected]> wrote:

> As an added bonus, asInteger / asUnicode / codePoint / charCode / asciiValue
> would all share the same definition; ^value :)
>
> Cheers,
> Henry
>
> P.S. codePoint is currently bugged, it should be ^self asUnicode
> I'd hardly say the leadingChar-tagged value in potentially different
> character sets it currently returns meets the ANSI definition of:
> "Return the encoding value of the receiver in the implementation defined
> execution character set."
>
>
> On Oct 21, 2013, at 11:18 , Henrik Johansen <[email protected]>
> wrote:
>
>>
>> On Oct 18, 2013, at 6:34 , Sven Van Caekenberghe <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> So once again we have an issue with Character>>#leadingChar, see
>>>
>>> https://pharo.fogbugz.com/f/cases/6368
>>>
>>> Do we really need this ?
>>> Any Japanese, Chinese or Korean users willing to comment ?
>>>
>>> Thx,
>>>
>>> Sven
>>>
>>
>> I'm not any of those, but my short answer would be no.
>>
>> As for the long answer:
>> LeadingChar has too many responsibilities:
>> - Character set of string
>> - Font selection (see StrikeFontSet)
>> - Han unification disambiguation (through the above font selection)
>>
>> The conflation of these, and confusion over what leadingChar actually
>> implies, easily leads to bugs, and has done so already. (see
>> Character >> asUnicode as opposed to
>> JapaneseEnvironment >> fromJISX0208String: for example).
>> I would bet 100€ StrikeFontSet no longer works as intended either, that is,
>> being able to display > latin1 glyphs using StrikeFonts.
>>
>> Now, here's why I feel those areas are not worth keeping, especially in
>> their current, buggy state:
>> - Non-unicode character sets
>> The main reasons for supporting this would be
>> 1) Size reduction. All WideStrings are 32 bits per character, so that's moot.
>> 2) No need for converting codepoints when using Fonts stored with JISX0208
>> etc. codePoints. I've yet to see a free/truetype font using anything but
>> Unicode, and since we'd be the creators of any theoretical StrikeFontSet
>> covering other languages, we'd be able to avoid it anyways.
>>
>> If, in the future, it'd be desirable to support encodings other than Unicode
>> for internal strings, I feel separate subclasses are a cleaner solution.
>>
>> - Font selection / Han unification disambiguation
>> IMHO, obsoleted by the use of standard TrueType fonts. As long as one does
>> not use StrikeFontSets to display a string, it currently has no benefits.
>> Yes, one could potentially select different FreeTypeFonts based on it when a
>> run is encountered as well, but the fonts themselves do not contain metadata
>> pertaining to which variant of the glyphs they include, afaik (if they even
>> support them; automatic fallback to another font when current font doesn't
>> cover a glyph would be a higher area of priority)
>> Even in that case, it could be a property of the current locale instead;
>> while it means you can't display both Korean/Japanese text in the same image
>> correctly, it'd be an (imho) acceptable tradeoff.
>>
>> Cheers,
>> Henry
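
For anyone following along who has not looked at leadingChar before, here is a rough model of the tagging being discussed, written in Python purely for illustration. It is not Pharo code. It assumes the Squeak-derived layout where the leadingChar tag sits in the bits above a 22-bit code field; the function names and the tag number 5 are made up for the example.

```python
# Illustrative model only, not Pharo code.
# ASSUMPTION: the leadingChar tag occupies the bits above a 22-bit code field.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1  # 0x3FFFFF


def tagged_value(leading_char: int, code: int) -> int:
    """Pack a leadingChar tag and a character code into one integer."""
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code does not fit in 22 bits")
    if not 0 <= leading_char <= 0xFF:
        raise ValueError("leading_char does not fit in 8 bits")
    return (leading_char << CODE_BITS) | code


def leading_char_of(value: int) -> int:
    """Recover the tag from a packed value."""
    return value >> CODE_BITS


def char_code_of(value: int) -> int:
    """Recover the untagged code from a packed value."""
    return value & CODE_MASK


HYPOTHETICAL_TAG = 5   # made-up tag number, for illustration only
HIRAGANA_A = 0x3042    # U+3042 HIRAGANA LETTER A

tagged = tagged_value(HYPOTHETICAL_TAG, HIRAGANA_A)
untagged = tagged_value(0, HIRAGANA_A)

# The same code yields two different stored values once a tag is involved,
# so an accessor returning the raw value does not return the code point.
print(tagged != untagged)                    # the two values differ
print(char_code_of(tagged) == HIRAGANA_A)    # the code survives masking
print(untagged == HIRAGANA_A)                # without a tag, value is the code
```

Under that assumption, dropping the tag means the stored value is the code point itself, which is why asInteger, asUnicode, codePoint, charCode and asciiValue could all collapse to ^value.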
