#leadingChar

Henrik Johansen Mon, 21 Oct 2013 02:38:20 -0700

As an added bonus, asInteger / asUnicode / codePoint / charCode / asciiValue 
would all share the same definition; ^value :)


Cheers,
Henry

P.S. codePoint is currently bugged; it should be ^self asUnicode.
I'd hardly say that the leadingChar-tagged value it currently returns, which 
may be in any of several character sets, meets the ANSI definition of: 
"Return the encoding value of the receiver in the implementation defined 
execution character set."
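
To make the complaint concrete, here is a minimal sketch in Python (not Pharo 
code). It assumes the tagged value packs the leadingChar into the bits above a 
22-bit character code; that layout, the tag numbers 1 and 2, and all function 
names are assumptions for illustration only.

```python
# Sketch only: models a leadingChar-tagged character value.
# ASSUMPTION: the tag sits above a 22-bit character code.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def tagged_value(leading_char: int, char_code: int) -> int:
    """Pack a (hypothetical) leadingChar tag and a character code."""
    return (leading_char << CODE_BITS) | char_code


def leading_char(value: int) -> int:
    """Extract the tag from a packed value."""
    return value >> CODE_BITS


def char_code(value: int) -> int:
    """Extract the untagged character code from a packed value."""
    return value & CODE_MASK


# The same code point under two hypothetical language tags.
IDEOGRAPH = 0x76F4
tag_one = tagged_value(1, IDEOGRAPH)
tag_two = tagged_value(2, IDEOGRAPH)

# The packed values differ even though the character code is identical,
# so neither packed value is a plain encoding value of the character.
same_code = char_code(tag_one) == char_code(tag_two)
same_value = tag_one == tag_two
```

With no tag (leadingChar 0) the packed value and the character code coincide, 
which is why every accessor could collapse to ^value once the tag is gone.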


On Oct 21, 2013, at 11:18 , Henrik Johansen <[email protected]> 
wrote:

> 
> On Oct 18, 2013, at 6:34 , Sven Van Caekenberghe <[email protected]> wrote:
> 
>> Hi,
>> 
>> So once again we have an issue with Character>>#leadingChar, see
>> 
>> https://pharo.fogbugz.com/f/cases/6368
>> 
>> Do we really need this ?
>> Any Japanese, Chinese or Korean users willing to comment ?
>> 
>> Thx,
>> 
>> Sven
>> 
> 
> I'm not any of those, but my short answer would be no.
> 
> As for the long answer:
> LeadingChar has too many responsibilities:
> - Character set of string
> - Font selection (see StrikeFontSet)
> - Han unification disambiguation (through the above font selection)
> 
> The conflation of these, and the confusion over which of them leadingChar 
> actually implies, easily leads to bugs, and has done so already (see 
> Character >> asUnicode as opposed to 
> JapaneseEnvironment >> fromJISX0208String:, for example).
> I would bet 100€ that StrikeFontSet no longer works as intended either, that 
> is, being able to display glyphs beyond Latin-1 using StrikeFonts. 
> 
> Now, here's why I feel those areas are not worth keeping, especially in their 
> current, buggy state:
> - Non-unicode character sets
> The main reasons for supporting this would be
> 1) Size reduction. All WideStrings are 32 bits per character, so that's moot.
> 2) No need for converting code points when using fonts stored with JISX0208 
> etc. code points. I've yet to see a free/truetype font using anything but 
> Unicode, and since we'd be the creators of any theoretical StrikeFontSet 
> covering other languages, we'd be able to avoid it anyways.
> 
> If, in the future, it'd be desirable to support encodings other than Unicode 
> for internal strings, I feel separate subclasses are a cleaner solution.
> 
> - Font selection / Han unification disambiguation
> IMHO, obsoleted by the use of standard TrueType fonts. As long as one does 
> not use StrikeFontSets to display a string, it currently has no benefits.
> Yes, one could potentially also select different FreeTypeFonts based on it 
> when a run is encountered, but the fonts themselves do not contain metadata 
> pertaining to which variant of the glyphs they include, afaik (if they even 
> support them; automatic fallback to another font when the current font 
> doesn't cover a glyph would be a higher priority).
> Even in that case, it could be a property of the current locale instead. 
> That means you can't display both Korean and Japanese text correctly in the 
> same image, but it'd be an (imho) acceptable tradeoff.
> 
> Cheers,
> Henry
>
