Hi guys

I would love to see some effort put into cleaning up this part (removing the 
leading char) and using Unicode.
From what I understand, it would simplify a lot. 
Who would like to think about a roadmap and share the effort?

Stef


On Oct 21, 2013, at 11:37 AM, Henrik Johansen wrote:

> As an added bonus, asInteger / asUnicode / codePoint / charCode / asciiValue 
> would all share the same definition; ^value :)
> 
> Cheers,
> Henry
> 
> P.S. codePoint is currently buggy; it should be ^self asUnicode.
> I'd hardly say that the value it currently returns (leadingChar-tagged, and 
> potentially in a different character set) meets the ANSI definition: 
> "Return the encoding value of the receiver in the implementation defined 
> execution character set."
> 
> 
> On Oct 21, 2013, at 11:18, Henrik Johansen wrote:
> 
>> 
>> On Oct 18, 2013, at 6:34, Sven Van Caekenberghe wrote:
>> 
>>> Hi,
>>> 
>>> So once again we have an issue with Character>>#leadingChar, see
>>> 
>>> https://pharo.fogbugz.com/f/cases/6368
>>> 
>>> Do we really need this?
>>> Are any Japanese, Chinese or Korean users willing to comment?
>>> 
>>> Thx,
>>> 
>>> Sven
>>> 
>> 
>> I'm not any of those, but my short answer would be no.
>> 
>> As for the long answer:
>> LeadingChar has too many responsibilities:
>> - Character set of string
>> - Font selection (see StrikeFontSet)
>> - Han unification disambiguation (through the above font selection)
>> 
>> The conflation of these, and the confusion over which of them leadingChar 
>> actually implies, easily leads to bugs, and has done so already (see 
>> Character >> asUnicode as opposed to 
>> JapaneseEnvironment >> fromJISX0208String:, for example).
>> I would bet 100€ that StrikeFontSet no longer works as intended either, that 
>> is, being able to display glyphs beyond Latin-1 using StrikeFonts. 
>> 
>> Now, here's why I feel those areas are not worth keeping, especially in 
>> their current, buggy state:
>> - Non-Unicode character sets
>> The main reasons for supporting these would be:
>> 1) Size reduction. All WideStrings are 32 bits per character, so that's moot.
>> 2) No need to convert code points when using fonts stored with JISX0208 
>> etc. code points. I've yet to see a free/TrueType font using anything but 
>> Unicode, and since we'd be the creators of any theoretical StrikeFontSet 
>> covering other languages, we'd be able to avoid it anyway.
>> 
>> If, in the future, it'd be desirable to support encodings other than Unicode 
>> for internal strings, I feel separate subclasses are a cleaner solution.
>> 
>> - Font selection / Han unification disambiguation
>> IMHO, obsoleted by the use of standard TrueType fonts. As long as one does 
>> not use StrikeFontSets to display a string, leadingChar currently has no 
>> benefit.
>> Yes, one could potentially also select different FreeTypeFonts based on it 
>> when a run is encountered, but the fonts themselves do not contain metadata 
>> about which variant of the glyphs they include, afaik (if they even support 
>> them; automatic fallback to another font when the current font doesn't 
>> cover a glyph would be a higher priority).
>> Even in that case, it could be a property of the current locale instead. 
>> While that means you can't display both Korean and Japanese text correctly 
>> in the same image, it'd be an (imho) acceptable tradeoff.
>> 
>> Cheers,
>> Henry
>> 
> 
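
For readers unfamiliar with the leadingChar tagging discussed in the quoted messages, here is a rough, language-neutral sketch in Python rather than Smalltalk. It is not the Pharo implementation. The bit layout (low 22 bits for the character code, the next 8 bits for the leadingChar tag) is an assumption that the thread itself does not state, and the class names, function names, and the tag value 5 are hypothetical. It only illustrates why a tagged value is a poor answer for codePoint, and how the accessors collapse to a single definition once the tag is gone.

```python
# Assumed layout (not stated in the thread): low 22 bits = character code,
# next 8 bits = leadingChar tag.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1
LEADING_MASK = 0xFF


def tag(leading_char: int, char_code: int) -> int:
    """Pack a leadingChar tag and a character code into one integer."""
    if not 0 <= leading_char <= LEADING_MASK:
        raise ValueError("leading_char out of range")
    if not 0 <= char_code <= CODE_MASK:
        raise ValueError("char_code out of range")
    return (leading_char << CODE_BITS) | char_code


def leading_char(value: int) -> int:
    """Extract the tag from a packed value."""
    return (value >> CODE_BITS) & LEADING_MASK


def char_code(value: int) -> int:
    """Extract the character code from a packed value."""
    return value & CODE_MASK


class TaggedCharacter:
    """Hypothetical model of the current scheme: value carries the tag."""

    def __init__(self, value: int) -> None:
        self.value = value

    def code_point(self) -> int:
        # Returns the tagged value, which is the behaviour the P.S. objects to.
        return self.value

    def as_unicode(self) -> int:
        # Simplification: only strips the tag, which assumes the character
        # code is already a Unicode code point.
        return char_code(self.value)


class UnicodeCharacter:
    """Hypothetical model of the proposal: value is the Unicode code point."""

    def __init__(self, value: int) -> None:
        self.value = value

    def code_point(self) -> int:
        return self.value

    # Every accessor shares the same definition, as in "^value".
    as_integer = as_unicode = char_code = ascii_value = code_point


if __name__ == "__main__":
    hiragana_a = 0x3042
    old = TaggedCharacter(tag(5, hiragana_a))  # 5 is an arbitrary example tag
    new = UnicodeCharacter(hiragana_a)
    print(hex(old.code_point()), hex(old.as_unicode()), hex(new.code_point()))
```

In this model, two characters with the same code but different tags also compare unequal by value, which is one way the conflation described above can surface as a bug.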
