Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

Stéphane Ducasse Wed, 18 Aug 2010 03:52:25 -0700

Maybe you should contact Yoshiki.

On Aug 18, 2010, at 9:59 AM, Philippe Marschall wrote:


> On 08/17/2010 04:55 PM, Henrik Johansen wrote:
>> 
>> On Aug 16, 2010, at 9:49 30PM, Philippe Marschall wrote:
>> 
>>> Hi
>>> 
>>> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
>>> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
>>> CP-1252).
>> 
>> 
>> More converters are always nice :D
>> Their code seems ok to me.
>>> 
>>> A couple of notes:
>>> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
>>> wrong) are mapped to the Unicode replacement character (U+FFFD)
>>> - a new Latin9 language environment is introduced
>>> - some minor cleanup, such as removing unused class variables
>>> 
>>> I'd appreciate it if somebody knowledgeable in these areas could review
>>> the changes. I'm especially unsure about the Latin9 language
>>> environment, but reusing Latin1 or Unicode seemed wrong.
>> 
>> I'm not sure it's too wrong, according to the EncodedCharSet comment:
>> "The other confusion comes from the name of "Latin1" class.  It used to mean 
>> the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
>> "Western European languages that are covered by the characters in Latin-1 
>> character set."
>> I'd reckon the same holds true for Latin1Environment (Western),
>> Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think
>> CP1252/8859-15 warrant an environment of their own, as they are basically
>> alternative encodings to Latin-1 for Western languages.
>> 
>> Also: 
>> - leadingChar is used in StrikeFontSet to choose different glyph sets. This
>> allows for StrikeFonts supporting more than the default Latin-1 glyphs. It
>> seems to me it would be "wrong" to use the same one for two different
>> encodings. I'm not sure why this approach was taken rather than allowing
>> additional strike font sets based on Unicode code point ranges, then using
>> leadingChar only to differentiate when the visual glyphs for those code
>> points would be different. I suspect it may have been developed to deal
>> with Han unification first, then reused to support multiple character sets
>> later.
>> 
>> - LanguageEnvironment seems to have been used in conjunction with
>> translation (note the entire old translation system was removed in Pharo and
>> replaced by an external package), maybe to decide which encoding externally
>> stored translation files should be read in as.
>> Then, having environments with overlapping supportedLanguages seems somewhat
>> weird as well.
>> Modifying defaultEncodingName/systemConverterClass of Latin1Environment to
>> use CP1252 for some Windows systems (as per Latin2) may be another approach.
>> It may or may not lead to unintended consequences elsewhere, though; I did
>> not investigate all uses.
>> 
>> IMHO, as someone who wasn't involved in its development, the whole
>> multilingual package could use some cleaning, more class comments, and
>> a clearer statement of responsibilities.
>> 
>> Cheers,
>> Henry
>> 
>> TL;DR:
>> More converters: yay! 
>> More LanguageEnvironments: o_O, not sure
> 
> OK, if nobody says it's a good idea and the right thing to do, I'll drop
> the LanguageEnvironment.
> 
> Cheers
> Philippe
> 
> 
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
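
For readers who want to see the byte-level behaviour discussed in the notes above, here is a minimal sketch in Python. It uses Python's standard codecs, not the Pharo converters from this thread, so it only illustrates the mappings (the five unassigned CP-1252 bytes going to U+FFFD, and the positions where ISO-8859-15 differs from ISO-8859-1). The function names are hypothetical and chosen for this example only.

```python
# Illustration only: Python's built-in codecs, not the Pharo TextConverters.

# The five byte values that CP-1252 leaves unassigned.
CP1252_UNMAPPED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

# Unicode replacement character, U+FFFD.
REPLACEMENT = "\ufffd"


def decode_cp1252(data: bytes) -> str:
    """Decode CP-1252, turning unassigned bytes into U+FFFD."""
    return data.decode("cp1252", errors="replace")


def latin9_differences() -> dict:
    """Return {byte: (latin1_char, latin9_char)} where the two differ."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("iso8859-1")
        latin9 = raw.decode("iso8859-15")
        if latin1 != latin9:
            diffs[b] = (latin1, latin9)
    return diffs


if __name__ == "__main__":
    for b in CP1252_UNMAPPED:
        print(f"0x{b:02X} -> U+{ord(decode_cp1252(bytes([b]))):04X}")
    for b, (l1, l9) in sorted(latin9_differences().items()):
        print(f"0x{b:02X}: Latin-1 U+{ord(l1):04X}, Latin-9 U+{ord(l9):04X}")
```

The second function shows why the two encodings are "basically alternative encodings" for the same languages: only a handful of byte positions decode differently.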


