Re: [Pharo-project] adding ISO-8859-15 and CP-1252 support

Philippe Marschall Wed, 18 Aug 2010 00:59:48 -0700

On 08/17/2010 04:55 PM, Henrik Johansen wrote:
> 
> On Aug 16, 2010, at 9:49 30PM, Philippe Marschall wrote:
> 
>> Hi
>>
>> I decided to write ISO-8859-15 and CP-1252 support [1] (mostly for
>> selfish reasons so that Seaside on Pharo would support ISO-8859-15 and
>> CP-1252).
> 
> 
> More converters are always nice :D
> Their code seems ok to me.
>>
>> A couple of notes:
>> - the five unmapped bytes of CP-1252 (not ISO-8859-15, the comment is
>> wrong) are mapped to the Unicode replacement character (U+FFFD)
>> - a new Latin9 language environment is introduced
>> - some minor clean up like removing unused class variables
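
To illustrate the first note above, here is a minimal sketch in Python (not the Pharo converter itself) of mapping the five unassigned CP-1252 bytes to U+FFFD. The names `UNMAPPED_CP1252` and `decode_cp1252_lenient` are made up for this example; the only library behaviour it relies on is Python's standard `cp1252` codec combined with `errors="replace"`.

```python
# The five byte values that CP-1252 leaves unassigned.
UNMAPPED_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def decode_cp1252_lenient(data):
    """Decode CP-1252 bytes, substituting U+FFFD for unassigned bytes.

    Hypothetical helper for illustration only; it delegates to Python's
    built-in codec rather than mirroring the Smalltalk implementation.
    """
    return data.decode("cp1252", errors="replace")


# 0x80 is the euro sign in CP-1252, 0x81 is unassigned, 0x41 is 'A'.
sample = bytes([0x80, 0x81, 0x41])
decoded = decode_cp1252_lenient(sample)
```

With strict error handling the same input would raise an error instead, so choosing the replacement character is a decision to keep decoding lossy but total.
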
>>
>> I'd appreciate it if somebody knowledgeable in these areas could review
>> the changes. I'm especially unsure about the Latin9 language
>> environment, but reusing Latin1 or Unicode seemed wrong.
> 
> I'm not sure it's too wrong; according to the EncodedCharSet comment:
> "The other confusion comes from the name of "Latin1" class.  It used to mean 
> the Latin-1 (ISO-8859-1) character set, but now it primarily means that the 
> "Western European languages that are covered by the characters in Latin-1 
> character set."
> I'd reckon the same holds true for Latin1Environment (Western),
> Latin2Environment (Eastern), and Latin7Environment (Greek). I don't think
> CP1252/8859-15 warrant an environment of their own, as they are basically
> alternative encodings to Latin-1 for Western languages.
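
As a rough way to see how close these encodings are, the following Python sketch (not Pharo code) lists the byte values that decode differently under ISO-8859-1 and ISO-8859-15. The helper name `differing_bytes` is hypothetical; `iso8859_1` and `iso8859_15` are the standard Python codec names. CP-1252 is left out because its unassigned bytes would need separate error handling.

```python
def differing_bytes(enc_a, enc_b):
    """Return the byte values (0-255) that decode to different characters.

    Illustrative helper; assumes both codecs define all 256 byte values,
    which holds for ISO-8859-1 and ISO-8859-15.
    """
    return [
        b
        for b in range(256)
        if bytes([b]).decode(enc_a) != bytes([b]).decode(enc_b)
    ]


latin1_vs_latin9 = differing_bytes("iso8859_1", "iso8859_15")

# Show each differing byte with its character in both encodings.
for b in latin1_vs_latin9:
    raw = bytes([b])
    print(hex(b), repr(raw.decode("iso8859_1")), repr(raw.decode("iso8859_15")))
```

Only eight positions differ, including 0xA4, which becomes the euro sign in ISO-8859-15.
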
> 
> Also: 
> - leadingChar is used in StrikeFontSet to choose different glyph sets. This 
> allows for StrikeFonts supporting more than the default latin1 glyphs, seems 
> to me it would be "wrong" to use the same one for two different encodings. 
> Not sure why this approach was taken rather than allowing additional strike 
> font sets based on unicode code point ranges, then using leadingChar only to 
> differentiate when the visual glyphs for those code points would be 
> different. I suspect it may have been developed to deal with Han unification
> first, then reused to support multiple character sets later.
> 
> - LanguageEnvironment seems to have been used in conjunction with translation 
> (note the entire old translation system was removed in Pharo and replaced by 
> an external package), maybe to decide which encoding externally stored 
> translation files should be read in as.
> Then, having environments with overlapping supportedLanguages seems somewhat
> weird as well.
> Modifying defaultEncodingName/systemConverterClass of Latin1Environment to
> use CP1252 for some Windows systems (as per Latin2) may be another approach.
> It may or may not lead to unintended consequences elsewhere, though; I did
> not investigate all uses.
> 
> IMHO, as someone who wasn't involved in its development, the whole
> multilingual package could use some cleaning, more class comments, and
> a clearer statement of responsibilities.
> 
> Cheers,
> Henry
> 
> TL;DR:
> More converters: yay! 
> More LanguageEnvironments: o_O, not sure


OK, if nobody says it's a good idea and the right thing to do, I'll drop
the LanguageEnvironment.

Cheers
Philippe


