Re: [Pharo-project] squeakToUTF-8 and related?

Nicolas Cellier Mon, 29 Mar 2010 02:16:45 -0700

I presume that by "latin1" you mean code page 1252
rather than ISO 8859-1, right?


Nicolas
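The two encodings differ only in bytes 0x80 to 0x9F. ISO 8859-1 maps them to C1 control characters, while Windows code page 1252 uses most of them for printable characters such as the euro sign. A small sketch, using Python's codec names purely for illustration (`differing_bytes` is a hypothetical helper):

```python
def differing_bytes():
    """Return {byte: (latin1_char, cp1252_char)} for every byte where the codecs disagree."""
    result = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # every byte maps to U+0000..U+00FF
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
            cp1252 = None
        if latin1 != cp1252:
            result[b] = (latin1, cp1252)
    return result

diff = differing_bytes()
print(hex(min(diff)), hex(max(diff)))  # 0x80 0x9f
print(diff[0x80])                      # ('\x80', '€')
```

Outside that 32-byte range the two are identical, so ASCII and the upper Latin-1 letters decode the same either way.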


2010/3/29 Henrik Johansen <[email protected]>:
>
> On Mar 28, 2010, at 4:36 13PM, Stéphane Ducasse wrote:
>
>> Hi
>>
>> I'm trying to remember the situation with the internal representation of
>> strings in Pharo/Squeak in order to revise
>> http://book.seaside.st/book/in-action/serving-files/character-encodings/seaside-pharo
>>
>> I saw that in Pharo we have this NonASCIIMap. I do not remember what has
>> been done in Pharo.
>> Argh, memory leaks.... Nicolas, do you remember the situation?
> NonASCIIMap is used to quickly determine whether a string contains no
> character codes > 127 (i.e. only ASCII characters).
> It's very useful for a primitive-accelerated isAsciiString, which in the
> case of ASCII-compatible encodings (UTF-8, Latin-1, MacRoman, etc.) means
> no conversion is required for the string to be in the "appropriate" internal
> ByteString format.
> It's used, e.g., in the nextChunk code.
> Strangely, it is also used in FileStream writeSourceCodeFrom:baseName:isSt:,
> where for some reason we use MacRoman if the stream contents isAscii, which
> really makes no sense, but whatever.
>
> John pointed out that some converters were lying; I'm not entirely sure that's
> true anymore. What IS certain, though, is that the external format used for
> source code is inconsistent, depending on where and how you save/load it.
> It really should be cleaned up to always store in UTF-8, and possibly also
> Latin-1 if possible.
> Reading should likewise be cleaned up to always try UTF-8 first, raising an
> InvalidUTF8 error which can be handled by telling it to use a different
> converter and restart.
> The other converter could be chosen from a menu when dropping a file on the
> image, or picked automatically if we know the other encodings a file could
> have been saved in. I'm not sure how best to do it for scripts given as
> parameters when launching the VM.
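The proposed strategy (try UTF-8, and on failure restart with another converter) can be sketched as follows. `decode_with_fallback` and its fallback list are assumptions for illustration, and Python's `UnicodeDecodeError` stands in for the proposed InvalidUTF8 error:

```python
def decode_with_fallback(data: bytes, fallbacks=("cp1252", "mac_roman")):
    """Decode data as UTF-8; if that fails, restart with each fallback codec in turn.

    Returns (text, codec_used). Re-raises the last UnicodeDecodeError if every codec fails.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError as err:
        last_error = err  # plays the role of the proposed InvalidUTF8 error
    for codec in fallbacks:
        try:
            # Restart decoding from the beginning with a different converter.
            return data.decode(codec), codec
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error

print(decode_with_fallback("café".encode("utf-8")))   # ('café', 'utf-8')
print(decode_with_fallback("café".encode("cp1252")))  # ('café', 'cp1252')
```

Single-byte codecs such as mac_roman accept almost any input, so the order of the fallback list decides the result; that is why a menu or prior knowledge of the likely encodings is needed rather than blind guessing.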
>
> On the font rendering side, I agree with Nicolas that it's too complicated to
> do font rendering in-image; FreeType (FT) is an OK compromise, though.
> As for the bitmap StrikeFont rendering, what is really needed is a way to
> specify the charset it represents, and mappings from the internal string
> encodings to its glyphs.
> E.g., Bitmap DejaVu is really latin15 (ISO 8859-15), so it will currently
> render some ByteString characters incorrectly, and will also render as ? some
> Unicode chars it actually has glyphs for (such as the euro sign).
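A sketch of the missing mapping step: the font declares the charset its glyph table is laid out in, and text is mapped from Unicode code points into that charset, with `?` for anything unmappable. Python's `iso8859_15` codec stands in for such a mapping table, and `glyph_indices` is a hypothetical helper; treating the font as ISO 8859-15 follows the description above:

```python
def glyph_indices(text: str, font_charset: str = "iso8859_15") -> list:
    """Map each character to a glyph slot in a font laid out in font_charset.

    Characters the charset cannot represent become the slot for '?'.
    """
    return list(text.encode(font_charset, errors="replace"))

# The euro sign has a glyph in a Latin-9 font, at slot 0xA4.
print(glyph_indices("€"))  # [164]
# Latin-1 '¤' (also 0xA4) has no slot in Latin-9, so it becomes '?' (63);
# drawing the raw Latin-1 byte 0xA4 without mapping would show the euro glyph.
print(glyph_indices("¤"))  # [63]
```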
>
> Which all really has nothing to do with your initial question :)
> The internal representation of strings really hasn't changed since it was
> written, except that leadingChar for WideStrings is now zero.
> As far as I can tell, that means the internal storage format of WideStrings
> is now equivalent to UTF-32; I'm not sure what byte order it uses, though, or
> if that is even consistent across platforms. :)
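If each WideString slot holds only a code point in a 32-bit word (assuming, as stated above, that leadingChar is zero so no extra tag bits are set), the stored bytes are UTF-32 in whatever byte order the platform uses for 32-bit words. A sketch, where `code_point_words` is a hypothetical helper:

```python
import sys

def code_point_words(text: str, byteorder: str = sys.byteorder) -> bytes:
    """Store each code point as a 32-bit word in the given byte order."""
    return b"".join(ord(ch).to_bytes(4, byteorder) for ch in text)

s = "a€"  # U+0061, U+20AC
print(code_point_words(s, "little").hex())  # 61000000ac200000
print(code_point_words(s, "big").hex())     # 00000061000020ac
```

The same string yields different bytes on little- and big-endian machines, so anything that writes the raw words out of the image has to fix a byte order (or a BOM) explicitly.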
>
> The point about using WAKomEncoded and passing all strings going into/out of
> the image through an encoder is still valid.
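The design point here (decode once when data enters, work only with decoded strings inside, encode once when it leaves) can be sketched as below. `BoundaryCodec` is a hypothetical name and this is not Seaside's WAKomEncoded API; UTF-8 as the default is an assumption:

```python
class BoundaryCodec:
    """Hypothetical codec applied to everything crossing the image boundary.

    Inside, only decoded text (str) is handled; outside, only bytes.
    """

    def __init__(self, encoding: str = "utf-8"):
        self.encoding = encoding

    def incoming(self, raw: bytes) -> str:
        # Decode once, at the edge, so internal code never sees raw bytes.
        return raw.decode(self.encoding)

    def outgoing(self, text: str) -> bytes:
        # Encode once, on the way out.
        return text.encode(self.encoding)

codec = BoundaryCodec()
name = codec.incoming("Stéphane".encode("utf-8"))
print(len(name))  # 8 characters, although the UTF-8 input was 9 bytes
print(codec.outgoing("Hello " + name))
```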
>
> Cheers,
> Henry
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
