On Dec 6, 2013, at 3:15 PM, Nicolas Cellier 
<[email protected]> wrote:

> But MC should work better now that sources are UTF8 encoded (for a few 
> months).
> 
> The problem with old Squeak/Pharo/MC is that the encoding did switch from 
> iso-8859-1 (latin1) to UTF32 if ever a wide character was encountered...
> But this wasn't done properly with the ugly text converters, basicNextPut: et 
> al.: the generated stuff was indeed UTF32, but only N bytes would be written 
> instead of N characters !!! That means that you only stored (and can retrieve) 
> the first 1/4 of the source...
> But you can have more luck, because the ugliness did not stop there: it's 
> possible that the first buffers (4096 bytes) were already sent in latin1 
> encoding, and the next ones in UTF32 (with the size bug). In which case you 
> can retrieve a bit more of your sources.
> I have a prototype to decode such messy sources, but did not publish it, 
> since you can't recover the whole code anyway.
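
To make the "only N bytes instead of N characters" effect concrete, here is a minimal sketch in Python. It is not the actual Squeak/Pharo converter code: the function names `buggy_utf32_write` and `recover` are made up for illustration, and big-endian UTF32 without a BOM is an assumption. It only shows why roughly the first quarter of the source survives.

```python
# Hypothetical simulation of the size bug described above (not real MC code).
# Assumption: UTF-32 big-endian, no byte-order mark.

def buggy_utf32_write(source: str) -> bytes:
    """Encode as UTF-32 but write only len(source) bytes, not characters."""
    encoded = source.encode("utf-32-be")   # 4 bytes per character
    return encoded[:len(source)]           # the bug: N bytes, so 3/4 is lost


def recover(data: bytes) -> str:
    """Decode the complete 4-byte units that made it to disk."""
    usable = len(data) - len(data) % 4     # drop a trailing partial unit
    return data[:usable].decode("utf-32-be")


source = "abcdefgh\u2192ijk"               # 12 characters, one of them wide
stored = buggy_utf32_write(source)

print(len(stored))                         # 12 bytes on disk
print(recover(stored))                     # abc
```

With 12 characters, only 12 of the 48 encoded bytes are stored, so 3 characters come back. The mixed latin1/UTF32 case would need the buffer boundary to be known as well, which this sketch does not attempt.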

arghhhhhhh (deep sound of stef falling from a cliff :)
If somebody has the time and knowledge to radically fix that, please shoot. 

> If you ever have a problem with recent MC and improper UTF8, please, please 
> report.

For the moment I just have a problem with importing old VisualWorks code into 
Pharo via fileIn :)

> 
> 
> 
> 2013/12/6 Stephan Eggermont <[email protected]>
> Ben wrote:
> >who put a ô in the code at the first place ? :P
> 
> Doesn’t happen often, I’m happy to observe. Strings in code
> with interesting characters are a much more common problem,
> though. Made it impossible to import MCs into Gemstone.
> 
> Stephan
> 
