But MC should work better now, since sources have been UTF-8 encoded for a few months.
The problem with old Squeak/Pharo/MC is that the encoding switched from ISO-8859-1 (latin1) to UTF-32 as soon as a wide character was encountered... But this wasn't done properly with the ugly text converters, basicNextPut: et al.: the generated data was indeed UTF-32, but only N bytes were written instead of N characters (4N bytes)!!! That means you only stored (and can retrieve) the first 1/4 of the source... But you may have more luck, because the ugliness did not stop there: it's possible that the first buffers (4096 bytes) had already been sent in latin1 encoding, and the following ones in UTF-32 (with the size bug). In that case you can retrieve a bit more of your source. I have a prototype to decode such messy sources, but did not publish it, since you can't recover the whole code anyway. If you ever have a problem with recent MC and improper UTF-8, please, please report it.

2013/12/6 Stephan Eggermont wrote:
> Ben wrote:
> >who put a ô in the code at the first place ? :P
>
> Doesn’t happen often, I’m happy to observe. Strings in code
> with interesting characters are a much more common problem,
> though. Made it impossible to import MCs into Gemstone.
>
> Stephan
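To make the size bug described above concrete, here is a small Python model of it plus a best-effort recovery routine. This is not the unpublished prototype mentioned in the message; it is only a sketch under stated assumptions: the 4096 buffer size counts bytes, the UTF-32 part is big-endian without a BOM, the latin1-to-UTF-32 switch falls exactly on a buffer boundary, and the names `buggy_write` and `recover` are hypothetical. It shows why at most 1/4 of the UTF-32 part can come back.

```python
BUFFER_SIZE = 4096  # buffer size from the message; assumed to count bytes


def buggy_write(source: str, latin1_buffers: int = 0) -> bytes:
    """Hypothetical model of the old writer.

    The first `latin1_buffers` buffers are flushed as ISO-8859-1; the rest
    is encoded as UTF-32 (big-endian assumed), but only one byte per
    character is written: N bytes instead of N characters (4N bytes).
    """
    split = latin1_buffers * BUFFER_SIZE
    head, tail = source[:split], source[split:]
    wide = tail.encode("utf-32-be")[: len(tail)]  # the size bug: truncated
    return head.encode("latin-1") + wide


def recover(blob: bytes) -> str:
    """Recover as much text as possible from a blob shaped like the above.

    Each buffer boundary is tried as the latin1/UTF-32 switch point,
    smallest first. Latin-1 text read as UTF-32 almost never decodes,
    because the top byte of every 4-byte unit would have to be 0x00.
    """
    for split in range(0, len(blob) + 1, BUFFER_SIZE):
        raw_tail = blob[split:]
        tail = raw_tail[: len(raw_tail) - len(raw_tail) % 4]  # drop partial unit
        if not tail and raw_tail:
            continue  # too short to tell; leave it to the latin1 fallback
        try:
            decoded_tail = tail.decode("utf-32-be")
        except UnicodeDecodeError:
            continue  # this split leaves latin1 bytes in the UTF-32 part
        return blob[:split].decode("latin-1") + decoded_tail
    return blob.decode("latin-1")  # no UTF-32 part found: plain latin1


if __name__ == "__main__":
    text = "a" * 5000 + "ô" + "b" * 3000  # the switch happens after one buffer
    blob = buggy_write(text, latin1_buffers=1)
    print(len(text), len(blob), len(recover(blob)))  # → 8001 8001 5072
```

In this model the whole first buffer survives, but of the 3905 characters written in UTF-32 only 976 can be decoded, which is why no decoder can give back the complete source.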
