On Dec 6, 2013, at 3:15 PM, Nicolas Cellier <[email protected]> wrote:
> But MC should work better now that sources are UTF8 encoded (for a few
> months).
>
> The problem with old squeak/pharo/MC is that the encoding switched from
> iso-8859-1 (latin1) to UTF32 if ever a wide character was encountered...
> But this wasn't done properly with the ugly text converters, basicNextPut: et
> al.; the generated stuff was indeed UTF32, but only N bytes would be written
> instead of N characters !!! That means that you only stored (and can retrieve)
> the first 1/4 of the source...
> But you can have more luck, because the ugliness did not stop there: it's
> possible that the first buffers (4096 bytes) were already sent in latin1
> encoding, and the next ones in UTF32 (with the size bug). In which case you can
> retrieve a bit more of your sources.
> I have a prototype to decode such messy sources, but did not publish it,
> since you can't recover the whole code anyway.

arghhhhhhh (deep sounds of stef falling from a cliff :)
If somebody has the time and knowledge to radically fix that, please shout.

> If ever you have a problem with recent MC and improper UTF8 please, please
> report.

For the moment I just have a problem with importing old VisualWorks code into Pharo via fileIn :)

>
> 2013/12/6 Stephan Eggermont <[email protected]>
> Ben wrote:
> >who put a ô in the code in the first place ? :P
>
> Doesn’t happen often, I’m happy to observe. Strings in code
> with interesting characters are a much more common problem,
> though. Made it impossible to import MCs into Gemstone.
>
> Stephan
>
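
To make the "only N bytes instead of N characters" problem concrete, here is a minimal sketch in Python (not the actual Squeak/Pharo converter code). It assumes the writer falls back to UTF-32 as soon as one character is above U+00FF, and it assumes big-endian UTF-32 without a BOM; neither detail is confirmed in this thread. The names buggy_store and recover are made up for illustration, and the 4096-byte mixed-buffer case is not modelled.

```python
def buggy_store(source: str) -> bytes:
    """Simulate the size bug: UTF-32 output, but the character count is
    used as the byte count when writing."""
    if all(ord(ch) < 256 for ch in source):
        # No wide character: plain latin1, one byte per character, no bug.
        return source.encode("latin-1")
    # Assumption: big-endian UTF-32 without BOM (4 bytes per character).
    encoded = source.encode("utf-32-be")
    # The bug: only N bytes are written, where N is the number of characters.
    return encoded[:len(source)]


def recover(stored: bytes) -> str:
    """Decode whatever survived, ignoring a trailing partial character."""
    usable = len(stored) - len(stored) % 4
    return stored[:usable].decode("utf-32-be")


source = "a := 'λ'. " * 4          # 40 characters, contains a wide character
stored = buggy_store(source)       # 40 bytes, i.e. only 10 characters' worth
recovered = recover(stored)

print(len(source), len(stored))    # 40 40
print(repr(recovered))             # "a := 'λ'. "
```

With 4 bytes per character and only N bytes written, exactly the first N/4 characters survive, which matches the "first 1/4 of source" estimate above.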
