On Wed, May 22, 2013 at 3:57 PM, Nicolas Cellier <[email protected]> wrote:
> MC never wrote a BOM, so we don't have to be compatible with BOM.
>
> If we can simplify the process, let's simplify, because maintaining useless
> compatibility costs, the code is really crooked by now, and this leads to
> mis-understanding, and soon to broken features and noise. Currently,
> snapshot/source.st IS broken.

For a long time, yes.

> If there are codes > 127, the UTF8TextConverter will most likely fail, and I
> like the idea of Norbert to retry with a legacy encoding. This way, we put
> the crooked compatibility layer in exceptional handling.
>
> This will also simplify the MC readers/writers in VW, gst, Gemstone, ...
>
> Even for the legacy code, I wonder if MacRoman would be the right choice. MC
> never encoded the strings and always wrote the codes as is.

Right. I now remember the pain.

> So, setEncoderForCode is here for maintaining compatibility with MC
> snapshot/source.st written from an old image where the internal String
> encoding was MacRoman - when was it, 3.7? Are there really many of these?
>
> I bet 99% of MC files are encoded in Latin-1 but decoded with MacRoman if we
> go through a MczInstaller...
>
> Of course, MC now uses snapshot.bin rather than snapshot/source.st.
> Did old versions of MC fail to write snapshot.bin?
>
> Eventually, we can set a Preference in Squeak for ultra-old legacy encoding
> (not in Pharo, I guess Pharo should not care at all).

For Pharo, I'd guess so, too. (I heard that the Japanese support is
pretty much dropped in Pharo.)
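
Just to make the retry idea concrete, here is a rough sketch (decodeMczSource: is a
made-up selector, and I am assuming utf8ToSqueak signals an error on malformed input
rather than silently answering garbage):

decodeMczSource: rawString
    "Try strict UTF-8 first; if the converter fails, fall back to taking
    the codes as-is, i.e. Latin-1, which agrees with Squeak's internal
    encoding below 256."
    ^ [rawString utf8ToSqueak]
        on: Error
        do: [:e | e return: rawString]

A MacRomanTextConverter could be substituted in the handler if the old-image case
really matters, but if the bet above is right, most legacy files should read fine
as Latin-1.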

--
-- Yoshiki