That sounds good. We could even try to fall back to UTF-32 if we encounter zeros (but this should be very rare...).
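To make the read side concrete, here is a minimal sketch of the fallback order discussed in this thread: try UTF-8 first, and on a decoding error re-read the same bytes in the legacy encoding. It is written in Python purely as a language-neutral illustration, not as Monticello code. The function name `decode_source`, the `legacy` parameter, the choice of big-endian UTF-32, and the "length is a multiple of 4" check are all assumptions of mine, not anything the existing readers do.

```python
def decode_source(data: bytes, legacy: str = "mac_roman") -> str:
    """Decode source bytes, preferring UTF-8 and falling back to a legacy codec.

    Hypothetical helper for illustration only; the names and the default
    legacy encoding are assumptions, not part of Monticello.
    """
    # NUL bytes are very rare in UTF-8 or 8-bit source text, so treat them
    # as a hint that the data might be UTF-32 (assumed big-endian here).
    if b"\x00" in data and len(data) % 4 == 0:
        try:
            return data.decode("utf-32-be")
        except UnicodeDecodeError:
            pass  # not valid UTF-32 after all, keep going

    try:
        # "utf-8-sig" also accepts files that still start with a UTF-8 BOM
        # and drops the mark instead of returning it as a character.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: re-read the same bytes in the legacy encoding.
        return data.decode(legacy)


if __name__ == "__main__":
    print(decode_source("café".encode("utf-8")))       # valid UTF-8
    print(decode_source(b"caf\xe9", legacy="latin-1"))  # legacy 8-bit bytes
```

Pure 7-bit files decode identically on every path, so only files with non-ASCII bytes ever reach the fallback.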
For write, ZipArchive is unaware of any encoding; it uses latin1. In Squeak, I could place some squeakToUTF8 sends in MCMczWriter, and an equivalent UTF8TextConverter in Pharo's #serializeDefinitions:. Maybe this is also needed in some other serialize* methods (version, dependencies, who knows...).

2013/5/22 Norbert Hartl <[email protected]>

> On 22.05.2013 at 23:16, Nicolas Cellier <[email protected]> wrote:
>
> > First thing would be to simplify #setConverterForCode and
> > #selectTextConverterForCode.
> > Do we still want to use a MacRomanTextConverter, seriously? I'm not even
> > sure I've got that many files with that encoding on my Mac OS X...
> > Do we really need to put a byte order mark for UTF-8, seriously? See
> > http://en.wikipedia.org/wiki/Byte_order_mark, it's valueless, and not
> > recommended. It was a Squeak way to specify that a Squeak source file
> > uses UTF-8 rather than MacRoman, but by now this should be obsolete.
>
> A BOM for utf-8 does not make sense. It could act as a switch between
> legacy encoding and utf-8. But it would also be a decision that will be
> regretted shortly after. Most files in Monticello are 7-bit, so there
> wouldn't be a problem changing the default encoding. For every other file
> an exception will be thrown. So reading utf-8 and, on exception, reading
> the same thing in the legacy encoding might be a way to go.
>
> Norbert
>
> > 2013/5/22 Nicolas Cellier <[email protected]>
> >
> > > http://stackoverflow.com/questions/16645848/squeak-monticello-character-encoding
> > > Let's kill this one, it's totally insane
