On 23.05.2013 00:06, Nicolas Cellier wrote:
That sounds good. We could even try to fall back to UTF-32 if we
encounter zeros (but this should be very rare...).
For writing, ZipArchive is unaware of any encoding... It uses latin1.
In Squeak, I could place some squeakToUTF8 sends in MCMczWriter, and
the equivalent UTF8TextConverter in Pharo's #serializeDefinitions:.
Maybe this is needed in some other serialize* methods too (version,
dependencies, who knows...)
That won't work if the file contains sources for both WideString-
and ByteString-sourced methods.
In that case the file would contain code stored BOTH as latin1 bytes
and as UTF-32 (in the endianness of the platform it was saved from).
Which means you'd have to detect and handle jumps back and forth in
encoding when reading...
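To illustrate, here is a rough sketch in Python (not Squeak/Pharo code).
The two method sources and every name in it are made up for the
example, and it assumes wide strings are written as raw native-endian
UTF-32 with no marker between chunks. It shows that a whole-file
"UTF-32 if we see zeros" fallback recovers neither part, while decoding
works only if you know where the encoding switches:

```python
import sys

# Hypothetical method sources: one fits a byte string, one needs a wide string.
byte_source = "foo ^ 'café'"
wide_source = "bar ^ 'λ'"

# Assumption: wide strings are dumped as raw UTF-32 in platform endianness.
utf32 = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"

# What one file looks like when both kinds are written without conversion.
mixed = byte_source.encode("latin-1") + b"\n" + wide_source.encode(utf32)


def decode_with_fallback(data):
    """Whole-file heuristic: UTF-32 if any zero byte is present, else latin1."""
    if b"\x00" in data:
        return data.decode(utf32, errors="replace")
    return data.decode("latin-1")


# Reading everything as latin1 keeps the byte part but mangles the wide part.
as_latin1 = mixed.decode("latin-1")

# The zero-byte fallback switches the whole file to UTF-32 and mangles the byte part.
as_fallback = decode_with_fallback(mixed)

# Only with the (unrecorded) switch position can each chunk be decoded correctly.
split = len(byte_source.encode("latin-1")) + 1
recovered = mixed[:split].decode("latin-1") + mixed[split:].decode(utf32)
```

The split position used for `recovered` is not stored anywhere in such
a file, so a reader would have to guess it for every jump.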
IMHO, just consider those files lost beyond hope.
Cheers,
Henry