On Aug 5, 2010, at 5:37:34 PM, Norbert Hartl wrote:

> I'm trying to port the newest XML Parser from SqueakSource to GemStone. In 
> XML-Parser-AlexandreBergel.73 a Unicode test was introduced with a 
> longer Unicode XML snippet. From this release on I cannot load or merge 
> anything. 
> 
> Besides the XML snippet looking very strange, it loads in Pharo but not in 
> GemStone. Unpacking the mcz on the console and examining the content showed 
> that the encoding is indeed weird. I don't know what it is, but it is neither 
> ASCII nor UTF-8. Pharo loaded the snippet into a WideString instance.
> 
> How does Monticello handle WideString instances when they are written to a file? 

Rather randomly. ;)

If you merely need to export a package so that you can import it into GemStone, 
you could change this method as follows:

MCMczWriter >> addString: internalString at: path
        "Store internalString in the zip as UTF-8, prefixed with a BOM."
        | member utfConverter utfStringStream |
        "Or whatever other encoding GemStone expects .mcz definitions to be in."
        utfConverter := TextConverter newForEncoding: 'utf8'.
        utfStringStream := RWBinaryOrTextStream on: String new.
        "Switch to binary mode while writing the BOM, then back to text mode."
        utfStringStream binary.
        utfConverter class writeBOMOn: utfStringStream.
        utfStringStream ascii.
        utfConverter nextPutAll: internalString toStream: utfStringStream.
        member := zip addString: utfStringStream contents asString as: path.
        member desiredCompressionMethod: ZipArchive compressionDeflated

(Alternatively, use String new writeStream if you don't need or want to write a BOM.)
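
To see what the two variants put on disk without firing up an image, here is a minimal sketch in Python rather than Smalltalk. The helper names encode_source and decode_source and the sample snippet are made up for illustration; they are not part of Monticello or GemStone, and the sketch only assumes that the reader on the other side expects plain UTF-8.

```python
import codecs


def encode_source(text: str, with_bom: bool = True) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM."""
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if with_bom else payload


def decode_source(data: bytes) -> str:
    """Decode UTF-8, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


# Hypothetical snippet with characters outside Latin-1, i.e. the kind of
# text that ends up in a WideString.
snippet = "<name>\u00dcn\u00efc\u00f6d\u00e9 \u4e2d\u6587</name>"

with_bom = encode_source(snippet)
without_bom = encode_source(snippet, with_bom=False)

print(with_bom[:3])                        # the three BOM bytes
print(decode_source(with_bom) == snippet)  # round trip with BOM
print(decode_source(without_bom) == snippet)  # round trip without BOM
```

The only difference between the two outputs is the three leading BOM bytes, so whether to write them depends purely on what the importing side tolerates.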

A change like this is unlikely to make it into the base image without further 
investigation, as it would probably break loading new packages (saved in proper 
UTF-8) that contain WideStrings into old images.
I haven't read the import code, but if old images prefer the binary format 
when it is available, saving the source in UTF-8 might be a reasonable 
compromise, provided you also include the binary file.

Cheers,
Henry

PS. Another fun fact I encountered when porting Assets:
Monticello uses MethodReference>>source, which kindly converts every LF and CRLF 
in your source, including those inside string literals, to CR.
So you can forget about, for example, storing arbitrary ByteArrays as strings in 
your code and expecting them to be unchanged when you convert them back to 
ByteArrays after saving to Monticello :)
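
A rough illustration of why the bytes don't survive, again sketched in Python rather than Smalltalk. normalize_to_cr is a hypothetical stand-in for the conversion described above, not the actual MethodReference>>source code, and the byte values are arbitrary.

```python
def normalize_to_cr(source: str) -> str:
    """Rewrite CRLF and bare LF as CR, mimicking the conversion described above."""
    return source.replace("\r\n", "\r").replace("\n", "\r")


# Arbitrary binary data that happens to contain LF (0x0A) and CR (0x0D) bytes.
original = bytes([0x01, 0x0A, 0x0D, 0x0A, 0x02, 0x0D])

# Embed the bytes in source as a one-character-per-byte string.
as_source = original.decode("latin-1")

# What comes back after the line endings have been normalized on save.
saved = normalize_to_cr(as_source)
restored = saved.encode("latin-1")

print(len(original), len(restored))
print(restored == original)
```

The CRLF pair collapses into a single CR and the lone LF turns into a CR, so the restored data differs from the original in both content and length.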


_______________________________________________
Pharo-project mailing list
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project