Hi Pavel,

On 31 Jan 2014, at 20:33, Pavel Krivanek <[email protected]> wrote:

> Hi,
>
> I was looking why we cannot do condenseChanges and the reason looks to be
> clear: we do not have proper methods to move backwards (messages #back and
> #oldBack) in UTF-8 files. The current implementation changes position by one
> byte but in UTF-8 files it can be up to six bytes. And it's worse. We do not
> have mechanism how a text converter could handle this.
> Obvious solution is to rewrite condenseChanges to backup position somehow and
> do not move backwards at all but it doesn't seem to be so easy.
> Any ideas?
>
> And why do we need working condenseChanges? Because when we unload non-kernel
> packages and load them back, the resultant changes file has about 51MB :-)
>
> Cheers,
> -- Pavel

Very interesting!

Actually, I think this is fairly easy to do. Here is a proof of concept. It
does not implement the exact semantics of either #back or #oldBack, but it
does provide the elementary building block that makes them possible.

Adding the following method:

    ZnUTF8Encoder>>backOnStream: stream
        "Step back one byte at a time, skipping UTF-8 continuation bytes (10xxxxxx),
        until the stream sits on the first byte of the previous character"
        [ (stream back bitAnd: 2r11000000) == 2r10000000 ] whileTrue

makes this possible:

    | encoder stream |
    encoder := ZnUTF8Encoder new.
    stream := (encoder encodeString: 'Les élèves Françaises') readStream.
    4 timesRepeat: [ encoder nextFromStream: stream ].
    encoder nextFromStream: stream. " => $é"
    encoder backOnStream: stream.
    encoder nextFromStream: stream. " => $é"
    3 timesRepeat: [ encoder backOnStream: stream ].
    encoder nextFromStream: stream. " => $s"

Implementing #back would then be something like:

    | char |
    encoder backOnStream: stream.
    char := encoder nextFromStream: stream.
    encoder backOnStream: stream.
    ^ char

The read followed by a second step back simulates a #peek, but the caller
might not need that.

Of course, doing this for real would require a couple of good unit tests, as
well as implementations for all encoders in the ZnCharacterEncoder hierarchy.

What do you think?

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill
