Re: [Pharo-dev] back and oldBack in UTF-8 files

Sven Van Caekenberghe Fri, 31 Jan 2014 14:20:31 -0800

Hi Pavel,

On 31 Jan 2014, at 20:33, Pavel Krivanek <[email protected]> wrote:


> Hi, 
> 
> I was looking why we cannot do condenseChanges and the reason looks to be 
> clear: we do not have proper methods to move backwards (messages #back and 
> #oldBack) in UTF-8 files. The current implementation changes position by one 
> byte but in UTF-8 files it can be up to six bytes. And it's worse. We do not 
> have mechanism how a text converter could handle this. 
> Obvious solution is to rewrite condenseChanges to backup position somehow and 
> do not move backwards at all but it doesn't seem to be so easy.
> Any ideas?
> 
> And why do we need working condenseChanges? Because when we unload non-kernel 
> packages and load them back, the resultant changes file has about 51MB :-)
> 
> Cheers,
> -- Pavel

Very interesting!

Actually, I think this is fairly easy to do. Here is a proof of concept. It 
does not implement the exact semantics of either #back or #oldBack, but it does 
provide the elementary building block that makes them possible.

Adding the following method:

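ZnUTF8Encoder>>backOnStream: stream
  "Step back over continuation bytes (bit pattern 10xxxxxx) until the first byte of a character is reached"
  [ (stream back bitAnd: 2r11000000) == 2r10000000 ] whileTrue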

Makes this possible:

| encoder stream |
encoder := ZnUTF8Encoder new.
stream := (encoder encodeString: 'Les élèves Françaises') readStream.
4 timesRepeat: [ encoder nextFromStream: stream ].
encoder nextFromStream: stream. " => $é"
encoder backOnStream: stream.
encoder nextFromStream: stream. " => $é"
3 timesRepeat: [ encoder backOnStream: stream ].
encoder nextFromStream: stream. " => $s"
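
The test in #backOnStream: works because every UTF-8 continuation byte has the 
bit pattern 10xxxxxx, while the first byte of a character never does. For 
readers who want to check this outside an image, here is a rough Python sketch 
of the same walk-through. It is not Pharo code: back_on_stream and next_char 
are hypothetical stand-ins for #backOnStream: and #nextFromStream:, and the 
input is assumed to be valid UTF-8.

```python
import io

def back_on_stream(stream):
    """Move a binary stream back to the first byte of the previous character."""
    while stream.tell() > 0:
        stream.seek(-1, io.SEEK_CUR)       # step back one byte
        byte = stream.read(1)[0]           # inspect it (this advances again)
        stream.seek(-1, io.SEEK_CUR)       # undo the read
        if byte & 0b11000000 != 0b10000000:
            break                          # not a continuation byte: stop here

def next_char(stream):
    """Decode one character; assumes the stream sits on a valid first byte."""
    lead = stream.read(1)[0]
    if lead < 0x80:
        extra = 0                          # ASCII, single byte
    elif lead >> 5 == 0b110:
        extra = 1                          # 110xxxxx: two-byte sequence
    elif lead >> 4 == 0b1110:
        extra = 2                          # 1110xxxx: three-byte sequence
    else:
        extra = 3                          # 11110xxx: four-byte sequence
    return (bytes([lead]) + stream.read(extra)).decode("utf-8")

stream = io.BytesIO("Les élèves Françaises".encode("utf-8"))
for _ in range(4):
    next_char(stream)
first = next_char(stream)                  # reads the two-byte 'é'
back_on_stream(stream)
again = next_char(stream)                  # reads the same 'é' again
for _ in range(3):
    back_on_stream(stream)
after = next_char(stream)                  # three characters back: 's'
```

The guard on stream.tell() is an addition of this sketch; it stops the loop at 
the start of the stream instead of trying to move before position zero.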

Implementing #back would then be something like:

| char |
encoder backOnStream: stream.
char := encoder nextFromStream: stream.
encoder backOnStream: stream.
^ char

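The second #backOnStream: undoes the read, so the character is returned as if 
by #peek and the stream is left positioned in front of it. The caller might 
not need that.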
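
The same back, read, back again sequence can be sketched in Python to see 
where the stream ends up. This is only an illustration under the assumption of 
valid UTF-8 input; the function name back is hypothetical and is not meant to 
reproduce the exact semantics of #back or #oldBack.

```python
import io

def back(stream):
    """Return the previous character and leave the stream in front of it."""
    end = stream.tell()
    start = end
    # Walk left until a byte that is not a continuation byte (10xxxxxx).
    while start > 0:
        start -= 1
        stream.seek(start)
        if stream.read(1)[0] & 0b11000000 != 0b10000000:
            break
    stream.seek(start)
    char = stream.read(end - start).decode("utf-8")   # decode the whole character
    stream.seek(start)                                 # undo the read, like a peek
    return char

stream = io.BytesIO("élève".encode("utf-8"))
stream.seek(0, io.SEEK_END)
seen = [back(stream) for _ in range(5)]    # characters in reverse order
```

Called at the very start of the stream, this sketch returns an empty string 
rather than signalling an error; a real implementation would have to decide 
what to do there.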

Of course, to do this for real would require a couple of good unit tests, as 
well as implementations for all encoders in the ZnCharacterEncoder hierarchy.
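
Each encoder needs its own rule for finding the start of the previous 
character. As an illustration only, a big-endian UTF-16 stream would have to 
step back one 16-bit unit and then one more if it landed on a low surrogate 
(DC00 to DFFF). The Python sketch below assumes well-formed UTF-16BE without a 
byte order mark; back_on_stream_utf16be is a hypothetical name, not part of 
any existing encoder.

```python
import io

def back_on_stream_utf16be(stream):
    """Move a UTF-16BE byte stream back by one character."""
    pos = stream.tell()
    if pos < 2:
        stream.seek(0)                     # nothing complete to step over
        return
    pos -= 2
    stream.seek(pos)
    unit = int.from_bytes(stream.read(2), "big")
    if 0xDC00 <= unit <= 0xDFFF and pos >= 2:
        pos -= 2                           # low surrogate: character starts one unit earlier
    stream.seek(pos)

# 'a', U+1D11E (encoded as a surrogate pair), 'b'
data = "a\U0001D11Eb".encode("utf-16-be")
stream = io.BytesIO(data)
stream.seek(0, io.SEEK_END)
positions = []
for _ in range(3):
    back_on_stream_utf16be(stream)
    positions.append(stream.tell())
```

The recorded positions double as a small unit test: one step back per 
character, with the surrogate pair crossed in a single step.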

What do you think?

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill
