On 04 Nov 2013, at 5:12, Sven Van Caekenberghe <[email protected]> wrote:

> 
> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream 
> did not yet make use of the optimisations that I did for ZnCharacterEncoding 
> some time ago. More specifically, they were not yet using 
> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: 
> which are overridden for ZnUTF8Encoder with (super hacked) versions that 
> assume most of the input will be ASCII (a reasonable assumption).
> 
> I am still chasing a bug, but right now:
> 
> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream 
> binary))
>       next: 65536; close ] bench. 
> 
>       "135 per second.” BEFORE
>       "3,310 per second.” AFTER
> 
> But of course the input file is ASCII, so YMMV.
> 
> I’ll let you know when I commit this code.
> 
> Sven

Yeah… sooo, I loaded the updated version: great improvement for streams on 
Latin-1 content :D
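
For anyone else reading along, the fast path those selectors enable is, as I 
understand it, roughly this shape (my sketch of the idea only, not Sven’s 
actual code; #decodeNonAsciiStartingWith:from: is a made-up placeholder for 
the general UTF-8 decoding step):

	readInto: aString startingAt: offset count: requestedCount fromStream: stream
		"Bytes below 128 are plain ASCII and map one-to-one to
		Characters, so they can skip the full UTF-8 decoding machinery."
		0 to: requestedCount - 1 do: [ :index | | byte |
			(byte := stream next) ifNil: [ ^ index ].
			aString
				at: offset + index
				put: (byte < 128
					ifTrue: [ Character value: byte ]
					ifFalse: [ self decodeNonAsciiStartingWith: byte from: stream ]) ].
		^ requestedCount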

Maybe it’s just me, but I tested with actual wide source as well (it was as 
slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* 
variant which uses the normal Byte -> Wide become: conversion machinery. 
Writing a ZnByteStringBecameWideString handler around every next: (and 
cousins) call whose source may be non-Latin-1 is a real chore, and a nasty 
surprise for those used to dealing with legacy streams/converters (see the 
sketch below)... 
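
To make that concrete, every such call site ends up shaped roughly like this 
(the handler body is illustrative only; I’m not spelling out the actual 
conversion protocol here):

	| stream result |
	stream := ZnCharacterReadStream on:
		'timezones.json' asFileReference readStream binary.
	result := [ stream next: 65536 ]
		on: ZnByteStringBecameWideString
		do: [ :notification |
			"react to the byte -> wide transition here, then continue"
			notification resume ].
	stream close.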
You can sort of kinda make up for the performance hit (at least at these 
sizes) with a faster replace:from:to:with:startingAt: used after the fact, 
one that writes into the string with basicAt:put: and so avoids converting 
each replacement value to a Character. 
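
That is, something along these lines (a standalone sketch of the trick, not 
the actual Zinc method):

	| bytes string |
	bytes := #[104 101 108 108 111].
	string := String new: bytes size.
	1 to: bytes size do: [ :i |
		"basicAt:put: stores the integer code point directly, skipping
		the Integer -> Character conversion that at:put: requires"
		string basicAt: i put: (bytes at: i) ].
	string. "=> 'hello'"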

Cheers,
Henry

PS: Why is ZnByteStringBecameWideString a notification and not a resumable 
exception? I would assume those who run into it without a handler would 
rather have an actual error than a result where their string has only been 
read up to the first wide char...
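
For comparison (plain Pharo exception behaviour, nothing Zinc-specific):

	"An unhandled Notification’s default action answers nil and simply
	continues, so a truncated result can go completely unnoticed..."
	Notification signal: 'byte string became wide'.
	Transcript show: 'still running'; cr.

	"...whereas an unhandled Error at least surfaces in the debugger;
	a resumable exception class would be loud by default while still
	letting handlers #resume: past it where that makes sense."
	Error signal: 'byte string became wide'.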
