On 04 Nov 2013, at 5:12, Sven Van Caekenberghe <[email protected]> wrote:
>
> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream
> did not yet make use of the optimisations that I did for ZnCharacterEncoding
> some time ago. More specifically, they were not yet using
> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream:
> which are overwritten for ZnUTF8Encoder with (super hacked) versions that
> assume most of the input will be ASCII (a reasonable assumption).
>
> I am still chasing a bug, but right now:
>
> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>     next: 65536; close ] bench.
>
> "135 per second.” BEFORE
> "3,310 per second.” AFTER
>
> But of course the input file is ASCII, so YMMV.
>
> I’ll let you know when I commit this code.
>
> Sven
Yeah… so, I loaded the updated version, and it's a great improvement for streams
on Latin1 content :D
Maybe it's just me, but I tested with actual wide source as well (it was as
slow as you'd expect), and I think you need a notQuiteSoOptimizedReadInto*
variant which uses the normal Byte -> Wide become: conversion machinery, along
the lines of the sketch below.
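Something like this, hypothetically (the selector and handler details are made
up here, just to sketch the idea):

    readWideInto: string startingAt: offset count: requestedCount fromStream: stream
        "Hypothetical sketch, not actual Zn code: run the fast ASCII-assuming
        read, and when the buffer has to go wide, let the normal
        Byte -> Wide become: machinery swap the buffer in place and resume,
        so callers never need to install a handler themselves."
        ^ [ self readInto: string startingAt: offset count: requestedCount fromStream: stream ]
            on: ZnByteStringBecameWideString
            do: [ :notification |
                "becomeForward: has already redirected references to the
                wide copy, so resuming should be enough to carry on."
                notification resume ]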
Writing a ZnByteStringBecameWideString handler around every next: (and its
cousins) call where the source may be non-Latin1 is a real chore, and a nasty
surprise for those used to dealing with the legacy streams/converters, e.g.:
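Every such call site ends up looking roughly like this (file name made up):

    | stream buffer |
    stream := ZnCharacterReadStream on: 'wide.txt' asFileReference readStream binary.
    buffer := [ stream next: 65536 ]
        on: ZnByteStringBecameWideString
        do: [ :notification |
            "Assuming resuming is enough to continue with the converted buffer."
            notification resume ].
    stream close.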
You can more or less make up for the performance hit (at least at these sizes)
with a faster replace:from:to:with:startingAt: used after the fact, which
writes to the string with basicAt:put: and thus avoids converting each
replacement value to a Character.
Cheers,
Henry
PS: Why is ZnByteStringBecameWideString a Notification and not a resumable
Error? I would assume those who run into it without a handler would rather get
an actual error than a result where their string has only been read up to the
first wide character...