On 06 Nov 2013, at 13:56, Sven Van Caekenberghe <[email protected]> wrote:

> 
> On 06 Nov 2013, at 13:20, Henrik Johansen <[email protected]> 
> wrote:
> 
>> 
>> On 04 Nov 2013, at 5:12 , Sven Van Caekenberghe <[email protected]> wrote:
>> 
>>> 
>>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream 
>>> did not yet make use of the optimisations that I did for 
>>> ZnCharacterEncoding some time ago. More specifically, they were not yet 
>>> using #next:putAll:startingAt:toStream: and 
>>> #readInto:startingAt:count:fromStream: which are overwritten for 
>>> ZnUTF8Encoder with (super hacked) versions that assume most of the input 
>>> will be ASCII (a reasonable assumption).
>>> 
>>> I am still chasing a bug, but right now:
>>> 
>>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream 
>>> binary))
>>>     next: 65536; close ] bench. 
>>> 
>>>     "135 per second.” BEFORE
>>>     "3,310 per second.” AFTER
>>> 
>>> But of course the input file is ASCII, so YMMV.
>>> 
>>> I’ll let you know when I commit this code.
>>> 
>>> Sven
>> 
>> Yeah… sooo, I loaded the updated version, great improvement for streams on 
>> Latin1 content :D
>> 
>> Maybe it’s just me, but I tested with actual wide source as well (it was as 
>> slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* 
>> which uses the normal Byte -> Wide become: conversion machinery.
>> Writing a ZnByteStringBecameWideString handler for every next: (and cousins) 
>> call where source may be non-latin1 is a real chore/nasty surprise for those 
>> used to dealing with legacy streams/converters... 
>> You can sort of kinda make up for the performance hit (at least on these 
>> sizes) with a faster replace:from:to:with:startingAt: in use after the fact, 
>> by using basicAt:put: to the string, thus avoiding converting the 
>> replacement value to character. 
>> 
>> Cheers,
>> Henry
>> 
>> PS: Why is ZnByteStringBecameWideString a notification and not a resumable 
>> exception? I would assume those who run into it without a handler would 
>> rather have an actual error, than a result where their string has been read 
>> up to the first wide char…
> 
> Henrik, thanks again for the feedback, it is really useful.
> 
> You are right, the ZnByteStringBecameWideString hack was introduced 
> specifically to avoid the become, because benchmarking showed that it took 
> too long. The original user is ZnStringEntity>>#readFrom: 
> 
> Now, if you know up front that your string will be wide anyway, then there is 
> #decodeBytesIntoWideString:
> 
> Yes, ZnByteStringBecameWideString should probably be a resumable exception, 
> as not handling it properly will lead to errors.
> 
> I will have to think and reconsider the generality/specificity of the 
> optimisation hack in ZnUTF8Encoder. I got some ideas, but it will take some 
> time.
> 
> At least the code works now, time to process the constructive criticism.

===
Name: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.28
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:04:00.925461 pm
UUID: f104e1ee-7fe5-4a23-a650-197dd9462783
Ancestors: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.27

ZnByteStringBecameWideString is now a resumable Error instead of a Notification 
(as suggested by henrik sperre johansen);
Added ZnByteStringBecameWideString>>#becomeForward convenience method
Fixed ZnCharacterReadStream>>#readInto:startingAt:count: to do the actual 
becomeForward when needed so that clients are not affected
===
Name: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.15
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:05:18.26391 pm
UUID: 8c41ee29-9906-4626-a9da-cb576980cd4c
Ancestors: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.14

Added a unit test to make sure 
ZnCharacterReadStream>>#readInto:startingAt:count: does an actual becomeForward 
when needed
===

At least the situation is correct now.

Sven


Reply via email to