On 06 Nov 2013, at 13:56, Sven Van Caekenberghe <[email protected]> wrote:
> 
> On 06 Nov 2013, at 13:20, Henrik Johansen <[email protected]> wrote:
> 
>> On 04 Nov 2013, at 5:12, Sven Van Caekenberghe <[email protected]> wrote:
>> 
>>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream did not yet make use of the optimisations that I did for ZnCharacterEncoding some time ago. More specifically, they were not yet using #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream:, which are overridden for ZnUTF8Encoder with (super hacked) versions that assume most of the input will be ASCII (a reasonable assumption).
>>> 
>>> I am still chasing a bug, but right now:
>>> 
>>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>>>     next: 65536; close ] bench.
>>> 
>>> "135 per second." BEFORE
>>> "3,310 per second." AFTER
>>> 
>>> But of course the input file is ASCII, so YMMV.
>>> 
>>> I’ll let you know when I commit this code.
>>> 
>>> Sven
>> 
>> Yeah… sooo, I loaded the updated version, great improvement for streams on Latin-1 content :D
>> 
>> Maybe it’s just me, but I tested with actual wide source as well (it was as slow as you’d expect), and I think you need a notQuiteSoOptimizedReadInto* which uses the normal Byte -> Wide become: conversion machinery. Writing a ZnByteStringBecameWideString handler for every next: (and cousins) call where the source may be non-Latin-1 is a real chore/nasty surprise for those used to dealing with legacy streams/converters...
>> You can sort of make up for the performance hit (at least at these sizes) by putting a faster replace:from:to:with:startingAt: in use after the fact, one that uses basicAt:put: on the string and thus avoids converting the replacement value to a Character.
>> 
>> Cheers,
>> Henry
>> 
>> PS: Why is ZnByteStringBecameWideString a notification and not a resumable exception? I would assume those who run into it without a handler would rather have an actual error than a result where their string has been read only up to the first wide char…
> 
> Henrik, thanks again for the feedback, it is really useful.
> 
> You are right, the ZnByteStringBecameWideString hack was introduced specifically to avoid the become, because benchmarking showed that it took too long. The original user is ZnStringEntity>>#readFrom:
> 
> Now, if you know up front that your string will be wide anyway, then there is #decodeBytesIntoWideString:
> 
> Yes, ZnByteStringBecameWideString should probably be a resumable exception, as not handling it properly will lead to errors.
> 
> I will have to think about and reconsider the generality/specificity of the optimisation hack in ZnUTF8Encoder. I have some ideas, but it will take some time.
> 
> At least the code works now, time to process the constructive criticism.
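
A minimal sketch of the #decodeBytesIntoWideString: route Sven mentions above, for the case where the content is known up front to be wide; the ZnUTF8Encoder receiver and the sample string are illustrative assumptions, not taken from the thread:

| bytes wide |
"Assume we know in advance that the content contains non-Latin-1 characters (here the Euro sign)."
bytes := 'total: 100 €' utf8Encoded.
"Decoding straight into a WideString avoids the ByteString -> WideString become entirely,
so no ZnByteStringBecameWideString should be signalled along the way."
wide := ZnUTF8Encoder new decodeBytesIntoWideString: bytes.
wide class. "expected: WideString"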
===
Name: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.28
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:04:00.925461 pm
UUID: f104e1ee-7fe5-4a23-a650-197dd9462783
Ancestors: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.27

ZnByteStringBecameWideString is now a resumable Error instead of a Notification (as suggested by henrik sperre johansen);
Added ZnByteStringBecameWideString>>#becomeForward convenience method
Fixed ZnCharacterReadStream>>#readInto:startingAt:count: to do the actual becomeForward when needed so that clients are not affected
===
Name: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.15
Author: SvenVanCaekenberghe
Time: 13 November 2013, 3:05:18.26391 pm
UUID: 8c41ee29-9906-4626-a9da-cb576980cd4c
Ancestors: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.14

Added a unit test to make sure ZnCharacterReadStream>>#readInto:startingAt:count: does an actual becomeForward when needed
===

At least the situation is correct now.

Sven
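
As a rough illustration of the new behaviour, here is a sketch of handling the now-resumable error around the encoder's optimised #readInto:startingAt:count:fromStream: directly; presumably this is close to what ZnCharacterReadStream>>#readInto:startingAt:count: now does internally for its clients. The direct encoder usage and the sample input are assumptions; #becomeForward and the resumability come from the commit notes above:

| encoder buffer |
encoder := ZnUTF8Encoder new.
buffer := String new: 6.
[ encoder
	readInto: buffer
	startingAt: 1
	count: 6
	fromStream: 'Ελλάδα' utf8Encoded readStream ]
		on: ZnByteStringBecameWideString
		do: [ :error |
			"Switch the partially filled ByteString to its WideString copy in place,
			then let decoding continue."
			error becomeForward.
			error resume ].
buffer class. "expected: WideString, with all six characters decoded"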
