On 04 Nov 2013, at 22:25, Henrik Sperre Johansen <henrik.s.johan...@veloxit.no> 
wrote:

> That's great!
> Remembering that commit message was part of the reason for benching, was sort 
> of disappointed there was no significant difference between Zn in 2.0 and 
> latest 3.0...
> 
> I guess with the amount of hacks accumulating, it is indeed turning into a 
> worthy successor of MultiByteFileStream ;)

Bah, the hacking is limited to one method ( 
#optimizedReadInto:startingAt:count:fromStream: ) and the readable version is 
still present and used ( #originalReadInto:startingAt:count:fromStream: ). It 
is encapsulated and the rest of the code remains quite clear.

> Cheers,
> Henry
> 
> P.S: If you want another delightful one in the same vein (both from WTFy-ness 
> and perf improvement POV), take a gander at UTF16TextConverter >> 
> nextPutByteString:toStream: 

Pretty confusing !

BTW, do we still need UTF16 support ?

For those encodings that we still want to support in the future, we should have 
new and more principled implementations under ZnCharacterEncoder. That is, if 
we ever want to fase out TextConverter.

> On Mon, Nov 4, 2013 at 5:12 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> Hi Henrik,
> 
> Great writeup, thanks !
> 
> (more inline)
> 
> On 04 Nov 2013, at 11:58, Henrik Johansen <henrik.s.johan...@veloxit.no> 
> wrote:
> 
> > On 04 Nov 2013, at 9:57 , Diego Lont <diego.l...@delware.nl> wrote:
> >
> >> Working on Petit Delphi we found a strange implementation for 
> >> asPetitStream:
> >> Stream>asPetitStream
> >>      ^ self contents asPetitStream
> >>
> >> Further investigation showed that the basic peek was not fast enough for 
> >> Petit Parser, as it is used a lot. So it implemented a "improved unchecked 
> >> peek":
> >> PPStream>peek
> >>      "An improved version of peek, that is slightly faster than the built 
> >> in version."
> >>      ^ self atEnd ifFalse: [ collection at: position + 1 ]
> >>
> >> PPStream>uncheckedPeek
> >>      "An unchecked version of peek that throws an error if we try to peek 
> >> over the end of the stream, even faster than #peek."
> >>      ^ collection at: position + 1
> >>
> >> But in my knowledge a basic peek should be fast. The real problem is the 
> >> peek in the underlying peek:
> >> PositionableStream>peek
> >>      "Answer what would be returned if the message next were sent to the
> >>      receiver. If the receiver is at the end, answer nil."
> >>
> >>      | nextObject |
> >>      self atEnd ifTrue: [^nil].
> >>      nextObject := self next.
> >>      position := position - 1.
> >>      ^nextObject
> >>
> >> That actually uses "self next". The least thing one should do is to cache 
> >> the next object. But isn't there a primitive for peek in a file stream? 
> >> Because al overriding peeks of PositionableStream have basically the same 
> >> implementation: reading the next and restoring the state to before the 
> >> peek (that is slow). So we would like to be able to remove PPStream 
> >> without causing performance issues, as the only added method is the 
> >> "improved peek".
> >>
> >> Stephan and Diego
> >
> > If you are reading from file, ZnCharacterStream should be a valid 
> > alternative.
> > If not, ZnBufferedReadStream on an internal collection stream also does 
> > peek caching.
> >
> > Beware with files though; it’s better to bench the overall operation for 
> > different alternatives.
> > F.ex, ZnCharacterStream is much faster than the standard Filestream for 
> > peek:
> >
> > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream 
> > binary.
> > [cr peek] bench. '49,400,000 per second.'
> > cr close.
> >
> > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench] '535,000 
> > per second.’
> >
> > but has different bulk reads characteristics (faster for small bulks, 
> > slower for large bulks, crossover-point at around 1k chars at once);
> > (The actual values are of course also dependent on encoder/file contents, 
> > those given here obtained with UTF-8 and a mostly/all ascii text file)
> >
> > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference 
> > readStream binary ) readStream.
> >       cr next: 65536; close] bench  '105 per second.'  '106 per second.’
> 
> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream 
> did not yet make use of the optimisations that I did for ZnCharacterEncoding 
> some time ago. More specifically, they were not yet using 
> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: 
> which are overwritten for ZnUTF8Encoder with (super hacked) versions that 
> assume most of the input will be ASCII (a reasonable assumption).
> 
> I am still chasing a bug, but right now:
> 
> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream 
> binary))
>         next: 65536; close ] bench.
> 
>         "135 per second.” BEFORE
>         "3,310 per second.” AFTER
> 
> But of course the input file is ASCII, so YMMV.
> 
> I’ll let you know when I commit this code.
> 
> Sven
> 
> > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
> >       ] bench  '176 per second.’
> >
> > If you use a StandardFilestream set to binary ( which has less overhead for 
> > binary next’s compared to the MultiByteFileStream returned by 
> > asFileReference readStream)as the base stream instead, , the same general 
> > profile holds true, but with a crossover around 2k characters.
> >
> > TL;DR: Benchmark the alternatives. The best replacement option depends on 
> > your results. Appropriately (according to source and actual use) set up 
> > Zn-streams are probably your best bet.
> >
> > Cheers,
> > Henry
> 
> 
> 


Reply via email to