Hi henrik

why don't you give a try to change our lives and propose a new 
MultuByteFileStream and friends :)

Stef


> That's great!
> Remembering that commit message was part of the reason for benching, was sort 
> of disappointed there was no significant difference between Zn in 2.0 and 
> latest 3.0...
> 
> I guess with the amount of hacks accumulating, it is indeed turning into a 
> worthy successor of MultiByteFileStream ;)
> 
> Cheers,
> Henry
> 
> P.S: If you want another delightful one in the same vein (both from WTFy-ness 
> and perf improvement POV), take a gander at UTF16TextConverter >> 
> nextPutByteString:toStream: 
> 
> 
> On Mon, Nov 4, 2013 at 5:12 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> Hi Henrik,
> 
> Great writeup, thanks !
> 
> (more inline)
> 
> On 04 Nov 2013, at 11:58, Henrik Johansen <henrik.s.johan...@veloxit.no> 
> wrote:
> 
> > On 04 Nov 2013, at 9:57 , Diego Lont <diego.l...@delware.nl> wrote:
> >
> >> Working on Petit Delphi we found a strange implementation for 
> >> asPetitStream:
> >> Stream>asPetitStream
> >>      ^ self contents asPetitStream
> >>
> >> Further investigation showed that the basic peek was not fast enough for 
> >> Petit Parser, as it is used a lot. So it implemented a "improved unchecked 
> >> peek":
> >> PPStream>peek
> >>      "An improved version of peek, that is slightly faster than the built 
> >> in version."
> >>      ^ self atEnd ifFalse: [ collection at: position + 1 ]
> >>
> >> PPStream>uncheckedPeek
> >>      "An unchecked version of peek that throws an error if we try to peek 
> >> over the end of the stream, even faster than #peek."
> >>      ^ collection at: position + 1
> >>
> >> But in my knowledge a basic peek should be fast. The real problem is the 
> >> peek in the underlying peek:
> >> PositionableStream>peek
> >>      "Answer what would be returned if the message next were sent to the
> >>      receiver. If the receiver is at the end, answer nil."
> >>
> >>      | nextObject |
> >>      self atEnd ifTrue: [^nil].
> >>      nextObject := self next.
> >>      position := position - 1.
> >>      ^nextObject
> >>
> >> That actually uses "self next". The least thing one should do is to cache 
> >> the next object. But isn't there a primitive for peek in a file stream? 
> >> Because al overriding peeks of PositionableStream have basically the same 
> >> implementation: reading the next and restoring the state to before the 
> >> peek (that is slow). So we would like to be able to remove PPStream 
> >> without causing performance issues, as the only added method is the 
> >> "improved peek".
> >>
> >> Stephan and Diego
> >
> > If you are reading from file, ZnCharacterStream should be a valid 
> > alternative.
> > If not, ZnBufferedReadStream on an internal collection stream also does 
> > peek caching.
> >
> > Beware with files though; it’s better to bench the overall operation for 
> > different alternatives.
> > F.ex, ZnCharacterStream is much faster than the standard Filestream for 
> > peek:
> >
> > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream 
> > binary.
> > [cr peek] bench. '49,400,000 per second.'
> > cr close.
> >
> > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench] '535,000 
> > per second.’
> >
> > but has different bulk reads characteristics (faster for small bulks, 
> > slower for large bulks, crossover-point at around 1k chars at once);
> > (The actual values are of course also dependent on encoder/file contents, 
> > those given here obtained with UTF-8 and a mostly/all ascii text file)
> >
> > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference 
> > readStream binary ) readStream.
> >       cr next: 65536; close] bench  '105 per second.'  '106 per second.’
> 
> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream 
> did not yet make use of the optimisations that I did for ZnCharacterEncoding 
> some time ago. More specifically, they were not yet using 
> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: 
> which are overwritten for ZnUTF8Encoder with (super hacked) versions that 
> assume most of the input will be ASCII (a reasonable assumption).
> 
> I am still chasing a bug, but right now:
> 
> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream 
> binary))
>         next: 65536; close ] bench.
> 
>         "135 per second.” BEFORE
>         "3,310 per second.” AFTER
> 
> But of course the input file is ASCII, so YMMV.
> 
> I’ll let you know when I commit this code.
> 
> Sven
> 
> > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
> >       ] bench  '176 per second.’
> >
> > If you use a StandardFilestream set to binary ( which has less overhead for 
> > binary next’s compared to the MultiByteFileStream returned by 
> > asFileReference readStream)as the base stream instead, , the same general 
> > profile holds true, but with a crossover around 2k characters.
> >
> > TL;DR: Benchmark the alternatives. The best replacement option depends on 
> > your results. Appropriately (according to source and actual use) set up 
> > Zn-streams are probably your best bet.
> >
> > Cheers,
> > Henry
> 
> 
> 

Reply via email to