Hi Henrik, Great writeup, thanks !
(more inline) On 04 Nov 2013, at 11:58, Henrik Johansen <[email protected]> wrote: > On 04 Nov 2013, at 9:57 , Diego Lont <[email protected]> wrote: > >> Working on Petit Delphi we found a strange implementation for asPetitStream: >> Stream>asPetitStream >> ^ self contents asPetitStream >> >> Further investigation showed that the basic peek was not fast enough for >> Petit Parser, as it is used a lot. So it implemented a "improved unchecked >> peek": >> PPStream>peek >> "An improved version of peek, that is slightly faster than the built in >> version." >> ^ self atEnd ifFalse: [ collection at: position + 1 ] >> >> PPStream>uncheckedPeek >> "An unchecked version of peek that throws an error if we try to peek >> over the end of the stream, even faster than #peek." >> ^ collection at: position + 1 >> >> But in my knowledge a basic peek should be fast. The real problem is the >> peek in the underlying peek: >> PositionableStream>peek >> "Answer what would be returned if the message next were sent to the >> receiver. If the receiver is at the end, answer nil." >> >> | nextObject | >> self atEnd ifTrue: [^nil]. >> nextObject := self next. >> position := position - 1. >> ^nextObject >> >> That actually uses "self next". The least thing one should do is to cache >> the next object. But isn't there a primitive for peek in a file stream? >> Because al overriding peeks of PositionableStream have basically the same >> implementation: reading the next and restoring the state to before the peek >> (that is slow). So we would like to be able to remove PPStream without >> causing performance issues, as the only added method is the "improved peek". >> >> Stephan and Diego > > If you are reading from file, ZnCharacterStream should be a valid alternative. > If not, ZnBufferedReadStream on an internal collection stream also does peek > caching. > > Beware with files though; it’s better to bench the overall operation for > different alternatives. > F.ex, ZnCharacterStream is much faster than the standard Filestream for peek: > > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream > binary. > [cr peek] bench. '49,400,000 per second.' > cr close. > > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench] '535,000 > per second.’ > > but has different bulk reads characteristics (faster for small bulks, slower > for large bulks, crossover-point at around 1k chars at once); > (The actual values are of course also dependent on encoder/file contents, > those given here obtained with UTF-8 and a mostly/all ascii text file) > > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference readStream > binary ) readStream. > cr next: 65536; close] bench '105 per second.' '106 per second.’ Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream did not yet make use of the optimisations that I did for ZnCharacterEncoding some time ago. More specifically, they were not yet using #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: which are overwritten for ZnUTF8Encoder with (super hacked) versions that assume most of the input will be ASCII (a reasonable assumption). I am still chasing a bug, but right now: [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary)) next: 65536; close ] bench. "135 per second.” BEFORE "3,310 per second.” AFTER But of course the input file is ASCII, so YMMV. I’ll let you know when I commit this code. Sven > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536] > ] bench '176 per second.’ > > If you use a StandardFilestream set to binary ( which has less overhead for > binary next’s compared to the MultiByteFileStream returned by asFileReference > readStream)as the base stream instead, , the same general profile holds true, > but with a crossover around 2k characters. > > TL;DR: Benchmark the alternatives. The best replacement option depends on > your results. Appropriately (according to source and actual use) set up > Zn-streams are probably your best bet. > > Cheers, > Henry
