On 04 Nov 2013, at 9:57, Diego Lont <[email protected]> wrote:

> While working on Petit Delphi, we found a strange implementation of asPetitStream:
> Stream>asPetitStream
>       ^ self contents asPetitStream
> 
> Further investigation showed that the basic peek was not fast enough for
> PetitParser, which uses it heavily. So it implements an "improved" peek and
> an unchecked peek:
> PPStream>peek
>       "An improved version of peek that is slightly faster than the built-in version."
>       ^ self atEnd ifFalse: [ collection at: position + 1 ]
> 
> PPStream>uncheckedPeek
>       "An unchecked version of peek that throws an error if we try to peek past the end of the stream; even faster than #peek."
>       ^ collection at: position + 1
> 
> But to my knowledge a basic peek should be fast. The real problem is the
> underlying implementation:
> PositionableStream>peek
>       "Answer what would be returned if the message next were sent to the 
>       receiver. If the receiver is at the end, answer nil."
> 
>       | nextObject |
>       self atEnd ifTrue: [^nil].
>       nextObject := self next.
>       position := position - 1.
>       ^nextObject
> 
> It actually uses "self next". The least one should do is cache the next
> object. But isn't there a primitive for peek in a file stream? All the
> overrides of peek in PositionableStream subclasses have basically the same
> implementation: read the next element, then restore the state to what it was
> before the peek, which is slow. We would like to be able to remove PPStream
> without causing performance issues, since the only method it adds is the
> "improved" peek.
> 
> Stephan and Diego

If you are reading from a file, ZnCharacterStream should be a valid alternative.
If not, a ZnBufferedReadStream on an internal collection stream also does peek
caching.
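Peek caching means reading at most one element ahead and handing it back on the next #next, instead of reading and then rewinding the position as PositionableStream>peek does. A minimal sketch of the idea as a hypothetical wrapper class (PeekCachingStream and the 'PeekSketch' category are made up for illustration; this is not ZnBufferedReadStream's actual implementation):

```smalltalk
"Hypothetical wrapper, for illustration only: caches one element of lookahead."
Object subclass: #PeekCachingStream
	instanceVariableNames: 'stream peeked hasPeeked'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'PeekSketch'.

PeekCachingStream class >> on: aStream
	^ self new setStream: aStream

PeekCachingStream >> setStream: aStream
	"No explicit return, so this answers the receiver."
	stream := aStream.
	hasPeeked := false

PeekCachingStream >> peek
	"Read ahead at most once; repeated peeks answer the cached element.
	Answer nil at the end, like PositionableStream>peek."
	hasPeeked ifFalse: [
		stream atEnd ifTrue: [ ^ nil ].
		peeked := stream next.
		hasPeeked := true ].
	^ peeked

PeekCachingStream >> next
	"Hand out the cached element first, if there is one."
	hasPeeked ifTrue: [
		hasPeeked := false.
		^ peeked ].
	^ stream next

PeekCachingStream >> atEnd
	"Not at end while a peeked element is still pending."
	^ hasPeeked not and: [ stream atEnd ]
```

Only #peek, #next and #atEnd are sketched; a real replacement would also need the rest of the stream protocol (#next:, #skip:, #close, and so on).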

Be careful with files, though; it's better to benchmark the overall operation
for the different alternatives.
For example, ZnCharacterStream is much faster than the standard FileStream for peek:

cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream binary.
[cr peek] bench. "49,400,000 per second"
cr close.

FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench]. "535,000 per second"

ZnCharacterStream does, however, have different bulk-read characteristics: it
is faster for small bulk reads and slower for large ones, with the crossover
point at around 1k characters read at once. (The actual values naturally also
depend on the encoder and the file contents; those given here were obtained
with UTF-8 and a mostly or entirely ASCII text file.)

[cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference readStream binary) readStream.
        cr next: 65536; close] bench. "105 per second; a second run gave 106 per second"

[FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
        ] bench. "176 per second"

If you instead use a StandardFileStream set to binary as the base stream (it
has less overhead for binary #next than the MultiByteFileStream returned by
asFileReference readStream), the same general profile holds true, but with the
crossover at around 2k characters.

TL;DR: Benchmark the alternatives; the best replacement depends on your
results. Zn streams set up appropriately for the source and the actual usage
are probably your best bet.
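Following that advice for the in-memory case, a minimal benchmark sketch. The 100,000-character input and the scan block are arbitrary assumptions standing in for your real data and parser; it times a full peek-then-next pass rather than an isolated peek, since the overall operation is what matters. Printing the brace array answers both bench results at once.

```smalltalk
| input scan |
"Arbitrary test input; substitute your real data."
input := String new: 100000 withAll: $a.
"Hypothetical scan loop standing in for a parser: peek, then consume."
scan := [ :stream | [ stream atEnd ] whileFalse: [ stream peek. stream next ] ].
{
	[ scan value: (ReadStream on: input) ] bench.
	[ scan value: (ZnBufferedReadStream on: (ReadStream on: input)) ] bench
}
```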

Cheers,
Henry
