On 04 Nov 2013, at 17:12, Sven Van Caekenberghe <[email protected]> wrote:
> Hi Henrik,
>
> Great writeup, thanks !
>
> (more inline)
>
> On 04 Nov 2013, at 11:58, Henrik Johansen <[email protected]> wrote:
>
>> On 04 Nov 2013, at 9:57, Diego Lont <[email protected]> wrote:
>>
>>> Working on Petit Delphi we found a strange implementation for asPetitStream:
>>>
>>> Stream>>asPetitStream
>>>     ^ self contents asPetitStream
>>>
>>> Further investigation showed that the basic peek was not fast enough for
>>> Petit Parser, as it is used a lot. So it implemented an "improved unchecked
>>> peek":
>>>
>>> PPStream>>peek
>>>     "An improved version of peek, that is slightly faster than the built in
>>>     version."
>>>     ^ self atEnd ifFalse: [ collection at: position + 1 ]
>>>
>>> PPStream>>uncheckedPeek
>>>     "An unchecked version of peek that throws an error if we try to peek
>>>     over the end of the stream, even faster than #peek."
>>>     ^ collection at: position + 1
>>>
>>> But to my knowledge a basic peek should be fast. The real problem is the
>>> peek of the underlying stream:
>>>
>>> PositionableStream>>peek
>>>     "Answer what would be returned if the message next were sent to the
>>>     receiver. If the receiver is at the end, answer nil."
>>>
>>>     | nextObject |
>>>     self atEnd ifTrue: [ ^ nil ].
>>>     nextObject := self next.
>>>     position := position - 1.
>>>     ^ nextObject
>>>
>>> That actually uses "self next". The least thing one should do is to cache
>>> the next object. But isn't there a primitive for peek in a file stream?
>>> Because all overriding peeks of PositionableStream have basically the same
>>> implementation: reading the next element and restoring the state to before
>>> the peek (that is slow). So we would like to be able to remove PPStream
>>> without causing performance issues, as the only added method is the
>>> "improved peek".
>>>
>>> Stephan and Diego
>>
>> If you are reading from a file, ZnCharacterStream should be a valid
>> alternative.
>> If not, ZnBufferedReadStream on an internal collection stream also does peek
>> caching.
>>
>> Beware with files though; it's better to bench the overall operation for
>> different alternatives.
>> F.ex, ZnCharacterReadStream is much faster than the standard FileStream for
>> peek:
>>
>> cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream binary.
>> [cr peek] bench. '49,400,000 per second.'
>> cr close.
>>
>> FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench]. '535,000 per second.'
>>
>> but it has different bulk-read characteristics (faster for small bulks,
>> slower for large bulks, crossover point at around 1k chars at once).
>> (The actual values are of course also dependent on encoder/file contents;
>> those given here were obtained with UTF-8 and a mostly/all-ASCII text file.)
>>
>> [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference readStream binary) readStream.
>> cr next: 65536; close] bench. '105 per second.' '106 per second.'
>
> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream
> did not yet make use of the optimisations that I did for ZnCharacterEncoding
> some time ago. More specifically, they were not yet using
> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream:,
> which are overridden for ZnUTF8Encoder with (super hacked) versions that
> assume most of the input will be ASCII (a reasonable assumption).
>
> I am still chasing a bug, but right now:
>
> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream binary))
>     next: 65536; close ] bench.
>
> "135 per second." BEFORE
> "3,310 per second." AFTER
>
> But of course the input file is ASCII, so YMMV.
>
> I'll let you know when I commit this code.
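To illustrate the idea behind that optimisation: in UTF-8 every byte below 128 encodes the corresponding ASCII character directly, so a decoder can copy such bytes straight through and only fall back to full multi-byte decoding when it hits a high byte. A rough sketch of such a fast path (the selector is hypothetical; this is not the actual Zinc implementation, which also avoids the per-element sends shown here):

ZnUTF8Encoder>>exampleReadInto: aString startingAt: offset count: requestedCount fromStream: stream
    "Sketch of an ASCII fast path: bytes below 128 map one-to-one to
    Characters and are copied directly; anything else is handed to the
    general per-character decoder. Hypothetical selector, for illustration only."
    | index byte |
    index := offset.
    [ index - offset < requestedCount and: [ stream atEnd not ] ] whileTrue: [
        byte := stream peek.
        byte < 128
            ifTrue: [ aString at: index put: (Character value: stream next) ]
            ifFalse: [ aString at: index put: (self nextFromStream: stream) ].
        index := index + 1 ].
    ^ index - offset

The win comes from staying on the cheap path (one compare, one copy) for the common case, which is why an all-ASCII input file benefits the most.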
===
Name: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.26
Author: SvenVanCaekenberghe
Time: 6 November 2013, 10:09:58.837656 am
UUID: dcb21b30-9596-446d-801c-b091cf3f415e
Ancestors: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.25

ZnCharacterReadStream and ZnCharacterWriteStream now use the optimized bulk
reading/writing methods from ZnCharacterEncoder
(#readInto:startingAt:count:fromStream: and #next:putAll:startingAt:toStream:
respectively)
Fixed a bug in ZnUTF8Encoder>>#optimizedReadInto:startingAt:count:fromStream:
Some refactoring and cleanup of ZnUTF8Encoder and ZnByteEncoder
Added 2 more unit tests
===
Name: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.13
Author: SvenVanCaekenberghe
Time: 6 November 2013, 10:11:11.411621 am
UUID: 71b503eb-548e-4e2f-b80d-4d2b805bd11e
Ancestors: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.12

Added 2 more unit tests
ZnCharacterReadStream and ZnCharacterWriteStream now use the optimized bulk
reading/writing methods from ZnCharacterEncoder
(#readInto:startingAt:count:fromStream: and #next:putAll:startingAt:toStream:
respectively)
Fixed a bug in ZnUTF8Encoder>>#optimizedReadInto:startingAt:count:fromStream:
Some refactoring and cleanup of ZnUTF8Encoder and ZnByteEncoder
===

> Sven
>
>> [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
>> ] bench. '176 per second.'
>>
>> If you use a StandardFileStream set to binary (which has less overhead for
>> binary next's compared to the MultiByteFileStream returned by
>> asFileReference readStream) as the base stream instead, the same general
>> profile holds true, but with a crossover around 2k characters.
>>
>> TL;DR: Benchmark the alternatives. The best replacement option depends on
>> your results. Appropriately set-up Zn streams (according to source and
>> actual use) are probably your best bet.
>>
>> Cheers,
>> Henry
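To make "benchmark the alternatives" concrete, a comparison script along these lines is a possible starting point. It reuses the PharoDebug.log file and the #peek/#bench idiom from the thread (where #bench answers its report string), and adds ZnBufferedReadStream, which Henrik mentions does peek caching, wrapped around a binary file stream; adjust the file and stream setup to your actual source and use:

| file zn buffered |
file := 'PharoDebug.log' asFileReference.

"Decoding character stream over a binary file stream"
zn := ZnCharacterReadStream on: file readStream binary.
Transcript crShow: 'ZnCharacterReadStream peek: ', [ zn peek ] bench.
zn close.

"Buffered stream over a binary file stream; peeks come from the buffer"
buffered := ZnBufferedReadStream on: file readStream binary.
Transcript crShow: 'ZnBufferedReadStream peek: ', [ buffered peek ] bench.
buffered close.

"Plain FileStream, as in Henrik's numbers above"
FileStream fileNamed: 'PharoDebug.log' do: [ :fs |
    Transcript crShow: 'FileStream peek: ', [ fs peek ] bench ].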

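Coming back to the original question: the generic PositionableStream>>peek pays for a full #next plus a position rewind on every call. Diego's "cache the next object" suggestion, made concrete as a standalone wrapper class (the class and all its names are hypothetical, only meant to show the shape of the fix; it assumes the wrapped stream never answers nil as a real element):

Object subclass: #PeekCachingReadStream
    instanceVariableNames: 'stream peeked'
    classVariableNames: ''
    category: 'Example-Streams'

PeekCachingReadStream class>>on: aReadStream
    ^ self new setStream: aReadStream

PeekCachingReadStream>>setStream: aReadStream
    stream := aReadStream

PeekCachingReadStream>>peek
    "Answer the next element without consuming it, computing it at most once.
    Answers nil at the end, like PositionableStream>>peek."
    ^ peeked ifNil: [ peeked := stream atEnd ifFalse: [ stream next ] ]

PeekCachingReadStream>>next
    "Answer the element cached by a previous peek, if any, else read on."
    | result |
    peeked ifNil: [ ^ stream atEnd ifFalse: [ stream next ] ].
    result := peeked.
    peeked := nil.
    ^ result

PeekCachingReadStream>>atEnd
    ^ peeked isNil and: [ stream atEnd ]

Repeated peeks then cost one nil test instead of a next/rewind pair; whether that actually beats PPStream's direct "collection at: position + 1" still needs benchmarking, per the TL;DR above.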