that would be called XStreams, no?

On 2013-11-06, at 13:43, Stéphane Ducasse <[email protected]> wrote:

> Hi henrik
> 
> why don't you give a try to change our lives and propose a new 
> MultuByteFileStream and friends :)
> 
> Stef
> 
> 
>> That's great!
>> Remembering that commit message was part of the reason for benching, was 
>> sort of disappointed there was no significant difference between Zn in 2.0 
>> and latest 3.0...
>> 
>> I guess with the amount of hacks accumulating, it is indeed turning into a 
>> worthy successor of MultiByteFileStream ;)
>> 
>> Cheers,
>> Henry
>> 
>> P.S: If you want another delightful one in the same vein (both from 
>> WTFy-ness and perf improvement POV), take a gander at UTF16TextConverter >> 
>> nextPutByteString:toStream: 
>> 
>> 
>> On Mon, Nov 4, 2013 at 5:12 PM, Sven Van Caekenberghe <[email protected]> wrote:
>> Hi Henrik,
>> 
>> Great writeup, thanks !
>> 
>> (more inline)
>> 
>> On 04 Nov 2013, at 11:58, Henrik Johansen <[email protected]> 
>> wrote:
>> 
>> > On 04 Nov 2013, at 9:57 , Diego Lont <[email protected]> wrote:
>> >
>> >> Working on Petit Delphi we found a strange implementation for 
>> >> asPetitStream:
>> >> Stream>asPetitStream
>> >>      ^ self contents asPetitStream
>> >>
>> >> Further investigation showed that the basic peek was not fast enough for 
>> >> Petit Parser, as it is used a lot. So it implemented a "improved 
>> >> unchecked peek":
>> >> PPStream>peek
>> >>      "An improved version of peek, that is slightly faster than the built 
>> >> in version."
>> >>      ^ self atEnd ifFalse: [ collection at: position + 1 ]
>> >>
>> >> PPStream>uncheckedPeek
>> >>      "An unchecked version of peek that throws an error if we try to peek 
>> >> over the end of the stream, even faster than #peek."
>> >>      ^ collection at: position + 1
>> >>
>> >> But in my knowledge a basic peek should be fast. The real problem is the 
>> >> peek in the underlying peek:
>> >> PositionableStream>peek
>> >>      "Answer what would be returned if the message next were sent to the
>> >>      receiver. If the receiver is at the end, answer nil."
>> >>
>> >>      | nextObject |
>> >>      self atEnd ifTrue: [^nil].
>> >>      nextObject := self next.
>> >>      position := position - 1.
>> >>      ^nextObject
>> >>
>> >> That actually uses "self next". The least thing one should do is to cache 
>> >> the next object. But isn't there a primitive for peek in a file stream? 
>> >> Because al overriding peeks of PositionableStream have basically the same 
>> >> implementation: reading the next and restoring the state to before the 
>> >> peek (that is slow). So we would like to be able to remove PPStream 
>> >> without causing performance issues, as the only added method is the 
>> >> "improved peek".
>> >>
>> >> Stephan and Diego
>> >
>> > If you are reading from file, ZnCharacterStream should be a valid 
>> > alternative.
>> > If not, ZnBufferedReadStream on an internal collection stream also does 
>> > peek caching.
>> >
>> > Beware with files though; it’s better to bench the overall operation for 
>> > different alternatives.
>> > F.ex, ZnCharacterStream is much faster than the standard Filestream for 
>> > peek:
>> >
>> > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference 
>> > readStream binary.
>> > [cr peek] bench. '49,400,000 per second.'
>> > cr close.
>> >
>> > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench] 
>> > '535,000 per second.’
>> >
>> > but has different bulk reads characteristics (faster for small bulks, 
>> > slower for large bulks, crossover-point at around 1k chars at once);
>> > (The actual values are of course also dependent on encoder/file contents, 
>> > those given here obtained with UTF-8 and a mostly/all ascii text file)
>> >
>> > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference 
>> > readStream binary ) readStream.
>> >       cr next: 65536; close] bench  '105 per second.'  '106 per second.’
>> 
>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream 
>> did not yet make use of the optimisations that I did for ZnCharacterEncoding 
>> some time ago. More specifically, they were not yet using 
>> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: 
>> which are overwritten for ZnUTF8Encoder with (super hacked) versions that 
>> assume most of the input will be ASCII (a reasonable assumption).
>> 
>> I am still chasing a bug, but right now:
>> 
>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream 
>> binary))
>>         next: 65536; close ] bench.
>> 
>>         "135 per second.” BEFORE
>>         "3,310 per second.” AFTER
>> 
>> But of course the input file is ASCII, so YMMV.
>> 
>> I’ll let you know when I commit this code.
>> 
>> Sven
>> 
>> > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
>> >       ] bench  '176 per second.’
>> >
>> > If you use a StandardFilestream set to binary ( which has less overhead 
>> > for binary next’s compared to the MultiByteFileStream returned by 
>> > asFileReference readStream)as the base stream instead, , the same general 
>> > profile holds true, but with a crossover around 2k characters.
>> >
>> > TL;DR: Benchmark the alternatives. The best replacement option depends on 
>> > your results. Appropriately (according to source and actual use) set up 
>> > Zn-streams are probably your best bet.
>> >
>> > Cheers,
>> > Henry
>> 
>> 
>> 
> 

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to