Hi henrik why don't you give a try to change our lives and propose a new MultuByteFileStream and friends :)
Stef > That's great! > Remembering that commit message was part of the reason for benching, was sort > of disappointed there was no significant difference between Zn in 2.0 and > latest 3.0... > > I guess with the amount of hacks accumulating, it is indeed turning into a > worthy successor of MultiByteFileStream ;) > > Cheers, > Henry > > P.S: If you want another delightful one in the same vein (both from WTFy-ness > and perf improvement POV), take a gander at UTF16TextConverter >> > nextPutByteString:toStream: > > > On Mon, Nov 4, 2013 at 5:12 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote: > Hi Henrik, > > Great writeup, thanks ! > > (more inline) > > On 04 Nov 2013, at 11:58, Henrik Johansen <henrik.s.johan...@veloxit.no> > wrote: > > > On 04 Nov 2013, at 9:57 , Diego Lont <diego.l...@delware.nl> wrote: > > > >> Working on Petit Delphi we found a strange implementation for > >> asPetitStream: > >> Stream>asPetitStream > >> ^ self contents asPetitStream > >> > >> Further investigation showed that the basic peek was not fast enough for > >> Petit Parser, as it is used a lot. So it implemented a "improved unchecked > >> peek": > >> PPStream>peek > >> "An improved version of peek, that is slightly faster than the built > >> in version." > >> ^ self atEnd ifFalse: [ collection at: position + 1 ] > >> > >> PPStream>uncheckedPeek > >> "An unchecked version of peek that throws an error if we try to peek > >> over the end of the stream, even faster than #peek." > >> ^ collection at: position + 1 > >> > >> But in my knowledge a basic peek should be fast. The real problem is the > >> peek in the underlying peek: > >> PositionableStream>peek > >> "Answer what would be returned if the message next were sent to the > >> receiver. If the receiver is at the end, answer nil." > >> > >> | nextObject | > >> self atEnd ifTrue: [^nil]. > >> nextObject := self next. > >> position := position - 1. > >> ^nextObject > >> > >> That actually uses "self next". The least thing one should do is to cache > >> the next object. But isn't there a primitive for peek in a file stream? > >> Because al overriding peeks of PositionableStream have basically the same > >> implementation: reading the next and restoring the state to before the > >> peek (that is slow). So we would like to be able to remove PPStream > >> without causing performance issues, as the only added method is the > >> "improved peek". > >> > >> Stephan and Diego > > > > If you are reading from file, ZnCharacterStream should be a valid > > alternative. > > If not, ZnBufferedReadStream on an internal collection stream also does > > peek caching. > > > > Beware with files though; it’s better to bench the overall operation for > > different alternatives. > > F.ex, ZnCharacterStream is much faster than the standard Filestream for > > peek: > > > > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference readStream > > binary. > > [cr peek] bench. '49,400,000 per second.' > > cr close. > > > > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench] '535,000 > > per second.’ > > > > but has different bulk reads characteristics (faster for small bulks, > > slower for large bulks, crossover-point at around 1k chars at once); > > (The actual values are of course also dependent on encoder/file contents, > > those given here obtained with UTF-8 and a mostly/all ascii text file) > > > > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference > > readStream binary ) readStream. > > cr next: 65536; close] bench '105 per second.' '106 per second.’ > > Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream > did not yet make use of the optimisations that I did for ZnCharacterEncoding > some time ago. More specifically, they were not yet using > #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream: > which are overwritten for ZnUTF8Encoder with (super hacked) versions that > assume most of the input will be ASCII (a reasonable assumption). > > I am still chasing a bug, but right now: > > [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream > binary)) > next: 65536; close ] bench. > > "135 per second.” BEFORE > "3,310 per second.” AFTER > > But of course the input file is ASCII, so YMMV. > > I’ll let you know when I commit this code. > > Sven > > > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536] > > ] bench '176 per second.’ > > > > If you use a StandardFilestream set to binary ( which has less overhead for > > binary next’s compared to the MultiByteFileStream returned by > > asFileReference readStream)as the base stream instead, , the same general > > profile holds true, but with a crossover around 2k characters. > > > > TL;DR: Benchmark the alternatives. The best replacement option depends on > > your results. Appropriately (according to source and actual use) set up > > Zn-streams are probably your best bet. > > > > Cheers, > > Henry > > >