That would be called XStreams, no?

On 2013-11-06, at 13:43, Stéphane Ducasse <[email protected]> wrote:
> Hi Henrik
>
> why don't you give it a try to change our lives and propose a new
> MultiByteFileStream and friends :)
>
> Stef
>
>
>> That's great!
>> Remembering that commit message was part of the reason for benching, was
>> sort of disappointed there was no significant difference between Zn in 2.0
>> and latest 3.0...
>>
>> I guess with the amount of hacks accumulating, it is indeed turning into a
>> worthy successor of MultiByteFileStream ;)
>>
>> Cheers,
>> Henry
>>
>> P.S: If you want another delightful one in the same vein (both from
>> WTFy-ness and perf improvement POV), take a gander at
>> UTF16TextConverter >> nextPutByteString:toStream:
>>
>>
>> On Mon, Nov 4, 2013 at 5:12 PM, Sven Van Caekenberghe <[email protected]> wrote:
>> Hi Henrik,
>>
>> Great writeup, thanks!
>>
>> (more inline)
>>
>> On 04 Nov 2013, at 11:58, Henrik Johansen <[email protected]>
>> wrote:
>>
>> > On 04 Nov 2013, at 9:57, Diego Lont <[email protected]> wrote:
>> >
>> >> Working on Petit Delphi we found a strange implementation for
>> >> asPetitStream:
>> >> Stream>asPetitStream
>> >>     ^ self contents asPetitStream
>> >>
>> >> Further investigation showed that the basic peek was not fast enough for
>> >> PetitParser, as it is used a lot. So it implemented an "improved
>> >> unchecked peek":
>> >> PPStream>peek
>> >>     "An improved version of peek, that is slightly faster than the built
>> >>     in version."
>> >>     ^ self atEnd ifFalse: [ collection at: position + 1 ]
>> >>
>> >> PPStream>uncheckedPeek
>> >>     "An unchecked version of peek that throws an error if we try to peek
>> >>     over the end of the stream, even faster than #peek."
>> >>     ^ collection at: position + 1
>> >>
>> >> But to my knowledge a basic peek should be fast. The real problem is the
>> >> underlying peek:
>> >> PositionableStream>peek
>> >>     "Answer what would be returned if the message next were sent to the
>> >>     receiver. If the receiver is at the end, answer nil."
>> >>
>> >>     | nextObject |
>> >>     self atEnd ifTrue: [^nil].
>> >>     nextObject := self next.
>> >>     position := position - 1.
>> >>     ^nextObject
>> >>
>> >> That actually uses "self next". The least one should do is cache
>> >> the next object. But isn't there a primitive for peek in a file stream?
>> >> Because all overriding peeks of PositionableStream have basically the same
>> >> implementation: reading the next element and restoring the state to before
>> >> the peek (that is slow). So we would like to be able to remove PPStream
>> >> without causing performance issues, as the only added method is the
>> >> "improved peek".
>> >>
>> >> Stephan and Diego
>> >
>> > If you are reading from file, ZnCharacterStream should be a valid
>> > alternative.
>> > If not, ZnBufferedReadStream on an internal collection stream also does
>> > peek caching.
>> >
>> > Beware with files though; it’s better to bench the overall operation for
>> > different alternatives.
>> > E.g., ZnCharacterStream is much faster than the standard FileStream for
>> > peek:
>> >
>> > cr := ZnCharacterReadStream on: 'PharoDebug.log' asFileReference
>> > readStream binary.
>> > [cr peek] bench. '49,400,000 per second.'
>> > cr close.
>> >
>> > FileStream fileNamed: 'PharoDebug.log' do: [:fs | [fs peek] bench]
>> > '535,000 per second.'
>> >
>> > but it has different bulk-read characteristics (faster for small bulks,
>> > slower for large bulks, crossover point at around 1k chars at once).
>> > (The actual values are of course also dependent on encoder/file contents;
>> > those given here were obtained with UTF-8 and a mostly/all ASCII text file.)
>> >
>> > [cr := ZnCharacterReadStream on: ('PharoDebug.log' asFileReference
>> > readStream binary ) readStream.
>> > cr next: 65536; close] bench '105 per second.' '106 per second.'
>>
>> Well, I just realised that ZnCharacterReadStream and ZnCharacterWriteStream
>> did not yet make use of the optimisations that I did for ZnCharacterEncoding
>> some time ago.
>> More specifically, they were not yet using
>> #next:putAll:startingAt:toStream: and #readInto:startingAt:count:fromStream:
>> which are overridden for ZnUTF8Encoder with (super hacked) versions that
>> assume most of the input will be ASCII (a reasonable assumption).
>>
>> I am still chasing a bug, but right now:
>>
>> [ (ZnCharacterReadStream on: ('timezones.json' asFileReference readStream
>> binary))
>>     next: 65536; close ] bench.
>>
>> "135 per second." BEFORE
>> "3,310 per second." AFTER
>>
>> But of course the input file is ASCII, so YMMV.
>>
>> I’ll let you know when I commit this code.
>>
>> Sven
>>
>> > [FileStream fileNamed: 'PharoDebug.log' do: [:fs | fs next: 65536]
>> > ] bench '176 per second.'
>> >
>> > If you use a StandardFileStream set to binary (which has less overhead
>> > for binary reads compared to the MultiByteFileStream returned by
>> > asFileReference readStream) as the base stream instead, the same general
>> > profile holds true, but with a crossover around 2k characters.
>> >
>> > TL;DR: Benchmark the alternatives. The best replacement option depends on
>> > your results. Zn streams set up appropriately (according to the source and
>> > actual use) are probably your best bet.
>> >
>> > Cheers,
>> > Henry
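A minimal sketch of the "cache the next object" idea from Stephan and Diego's message, written in Pharo Smalltalk in the usual "Class >> selector" browser notation. PeekCachingReadStream is a hypothetical class, not part of Pharo, PetitParser or Zinc (ZnBufferedReadStream, mentioned by Henrik, does peek caching with a full buffer instead). The sketch assumes the wrapped stream only needs to understand #next and #atEnd; it reads the underlying stream at most once per element instead of doing a next-and-restore on every #peek.

```smalltalk
"Hypothetical sketch, not part of Pharo or Zinc: a read stream wrapper that
remembers the element answered by #peek, so the following #next or #peek
does not touch the underlying stream again."
Object subclass: #PeekCachingReadStream
	instanceVariableNames: 'stream peeked hasPeeked'
	classVariableNames: ''
	category: 'Sketch-Streams'

PeekCachingReadStream class >> on: aReadStream
	^ self new setStream: aReadStream

PeekCachingReadStream >> setStream: aReadStream
	"No explicit return, so this answers self."
	stream := aReadStream.
	hasPeeked := false

PeekCachingReadStream >> atEnd
	"Not at end while a peeked element is still pending."
	^ hasPeeked not and: [ stream atEnd ]

PeekCachingReadStream >> peek
	"Answer the next element without consuming it, or nil at the end.
	The underlying stream is read at most once per element."
	hasPeeked ifTrue: [ ^ peeked ].
	stream atEnd ifTrue: [ ^ nil ].
	peeked := stream next.
	hasPeeked := true.
	^ peeked

PeekCachingReadStream >> next
	"Answer the pending peeked element if there is one, else read a new one."
	hasPeeked ifTrue: [
		hasPeeked := false.
		^ peeked ].
	^ stream next
```

For example, (PeekCachingReadStream on: 'ab' readStream) peek answers $a without consuming it. Whether this beats the wrapped stream's own #peek depends on how expensive that #peek is (for a plain collection ReadStream the wrapper mostly adds message sends), so, as Henrik says, benchmark it against the alternatives.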
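Sven's ASCII assumption pays off because in UTF-8 every byte below 128 encodes exactly one character with the same code point, so a run of such bytes can be copied without real decoding. The Playground snippet below illustrates only that idea on a whole ByteArray; decodeMostlyAscii is a hypothetical block, not Sven's streaming #readInto:startingAt:count:fromStream: code, and it assumes ZnUTF8Encoder>>decodeBytes: is available to decode whatever follows the first non-ASCII byte.

```smalltalk
"Hypothetical sketch, not Zinc's implementation: copy the leading ASCII run
of a UTF-8 ByteArray directly and only decode the remainder in full.
Assumes ZnUTF8Encoder>>decodeBytes: exists for the slow path."
| decodeMostlyAscii |
decodeMostlyAscii := [ :bytes |
	| asciiEnd prefix |
	"Find the end of the leading run of bytes < 128 (pure ASCII)."
	asciiEnd := 0.
	[ asciiEnd < bytes size and: [ (bytes at: asciiEnd + 1) < 128 ] ]
		whileTrue: [ asciiEnd := asciiEnd + 1 ].
	"Fast path: each ASCII byte maps to the Character with the same value."
	prefix := String new: asciiEnd.
	1 to: asciiEnd do: [ :i | prefix at: i put: (Character value: (bytes at: i)) ].
	asciiEnd = bytes size
		ifTrue: [ prefix ]
		ifFalse: [
			"Slow path: hand everything after the ASCII run to a full UTF-8 decoder."
			prefix , (ZnUTF8Encoder new decodeBytes: (bytes copyFrom: asciiEnd + 1 to: bytes size)) ] ].
```

On pure ASCII input, such as the timezones.json file in Sven's benchmark, only the fast loop runs; as he notes, results on other inputs may vary.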
