Yes, the lack of FileStream buffering was (re)discovered and corrected when I ran my own SqueaXTream experiments (http://www.squeaksource.com/XTream/), before porting the original Xtreams to Squeak (http://www.squeaksource.com/Xtreams/). Levente then soon came up with an excellent patch for buffering Squeak FileStream, and this was later ported to Pharo.

What I also re-discovered is that buffering the raw file is not the only thing that counts: buffering the decoding matters too. Indeed, a careless implementation would read the file by chunks but decode byte after byte, while a buffered one would decode large chunks of ASCII (if any) with a single copy primitive. This was reported near the end of this post:
http://permalink.gmane.org/gmane.comp.lang.smalltalk.squeak.general/151341

For Xtreams, I don't remember exactly where I put some experiments, but pre-allocating a WideString instead of a String can also make a noticeable difference in the presence of WideCharacter...

Nicolas

2012/12/3 Sven Van Caekenberghe <[email protected]>:
>
> On 03 Dec 2012, at 12:43, Sven Van Caekenberghe <[email protected]> wrote:
>
>> There are ZnCharacterReadStream and ZnCharacterWriteStream that add
>> decoding/encoding to binary streams. They could indeed be used as an
>> alternative. But that was not my point.
>
> I reran the examples using a binary file stream to disable the decoding and
> then using ZnCharacterReadStream to handle the decoding:
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>     (NeoJSONReader on: stream) next ] ] timeToRun. 3502
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>     ZnBufferedReadStream on: fstream do: [ :stream |
>       (NeoJSONReader on: stream) next ] ] ] timeToRun. 1214
>
> [ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :stream |
>     (NeoJSONReader on: stream) next ] ] timeToRun 2869
>
> [ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :fstream |
>     ZnBufferedReadStream on: fstream do: [ :stream |
>       (NeoJSONReader on: stream) next ] ] ] timeToRun 1280
>
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>     (NeoJSONReader on: (ZnCharacterReadStream on: stream binary)) next ]
> ] timeToRun. 1655
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>     (NeoJSONReader on: (ZnCharacterReadStream on: stream binary encoding:
> 'latin1')) next ] ] timeToRun. 1574
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>     ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary)
> do: [ :stream |
>       (NeoJSONReader on: stream) next ] ] ] timeToRun. 1753
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>     ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary
> encoding: 'latin1') do: [ :stream |
>       (NeoJSONReader on: stream) next ] ] ] timeToRun. 1655
>
> This shows several things (for this particular test case):
>
> - ZnCharacterReadStream is at least as good as the builtin decoding streams
>   (here twice as good)
> - UTF-8 (ZnUTF8Encoder) and Latin1 (ZnNullEncoder) decoding are close
>   together (because the input is ASCII)
> - ZnBufferedReadStream does not help when the stream is pretty good
>   (ZnCharacterReadStream actually uses a 1 character buffer to support peek,
>   which is important to speed up this use case) - this is to be expected: it
>   adds a layer, but its fetch next buffer logic still loops using #next,
>   decoding possibly Unicode characters
> - the super fast runs (the 2nd and the 4th) are actually cheating because
>   they skip decoding - thanks for that observation !
>
> Sven
>
> PS: While doing these explorations, I actually updated ZnUTF8Encoder quite a
> bit, both when handling ASCII, which is the majority of characters in most
> cases, as well as in the general case; see the latest version.
>
> --
> Sven Van Caekenberghe
> http://stfx.eu
> Smalltalk is the Red Pill
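
To illustrate the point at the top of this reply about decoding ASCII by chunks: below is a minimal sketch, assuming the chunk read from the file is already available as a ByteArray. The variable names are made up for illustration, and this is not the code from the FileStream patch or from Xtreams. It only shows the idea of converting the leading ASCII run in one bulk operation instead of decoding it byte after byte.

```smalltalk
"Sketch only: decode the leading ASCII run of a byte buffer in one bulk copy.
 bytes, firstNonAscii, asciiEnd and decoded are illustrative names."
| bytes firstNonAscii asciiEnd decoded |
bytes := '[1, 2, 3]' asByteArray.  "stands in for a chunk read from the file"
"Scan for the first byte outside ASCII; findFirst: answers 0 when there is none."
firstNonAscii := bytes findFirst: [ :each | each > 127 ].
asciiEnd := firstNonAscii = 0
	ifTrue: [ bytes size ]
	ifFalse: [ firstNonAscii - 1 ].
"Convert the whole ASCII run at once rather than one character at a time."
decoded := (bytes copyFrom: 1 to: asciiEnd) asString.
```

Anything after asciiEnd would still have to go through the regular per-character decoder. The scan itself is still a loop over bytes, so a real implementation would probably want a faster way to find the end of the ASCII run.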
