Yes, the lack of FileStream buffering was (re)discovered and corrected
when I experimented with my own XTream
(http://www.squeaksource.com/XTream/) before porting the original
Xtreams to Squeak (http://www.squeaksource.com/Xtreams/).
Levente soon came up with an excellent patch for buffering Squeak
FileStream, and this was later ported to Pharo.

What I also rediscovered is that buffering the raw file is not the only
thing that counts: buffering the decoding matters too.
Indeed, a careless implementation would read the file in chunks but
still decode it byte after byte,
while a buffered one would decode large runs of ASCII (if any) with a
single copy primitive.

This was reported near the end of this post:
http://permalink.gmane.org/gmane.comp.lang.smalltalk.squeak.general/151341

For Xtreams, I don't remember exactly where I put the experiments, but
pre-allocating a WideString instead of a String can also make a
noticeable difference in the presence of wide characters...
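
A minimal sketch of why that helps (the sizes and values are just for
illustration): a ByteString buffer cannot hold a wide character, so the
first one forces a conversion of the whole buffer, while a pre-allocated
WideString accepts any Character from the start.

    | narrow wide |
    narrow := String new: 10.    "ByteString: only code points 0..255 fit"
    wide := WideString new: 10.  "holds any Character, no conversion needed"
    wide at: 1 put: (Character value: 16r20AC).  "EURO SIGN fits directly"
    "narrow at: 1 put: (Character value: 16r20AC) would fail here:
    the whole buffer must first be converted to a WideString"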

Nicolas

2012/12/3 Sven Van Caekenberghe <[email protected]>:
> On 03 Dec 2012, at 12:43, Sven Van Caekenberghe <[email protected]> wrote:
>
>> There are ZnCharacterReadStream and ZnCharacterWriteStream that add 
>> decoding/encoding to binary streams. They could indeed be used as an 
>> alternative. But that was not my point.
>
> I reran the examples using a binary file stream to disable the decoding and 
> then using ZnCharacterReadStream to handle the decoding:
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>         (NeoJSONReader on: stream) next ] ] timeToRun. 3502
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>         ZnBufferedReadStream on: fstream do: [ :stream |
>                 (NeoJSONReader on: stream) next ] ] ] timeToRun. 1214
>
> [ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :stream |
>         (NeoJSONReader on: stream) next ] ] timeToRun. 2869
>
> [ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :fstream |
>         ZnBufferedReadStream on: fstream do: [ :stream |
>                 (NeoJSONReader on: stream) next ] ] ] timeToRun. 1280
>
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>         (NeoJSONReader on: (ZnCharacterReadStream on: stream binary)) next ] 
> ] timeToRun. 1655
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
>         (NeoJSONReader on: (ZnCharacterReadStream on: stream binary encoding: 
> 'latin1')) next ] ] timeToRun. 1574
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>         ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary) 
> do: [ :stream |
>                 (NeoJSONReader on: stream) next ] ] ] timeToRun. 1753
>
> [ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
>         ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary 
> encoding: 'latin1') do: [ :stream |
>                 (NeoJSONReader on: stream) next ] ] ] timeToRun. 1655
>
> This shows several things (for this particular test case):
>
> - ZnCharacterReadStream is at least as good as the built-in decoding streams 
> (here about twice as fast)
> - UTF-8 (ZnUTF8Encoder) and Latin-1 (ZnNullEncoder) decoding are close 
> together (because the input is ASCII)
> - ZnBufferedReadStream does not help when the underlying stream is already 
> good (ZnCharacterReadStream uses a one-character buffer to support #peek, 
> which is important to speed up this use case) - this is to be expected: the 
> extra layer still fills its buffer by looping over #next, decoding possibly 
> Unicode characters one at a time (see the sketch after this list)
> - the super-fast runs (the 2nd and the 4th) are actually cheating, because 
> they skip decoding entirely - thanks for that observation !
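>
> A sketch of that one-character peek buffer (the peeked instance variable 
> and decodeNextCharacter are hypothetical names, not the actual 
> ZnCharacterReadStream code):
>
> peek
>         "decode one character ahead, keep it until #next consumes it"
>         ^ peeked ifNil: [ peeked := self decodeNextCharacter ]
>
> next
>         "return the pending peeked character if any, else decode a fresh one"
>         peeked ifNotNil: [ :character | peeked := nil. ^ character ].
>         ^ self decodeNextCharacter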
>
> Sven
>
> PS: While doing these explorations, I actually updated ZnUTF8Encoder quite a 
> bit, both for ASCII handling (ASCII is the majority of characters in most 
> cases) and for the general case; see the latest version.
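>
> The ASCII fast path boils down to this kind of test - a runnable sketch of 
> the shape, not the actual ZnUTF8Encoder code:
>
> | input byte |
> input := #[104 101 108 108 111] readStream.   "the UTF-8 bytes of 'hello'"
> [ input atEnd ] whileFalse: [
>         byte := input next.
>         byte < 128
>                 ifTrue: [ Transcript show: (Character value: byte) asString ]   "1 byte = 1 character"
>                 ifFalse: [ self error: 'fall back to the general multi-byte decoder' ] ]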
>
> --
> Sven Van Caekenberghe
> http://stfx.eu
> Smalltalk is the Red Pill
