[Pharo-project] Performance Tip: Using buffered streams

Sven Van Caekenberghe Sun, 02 Dec 2012 14:09:53 -0800

Hi,

<short-version>


When doing complex IO, like writing or parsing protocols, to non-memory 
streams, like file streams (and probably socket streams), buffering can make an 
important difference. ZnBufferedWriteStream and ZnBufferedReadStream are an 
easy solution to experience the difference.

</short-version>

And now the long version: first some measurements (code based on latest 
versions of Zinc, NeoCSV, NeoJSON and STON):

NeoJSON

| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.json' asFileReference writeStreamDo: [ :stream |
        (NeoJSONWriter on: stream) nextPut: data ] ] timeToRun. 22152
 
| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.json' asFileReference writeStreamDo: [ :fstream |
        ZnBufferedWriteStream on: fstream do: [ :stream |
                (NeoJSONWriter on: stream) nextPut: data ] ] ] timeToRun. 5147
 
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
        (NeoJSONReader on: stream) next ] ] timeToRun. 3502

[ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
        ZnBufferedReadStream on: fstream do: [ :stream |
                (NeoJSONReader on: stream) next ] ] ] timeToRun. 1214

STON

| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.ston' asFileReference writeStreamDo: [ :stream |
        STON writer on: stream; nextPut: data ] ] timeToRun. 22275
 
| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.ston' asFileReference writeStreamDo: [ :fstream |
        ZnBufferedWriteStream on: fstream do: [ :stream |
                STON writer on: stream; nextPut: data ] ] ] timeToRun. 5501
 
[ '/tmp/numbers.ston' asFileReference readStreamDo: [ :stream |
        STON reader on: stream; next ] ] timeToRun. 3632

[ '/tmp/numbers.ston' asFileReference readStreamDo: [ :fstream |
        ZnBufferedReadStream on: fstream do: [ :stream |
                STON reader on: stream; next ] ] ] timeToRun. 1367

NeoCSV

| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :stream |
        (NeoCSVWriter on: stream) nextPutAll: data ] ] timeToRun. 20916
 
| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each reciprocal asFloat. each sqrt. each asWords 
} ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :fstream |
        ZnBufferedWriteStream on: fstream do: [ :stream |
                (NeoCSVWriter on: stream) nextPutAll: data ] ] ] timeToRun. 5559
 
[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :stream |
        (NeoCSVReader on: stream) 
                addIntegerField; addIntegerField; addFloatField; addFloatField; 
addField;
                upToEnd ] ] timeToRun. 1290

[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :fstream |
        ZnBufferedReadStream on: fstream do: [ :stream |
                (NeoCSVReader on: stream) 
                        addIntegerField; addIntegerField; addFloatField; 
addFloatField; addField;
                        upToEnd ] ] ] timeToRun.  970


The reason for these non-trivial speedups is that buffering is missing and/or 
buffer management is suboptimal in the standard file stream classes. Simply 
wrapping the stream with a 64Kb buffer yields results like this. We already 
knew this for writing (and Fuel adapted it as well), but it seems to make a 
similar, although smaller difference when reading.

<side-note>

Attentive, curious readers might ask why in this particular benchmark the 
reading is so much slower than the writing - normally it should be the inverse 
or at least they should be closer to each other. Lo and behold: another 
performance issue: float printing (as opposed to parsing) is extremely slow 
(probably partially due to its reliance on LargeInteger arithmetic) !

Here is a similar benchmark with only Integers:

| data |
data := (1 to: 25000) collect: [ :each |
        { each. each negated. each + 100000. each - 100000. each asWords } ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :fstream |
        ZnBufferedWriteStream on: fstream do: [ :stream |
                (NeoCSVWriter on: stream) nextPutAll: data ] ] ] timeToRun. 311

[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :fstream |
        ZnBufferedReadStream on: fstream do: [ :stream |
                (NeoCSVReader on: stream) 
                        addIntegerField; addIntegerField; addIntegerField; 
addIntegerField; addField;
                        upToEnd ] ] ] timeToRun. 396

</side-note>

If ZnBufferedWriteStream or ZnBufferedReadStream were useful in speading up 
your code, please let us know.

Sven

PS: There is one ZnBufferedReadStream limitation: by design, #position, 
#position: or #skip: -1 and everything on top of that are NOT supported - IMHO 
real streams should not (and cannot) provide these operations. A sad 
consequence is that NumberParser and its subclasses (and thus Number 
class>>#readFrom:) do not work. NeoJSON and STON do their own number parsing, 
NeoCSV relies on a new class, NeoNumberParser, that is independently usable, 
though restricted to normal numbers as opposed to the full Smalltalk syntax.

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill

[Pharo-project] Performance Tip: Using buffered streams

Reply via email to