Hi,
<short-version>
When doing complex IO, like writing or parsing protocols, to non-memory
streams, like file streams (and probably socket streams), buffering can make an
important difference. ZnBufferedWriteStream and ZnBufferedReadStream are an
easy solution to experience the difference.
</short-version>
And now the long version: first some measurements (code based on latest
versions of Zinc, NeoCSV, NeoJSON and STON):
NeoJSON
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.json' asFileReference writeStreamDo: [ :stream |
(NeoJSONWriter on: stream) nextPut: data ] ] timeToRun. 22152
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.json' asFileReference writeStreamDo: [ :fstream |
ZnBufferedWriteStream on: fstream do: [ :stream |
(NeoJSONWriter on: stream) nextPut: data ] ] ] timeToRun. 5147
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
(NeoJSONReader on: stream) next ] ] timeToRun. 3502
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
ZnBufferedReadStream on: fstream do: [ :stream |
(NeoJSONReader on: stream) next ] ] ] timeToRun. 1214
STON
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.ston' asFileReference writeStreamDo: [ :stream |
STON writer on: stream; nextPut: data ] ] timeToRun. 22275
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.ston' asFileReference writeStreamDo: [ :fstream |
ZnBufferedWriteStream on: fstream do: [ :stream |
STON writer on: stream; nextPut: data ] ] ] timeToRun. 5501
[ '/tmp/numbers.ston' asFileReference readStreamDo: [ :stream |
STON reader on: stream; next ] ] timeToRun. 3632
[ '/tmp/numbers.ston' asFileReference readStreamDo: [ :fstream |
ZnBufferedReadStream on: fstream do: [ :stream |
STON reader on: stream; next ] ] ] timeToRun. 1367
NeoCSV
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :stream |
(NeoCSVWriter on: stream) nextPutAll: data ] ] timeToRun. 20916
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each reciprocal asFloat. each sqrt. each asWords
} ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :fstream |
ZnBufferedWriteStream on: fstream do: [ :stream |
(NeoCSVWriter on: stream) nextPutAll: data ] ] ] timeToRun. 5559
[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :stream |
(NeoCSVReader on: stream)
addIntegerField; addIntegerField; addFloatField; addFloatField;
addField;
upToEnd ] ] timeToRun. 1290
[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :fstream |
ZnBufferedReadStream on: fstream do: [ :stream |
(NeoCSVReader on: stream)
addIntegerField; addIntegerField; addFloatField;
addFloatField; addField;
upToEnd ] ] ] timeToRun. 970
The reason for these non-trivial speedups is that buffering is missing and/or
buffer management is suboptimal in the standard file stream classes. Simply
wrapping the stream with a 64Kb buffer yields results like this. We already
knew this for writing (and Fuel adapted it as well), but it seems to make a
similar, although smaller difference when reading.
<side-note>
Attentive, curious readers might ask why in this particular benchmark the
reading is so much slower than the writing - normally it should be the inverse
or at least they should be closer to each other. Lo and behold: another
performance issue: float printing (as opposed to parsing) is extremely slow
(probably partially due to its reliance on LargeInteger arithmetic) !
Here is a similar benchmark with only Integers:
| data |
data := (1 to: 25000) collect: [ :each |
{ each. each negated. each + 100000. each - 100000. each asWords } ].
[ '/tmp/numbers.csv' asFileReference writeStreamDo: [ :fstream |
ZnBufferedWriteStream on: fstream do: [ :stream |
(NeoCSVWriter on: stream) nextPutAll: data ] ] ] timeToRun. 311
[ '/tmp/numbers.csv' asFileReference readStreamDo: [ :fstream |
ZnBufferedReadStream on: fstream do: [ :stream |
(NeoCSVReader on: stream)
addIntegerField; addIntegerField; addIntegerField;
addIntegerField; addField;
upToEnd ] ] ] timeToRun. 396
</side-note>
If ZnBufferedWriteStream or ZnBufferedReadStream were useful in speading up
your code, please let us know.
Sven
PS: There is one ZnBufferedReadStream limitation: by design, #position,
#position: or #skip: -1 and everything on top of that are NOT supported - IMHO
real streams should not (and cannot) provide these operations. A sad
consequence is that NumberParser and its subclasses (and thus Number
class>>#readFrom:) do not work. NeoJSON and STON do their own number parsing,
NeoCSV relies on a new class, NeoNumberParser, that is independently usable,
though restricted to normal numbers as opposed to the full Smalltalk syntax.
--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill