On 03 Dec 2012, at 12:43, Sven Van Caekenberghe <[email protected]> wrote:
> There are ZnCharacterReadStream and ZnCharacterWriteStream that add
> decoding/encoding to binary streams. They could indeed be used as an
> alternative. But that was not my point.
I reran the examples, adding variants that use a binary file stream (which
disables the built-in decoding) and let ZnCharacterReadStream do the decoding
instead (the numbers are timeToRun results, in milliseconds):
"1. FileReference read stream, built-in decoding"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
	(NeoJSONReader on: stream) next ] ] timeToRun. "3502"

"2. same, wrapped in a ZnBufferedReadStream"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
	ZnBufferedReadStream on: fstream do: [ :stream |
		(NeoJSONReader on: stream) next ] ] ] timeToRun. "1214"

"3. StandardFileStream"
[ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :stream |
	(NeoJSONReader on: stream) next ] ] timeToRun. "2869"

"4. same, wrapped in a ZnBufferedReadStream"
[ StandardFileStream fileNamed: '/tmp/numbers.json' do: [ :fstream |
	ZnBufferedReadStream on: fstream do: [ :stream |
		(NeoJSONReader on: stream) next ] ] ] timeToRun. "1280"

"5. binary file stream + ZnCharacterReadStream (default UTF-8)"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
	(NeoJSONReader on: (ZnCharacterReadStream on: stream binary)) next ] ]
		timeToRun. "1655"

"6. binary file stream + ZnCharacterReadStream (latin1)"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :stream |
	(NeoJSONReader on: (ZnCharacterReadStream on: stream binary encoding: 'latin1')) next ] ]
		timeToRun. "1574"

"7. run 5, wrapped in a ZnBufferedReadStream"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
	ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary) do: [ :stream |
		(NeoJSONReader on: stream) next ] ] ] timeToRun. "1753"

"8. run 6, wrapped in a ZnBufferedReadStream"
[ '/tmp/numbers.json' asFileReference readStreamDo: [ :fstream |
	ZnBufferedReadStream on: (ZnCharacterReadStream on: fstream binary encoding: 'latin1') do: [ :stream |
		(NeoJSONReader on: stream) next ] ] ] timeToRun. "1655"
This shows several things (for this particular test case):
- ZnCharacterReadStream is at least as fast as the built-in decoding streams
  (here about twice as fast: 1655 vs 3502 ms)
- UTF-8 (ZnUTF8Encoder) and Latin1 (ZnNullEncoder) decoding perform about the
  same (because the input is pure ASCII)
- ZnBufferedReadStream does not help when the underlying stream is already
  efficient; here it is even slightly slower (ZnCharacterReadStream itself
  uses a one-character buffer to support #peek, which is important for
  speeding up this use case). This is to be expected: it adds a layer, but the
  logic that fills its next buffer still loops using #next, so every
  (possibly non-ASCII) character is still decoded one call at a time
- the super fast runs (the 2nd and the 4th) are actually cheating because they
  skip decoding altogether; thanks for that observation!
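Why UTF-8 and Latin1 only time the same on ASCII input can be seen with a small workspace snippet. This is a sketch, assuming ZnCharacterReadStream class>>on: and on:encoding: accept an in-memory byte stream (#[...] readStream) the same way they accept the binary file streams above; the decodeAll helper block is hypothetical. The two UTF-8 bytes of é decode to one character with the default UTF-8 encoder, but to two characters with 'latin1'.

```smalltalk
| decodeAll utf8 latin1 |
"Hypothetical helper: read every character from a character stream into a String"
decodeAll := [ :characterStream |
	String streamContents: [ :out |
		[ characterStream atEnd ] whileFalse: [ out nextPut: characterStream next ] ] ].
"#[195 169] is the UTF-8 encoding of U+00E9 (e acute)"
utf8 := decodeAll value: (ZnCharacterReadStream on: #[195 169] readStream).
latin1 := decodeAll value: (ZnCharacterReadStream on: #[195 169] readStream encoding: 'latin1').
utf8 size. "1: one multi-byte character"
latin1 size. "2: each byte becomes its own character"
```

For bytes below 128 both encoders map each byte to the same single character, which is why runs 5 and 6 (and 7 and 8) end up so close.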
Sven
PS: While doing these explorations, I actually reworked ZnUTF8Encoder quite a
bit, both for ASCII, which makes up the majority of characters in most input,
and for the general case; see the latest version.
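The idea behind an ASCII fast path in a UTF-8 decoder can be sketched as follows. This is a hypothetical, simplified decoding block, not the actual ZnUTF8Encoder code: a byte below 128 becomes a Character directly, and only bytes with the high bit set need any bit manipulation. Three- and four-byte sequences and proper error handling for malformed input are deliberately left out.

```smalltalk
| decodeNext in result |
"Hypothetical sketch: decode one character from a stream of UTF-8 bytes"
decodeNext := [ :byteStream |
	| byte |
	byte := byteStream next.
	byte < 128
		ifTrue: [ "ASCII fast path: one byte maps straight to one character"
			Character value: byte ]
		ifFalse: [
			(byte bitAnd: 2r11100000) = 2r11000000
				ifTrue: [ "2-byte sequence: 5 payload bits from the lead byte, 6 from the next"
					Character value: ((byte bitAnd: 2r00011111) bitShift: 6)
						+ (byteStream next bitAnd: 2r00111111) ]
				ifFalse: [ self error: 'not a 1- or 2-byte sequence; not handled in this sketch' ] ] ].
"Usage: 104 is $h (ASCII), #[195 169] is e acute"
in := #[104 195 169] readStream.
result := String streamContents: [ :out |
	[ in atEnd ] whileFalse: [ out nextPut: (decodeNext value: in) ] ].
result size. "2"
```

Since most characters in typical input are ASCII, the cheap first branch is taken almost every time.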
--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill