This seems like a pretty thorough reply overall. I'll address some
issues inline.
1. The performance figures that thrift-protobuf-compare provides for dynamic serialization systems like JSON are not currently valid, since the tests do not really exercise them as fully general serialization/deserialization frameworks.
You mean that since they coded their JSON serialization directly against their test data, the performance numbers are inaccurate? That might be true in the sense that their serialization skips some things that Thrift does, but since we generate code, I don't think it's *that* different.
2. Using TCompactProtocol, Thrift serialization speed and serialized size are basically equivalent to those of protocol buffers.
Cool! I bet we could find other datasets that exercise both protocols more thoroughly and turn up slight differences one way or the other.
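For anyone wondering why switching protocols closes the size gap: TBinaryProtocol writes every i32 as a fixed 4 bytes, while TCompactProtocol (like protobuf's sint32 fields) writes a ZigZag-encoded varint, so small values take only 1 or 2 bytes. This stdlib-only sketch illustrates just that integer encoding; it is not Thrift's or protobuf's actual code, the class name VarintSizeDemo is made up, and field headers (which both compact formats also shrink) are ignored.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical illustration (not Thrift or protobuf source): compares a
 * fixed-width 4-byte i32 with a ZigZag varint of the same value.
 */
public class VarintSizeDemo {

    // ZigZag maps signed ints to unsigned so small negatives stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Unsigned varint: 7 payload bits per byte, high bit set means "more bytes follow".
    static byte[] writeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Fixed-width big-endian i32: always 4 bytes regardless of magnitude.
    static byte[] writeFixed32(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8), (byte) value
        };
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 300, 1_000_000, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.printf("%11d  fixed=%d bytes  zigzag varint=%d bytes%n",
                    v, writeFixed32(v).length, writeVarint(zigzag(v)).length);
        }
    }
}
```

Running it shows 0, 1 and -1 fitting in one byte, while Integer.MIN_VALUE needs 5 bytes, one more than the fixed encoding; typical benchmark data is dominated by small values, which is where the compact encodings win.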
I updated to the trunk of Thrift (rev 773454) and changed the Thrift serializer to use TCompactProtocol instead of TBinaryProtocol. I also corrected ThriftSerializer's create() so that it sends the same data for image2 as ProtobufSerializer does. Finally, I updated the formatting in BenchmarkRunner and commented out all the serializers except Thrift and protocol buffers. Here are the benchmark results from 3 consecutive runs:
Run 1      Create       Ser     Deser      Total  Size
thrift     267.37   8314.00   8546.00   17127.36   220
protobuf   412.98  12642.00   5217.50   18272.48   217

Run 2      Create       Ser     Deser      Total  Size
thrift     266.87  10905.50   8526.50   19698.86   220
protobuf   415.21  11880.50   4930.00   17225.71   217

Run 3      Create       Ser     Deser      Total  Size
thrift     264.95  11059.50   8701.50   20025.95   220
protobuf   417.45  11125.00   5203.50   16745.95   217
It's interesting that we're that much faster at object creation (roughly 265 vs. 415), but creation is well under 2% of the total time in this test, so it's probably not that significant. We should definitely address our deserialization time, though: it is consistently about 1.6-1.7x protobuf's.
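To double-check the creation share and the deserialization gap, here is the arithmetic over the three runs as a small Java program. The figures are copied from the table; averaging across runs is just one way to summarize them, the class name BenchmarkRatios is made up, and the time unit is whatever BenchmarkRunner reports.

```java
/**
 * Averages the three benchmark runs and derives Thrift's object-creation
 * share of total time plus Thrift/protobuf ratios for ser and deser.
 * Values are copied verbatim from the benchmark table.
 */
public class BenchmarkRatios {
    // Column indices within each run: create, ser, deser, total.
    static final int CREATE = 0, SER = 1, DESER = 2, TOTAL = 3;

    static final double[][] THRIFT = {
        {267.37, 8314.00, 8546.00, 17127.36},
        {266.87, 10905.50, 8526.50, 19698.86},
        {264.95, 11059.50, 8701.50, 20025.95},
    };
    static final double[][] PROTOBUF = {
        {412.98, 12642.00, 5217.50, 18272.48},
        {415.21, 11880.50, 4930.00, 17225.71},
        {417.45, 11125.00, 5203.50, 16745.95},
    };

    // Arithmetic mean of one column across all runs.
    static double mean(double[][] runs, int column) {
        double sum = 0;
        for (double[] run : runs) {
            sum += run[column];
        }
        return sum / runs.length;
    }

    public static void main(String[] args) {
        double createShare = mean(THRIFT, CREATE) / mean(THRIFT, TOTAL);
        double serRatio = mean(THRIFT, SER) / mean(PROTOBUF, SER);
        double deserRatio = mean(THRIFT, DESER) / mean(PROTOBUF, DESER);
        System.out.printf("thrift create share of total: %.1f%%%n", 100 * createShare);
        System.out.printf("thrift/protobuf ser ratio:    %.2fx%n", serRatio);
        System.out.printf("thrift/protobuf deser ratio:  %.2fx%n", deserRatio);
    }
}
```

With these numbers, creation comes out at about 1.4% of Thrift's total, serialization at about 0.85x protobuf's, and deserialization at about 1.7x protobuf's.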
Looking a little further at deserialization, since Thrift seemed to be performing worse than protocol buffers there, the problem may be related to the fact that protocol buffers provides APIs for serializing directly to, and deserializing directly from, byte arrays, which Thrift does not provide. The test harness is set up so that the output of serialize() and the input of deserialize() are byte arrays, which means Thrift has to do extra work to match the harness. I am still investigating this.
Are you saying that you think there is significant overhead in TSerializer? Or rather, how *is* the test going from objects to byte[]? It sounds like it might be interesting to strap a profiler to this test code and see if it shows us anything we can fix.
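To make the byte-array point concrete, here is a stdlib-only Java sketch of two ways to decode the same bytes: pulling them through an InputStream wrapped around the byte[] (a method call per byte), or indexing straight into the array. It assumes, without having checked, that the Thrift path in the harness behaves roughly like the stream version; the class and method names are hypothetical, and a plain varint format stands in for whatever the real protocols read. It also notes that ByteArrayOutputStream.toByteArray() returns a copy, one extra step a stream-based serialize() pays to produce a byte[].

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical illustration: decoding varints through an InputStream versus
 * directly from a byte[]. Assumption: this only mimics the shape of the two
 * code paths under discussion; it is not Thrift or protobuf code.
 */
public class ArrayVsStreamDecode {

    // Unsigned varint encoder: 7 payload bits per byte, high bit = continuation.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Stream path: every byte goes through InputStream.read().
    static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) throw new IOException("truncated varint");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    // Array path: the decoder indexes the buffer directly and tracks its own position.
    static final class ArrayReader {
        private final byte[] buf;
        private int pos;

        ArrayReader(byte[] buf) {
            this.buf = buf;
        }

        int readVarint() {
            int result = 0;
            for (int shift = 0; shift < 32; shift += 7) {
                if (pos >= buf.length) throw new IllegalStateException("truncated varint");
                int b = buf[pos++] & 0xFF;
                result |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) return result;
            }
            throw new IllegalStateException("malformed varint");
        }
    }

    public static void main(String[] args) throws IOException {
        int[] values = {0, 1, 127, 128, 300, 1_000_000};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVarint(out, v);
        }
        // toByteArray() allocates and fills a fresh copy of the internal buffer,
        // the extra step a stream-based serializer pays to return a byte[].
        byte[] encoded = out.toByteArray();

        InputStream stream = new ByteArrayInputStream(encoded);
        ArrayReader direct = new ArrayReader(encoded);
        for (int v : values) {
            System.out.printf("%d -> stream=%d, array=%d%n",
                    v, readVarint(stream), direct.readVarint());
        }
    }
}
```

Both paths decode the same values; whether the per-byte stream calls actually explain the deserialization gap is exactly what a profiler run on the real harness would have to confirm.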