On 5/12/09 9:20 AM, "Bryan Duxbury" <[email protected]> wrote:
>Are you saying that you think there is significant overhead to
>TSerializer? Or rather, how *is* the test going from objects to byte[]?
>
>It sounds like it might be interesting to strap a profiler to this
>test code and see if it shows us anything we can fix.

Compare ProtobufSerializer with ThriftSerializer.

In ProtobufSerializer:

  public MediaContent deserialize(byte[] array) throws Exception
  {
    return MediaContent.parseFrom(array);
  }

  public byte[] serialize(MediaContent content) throws IOException
  {
    return content.toByteArray();
  }

In ThriftSerializer (shown in its original TBinaryProtocol form rather than the
TCompactProtocol form):

  public MediaContent deserialize(byte[] array) throws Exception
  {
    // A new stream, transport and protocol are created on every call.
    ByteArrayInputStream bais = new ByteArrayInputStream(array);
    TIOStreamTransport trans = new TIOStreamTransport(bais);
    TBinaryProtocol iprot = new TBinaryProtocol(trans);
    MediaContent content = new MediaContent();
    content.read(iprot);
    return content;
  }

  public byte[] serialize(MediaContent content) throws Exception
  {
    // expectedSize is a field of the serializer (declaration not shown);
    // it remembers the last output length to pre-size the buffer.
    ByteArrayOutputStream baos = new ByteArrayOutputStream(expectedSize);
    TIOStreamTransport trans = new TIOStreamTransport(baos);
    TBinaryProtocol oprot = new TBinaryProtocol(trans);
    content.write(oprot);
    byte[] array = baos.toByteArray();  // copies the buffer contents
    expectedSize = array.length;
    return array;
  }

Clearly the Thrift version does more work in both directions: it creates new
transports and protocols, copies byte[] objects, and so on. This is basically
because Protocol Buffers provides a lower-level API than Thrift, one that
operates directly on byte[], which is what the test harness is built around.

I am not sure I consider this a real problem for Thrift. In a real-world RPC or
archiving scenario, you wouldn't need to create new transport or protocol
objects on a per-request basis. However, it does put us at a disadvantage in
naïve benchmarking. We could consider adding a lower-level API so that we can
avoid this setup cost for these kinds of low-level use cases.
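
To illustrate the difference between per-call setup and reuse, here is a
minimal sketch. It is not Thrift code: plain java.io streams stand in for the
transport and protocol objects, and the class name, method names and the three
fields written are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch: java.io streams stand in for a transport/protocol pair.
public class ReusableSerializer {
    // Created once and reused across calls (not thread-safe).
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);
    private final DataOutputStream out = new DataOutputStream(buffer);

    // Per-call setup, as in the test above: new stream objects on every call.
    public static byte[] serializeFresh(String uri, int width, int height)
            throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        write(dos, uri, width, height);
        dos.flush();
        return baos.toByteArray();
    }

    // Reuse: reset the buffer instead of reallocating the stream objects.
    public byte[] serialize(String uri, int width, int height)
            throws IOException {
        buffer.reset();
        write(out, uri, width, height);
        out.flush();
        return buffer.toByteArray(); // still one copy to hand back a byte[]
    }

    // Writes a made-up record layout: a UTF string followed by two ints.
    private static void write(DataOutputStream dos, String uri,
                              int width, int height) throws IOException {
        dos.writeUTF(uri);
        dos.writeInt(width);
        dos.writeInt(height);
    }

    public static void main(String[] args) throws IOException {
        ReusableSerializer s = new ReusableSerializer();
        byte[] reused = s.serialize("http://example.com/a", 640, 480);
        byte[] fresh = serializeFresh("http://example.com/a", 640, 480);
        // Both paths should produce identical bytes.
        System.out.println(Arrays.equals(reused, fresh));
    }
}
```

Both paths produce the same bytes; the reusing variant only saves the
allocation of the stream objects, and still copies once to return a byte[].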

I am not sure this is the whole story on deserialization performance, though:
some preliminary tests I have done with the overhead above removed still show
a similar gap in deserialization performance relative to Protocol Buffers. I am
going to keep digging if I get time, but I am pretty busy this week. If anyone
else has time to look at this, particularly someone who is able to contribute
code changes back to Thrift and/or thrift-protobuf-compare, it would be much
appreciated.
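
For anyone picking this up, a rough way to compare two deserialization paths
before reaching for a profiler is a small timing loop with a warm-up phase.
The sketch below is generic Java; the MicroTimer class and averageNanos method
are hypothetical names and are not part of thrift-protobuf-compare.

```java
import java.util.function.Supplier;

// Hypothetical helper: times a task after warming up the JIT.
public class MicroTimer {
    // Returns the average nanoseconds per call over `iterations` calls,
    // measured after `warmup` untimed calls.
    public static double averageNanos(Supplier<?> task, int warmup,
                                      int iterations) {
        Object sink = null;
        for (int i = 0; i < warmup; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        // Touch the last result so the calls are less likely to be optimized away.
        if (sink == null) {
            System.out.print("");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the deserialize call under test.
        double nanos = averageNanos(() -> new byte[64], 10_000, 100_000);
        System.out.println("avg ns/call: " + nanos);
    }
}
```

Numbers from a loop like this are only indicative; they help decide where to
point a profiler, not replace one.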

Chad
