Hi Alan,

Sorry I didn't comment on this sooner; I didn't have time to respond to
your original review request while I was traveling. I do, however, have
some thoughts on the changes you made to the codec interface. I noticed
you added a separate accessor for the size:

    ssize_t pn_data_encoded_size(pn_data_t *data);

This is alongside the original encode method:

    ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);

I think this API choice, while nice in that it is backwards compatible, is
going to result in code that is roughly twice as slow as it needs to be in
the most common case. In my experience implementing and profiling codec in
C, Python, and Java, computing the size of the encoded data is usually
about the same amount of work as actually encoding it, regardless of the
implementation language. Therefore code like this:

    if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
    pn_data_encode(data, buffer, buffer_size());

can end up being roughly twice as slow as code like this:

    ssize_t err;
    while ((err = pn_data_encode(data, buffer, buffer_size())) == PN_OVERFLOW) {
        grow_buffer();
    }
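
To make the sizing claim concrete, here is a deliberately simplified
sketch of why it tends to hold. The node type below is hypothetical
(nothing like Proton's actual internals); the point is just that a size
pass has to perform the same traversal as the encode pass, skipping only
the copy:

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical tree of encodable nodes -- purely illustrative,
     * not Proton's actual data structures. */
    typedef struct node {
        const char *bytes;      /* this node's payload */
        size_t len;
        struct node *child;
        struct node *sibling;
    } node_t;

    /* Encode: walk the tree, copying each payload into the buffer. */
    static size_t encode(const node_t *n, char *out) {
        size_t total = 0;
        for (; n; n = n->sibling) {
            memcpy(out + total, n->bytes, n->len);
            total += n->len;
            if (n->child) total += encode(n->child, out + total);
        }
        return total;
    }

    /* Size: the exact same traversal, minus the memcpy. */
    static size_t encoded_size(const node_t *n) {
        size_t total = 0;
        for (; n; n = n->sibling) {
            total += n->len;
            if (n->child) total += encoded_size(n->child);
        }
        return total;
    }

The traversal dominates, so calling encoded_size() and then encode() does
nearly all of the work twice.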

Admittedly, the loop form is much more awkward in cases where you don't
care about performance, so I'm all for providing something nicer. But I
think a better API change would be to steal a page from the C stdio.h
APIs (snprintf in particular) and have pn_data_encode always return the
number of bytes that would have been written had there been enough space.
This lets you write the simplified encode much as above:

    if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
    pn_data_encode(data, buffer, buffer_size());

Or use a more optimal form:

    /* (a real caller would also check n < 0 for errors) */
    ssize_t n = pn_data_encode(data, buffer, buffer_size());
    if (n > (ssize_t) buffer_size()) {
        grow_buffer();
        pn_data_encode(data, buffer, buffer_size());
    }

This keeps the slow/convenient form possible and offers options that are
less awkward than the loop, while making it very clear that the
slow/convenient form incurs roughly twice the cost of the alternative.
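
For reference, snprintf already follows exactly this convention, so the
sizing idiom should feel familiar to C programmers. A minimal
illustration:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* snprintf returns the number of characters that would have
         * been written (excluding the terminating NUL), even when the
         * buffer is too small -- the convention suggested above. */
        int n = snprintf(NULL, 0, "%d bottles of beer", 99);
        char *buf = malloc(n + 1);
        if (buf) {
            snprintf(buf, n + 1, "%d bottles of beer", 99);
            printf("%s\n", buf);
            free(buf);
        }
        return 0;
    }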

Normally I wouldn't be overly fussed by something like this, and I realize
what I'm suggesting is a breaking change relative to what you provided.
But based on the profiling we've done in the past, codec is probably the
most significant source of overhead we add to an application, and exactly
this sort of double-encode effect is almost always one of the first things
you hit when you try to optimize. Given that, I think it would be a good
thing for the API to accurately reflect the relative cost of the different
styles of use.

Thoughts?

--Rafael
