On Tue, 2015-04-07 at 09:38 -0400, Alan Conway wrote:
> On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
> > Given the memory overhead of a pn_data_t before encoding, why not have it
> > own an encode buffer? it could get by with exactly that grow_buffer()
> > callback if ownership is the issue .
> 

I think the best way to do this would be to introduce a new class to sit
on top of the existing pn_data_t which does this, rather than extending
the current pn_data_t.

So I think the below is fine, but I'd prefer to avoid stuffing it all
into pn_data_t especially as I think the class is, long term, going
away.

Andrew

> I like this idea a lot. Ownership is an issue for me in Go so I would
> want an alloc/free callback but we could have a non-callback version as
> well (simple wrapper for the callback version with default callbacks)
> for users that don't care about ownership - what do others think?
> Something like:
> 
> // if allocfn or freefn is NULL or allocfn returns NULL return EOS on out of 
> data.
> ssize_t pn_data_encode_alloc(pn_data_t *data, char *bytes, size_t size,
>   void*(*allocfn)(size_t), void (*freefn)(void*));
> 
> // Compatibility
> ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size) {
>   return pn_data_encode_alloc(data, bytes, size, NULL, NULL);
> }
> 
> // Convenience when caller would like encode to do buffer management for them.
> // If *bytes is not null it is the initial buffer, if NULL encode allocates.
> // *bytes is updated to point to the final buffer which must be freed with 
> free()
> int *pn_data_encode_grow(pn_data_t *data, char **bytes, size_t size) {
>   return pn_data_encode_alloc(data, bytes, size, pni_realloc, free)
> }
> 
> Note:
> - alloc called 0 or 1 times, and is passed the exact size needed to encode.
> - user is free to allocate more (e.g. doubling strategies)
> - free called after alloc, if and only if alloc is called.
> - separate free instead of single realloc call: encode impl decides
> if/how much data to copy from old to new buffer.
> 
> This can probably be made a bit simpler.
> 
> > Bozzo
> > On Mar 31, 2015 6:10 PM, "Rafael Schloming" <r...@alum.mit.edu> wrote:
> > 
> > > Hi Alan,
> > >
> > > Sorry I didn't comment on this sooner, I didn't have time to comment on
> > > your original review request during my travels, however I do have some
> > > thoughts on the changes you made to the codec interface. I noticed you
> > > added a separate accessor for the size:
> > >
> > >     ssize_t pn_data_encoded_size(pn_data_t *data);
> > >
> > > This is alongside the original encode method:
> > >
> > >     ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);
> > >
> > > I think this API choice while nice in that it is backwards compatible is
> > > also going to result in code that is roughly twice as slow as it needs to
> > > be in the most common case. Based on my experience implementing and
> > > profiling codec in C, Python, and Java, computing the size of the encoded
> > > data seems to usually be roughly the same amount of work as actually
> > > encoding it regardless of the implementation language. Therefore code like
> > > this:
> > >
> > >     if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
> > >     pn_data_encode(data, buffer, buffer_size());
> > >
> > > Can end up being roughly twice as slow as code like this:
> > >
> > >     ssize_t err;
> > >     while ((err = pn_data_encode(data, buffer, buffer_size())) ==
> > > PN_OVERFLOW) {
> > >         grow_buffer();
> > >     }
> > >
> > > Admittedly the latter form is much more awkward in those cases where you
> > > don't care about performance, so I'm all for providing something nicer, 
> > > but
> > > I think a better API change would be to steal a page from the C stdio.h
> > > APIs and have pn_data_encode always return the number of bytes that would
> > > have been written had there been enough space. This allows you to write 
> > > the
> > > simplified encode as above:
> > >
> > >     if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
> > >     pn_data_encode(data, buffer, buffer_size());
> > >
> > > Or use a more optimal form:
> > >
> > >    ssize_t n = pn_data_encode(data, buffer, buffer_size());
> > >    if (n > buffer_size()) {
> > >        grow_buffer();
> > >        pn_data_encode(data, buffer, buffer_size());
> > >    }
> > >
> > > This makes the slow/convenient form possible, and provides some options
> > > that are a bit less awkward than the loop, but it also makes it very clear
> > > that when you use the slow/convenient form you are incurring roughly twice
> > > the cost of the alternative.
> > >
> > > Normally I wouldn't be overly fussed by something like this, and I realize
> > > what I'm suggesting is a breaking change relative to what you provided, 
> > > but
> > > based on what profiling we've done in the past, codec is probably the most
> > > significant source of overhead that we add to an application, and exactly
> > > this sort of double encode effect is almost always one of the first things
> > > you hit when you try to optimize. Given this, I think it would be a good
> > > thing if the API accurately reflects the relative cost of the different
> > > styles of use.
> > >
> > > Thoughts?
> > >
> > > --Rafael
> > >
> 
> 


Reply via email to