Re: codec changes

2015-05-12 Thread Rafael Schloming
On Mon, May 11, 2015 at 3:49 PM, Alan Conway acon...@redhat.com wrote:

 On Thu, 2015-05-07 at 15:53 -0400, Rafael Schloming wrote:
  I believe where we ended up was standardizing on a single snprintf-style
  signature for all encode functions, i.e.:
 
  // always returns encoded size, never overwrites limit, pass in NULL, 0
  // to compute size in advance
  // returns < 0 if there was an actual error of some sort, e.g. xxx_t
  // cannot be validly encoded for some reason
  ssize_t pn_xxx_encode(xxx_t, char *buf, size_t limit);
 
  And transitioning to this with a feature macro.

 I'm good with sprintf-style.

 I'm not sure what you mean by a feature macro. Does that address
 binary compatibility or is it just a source-level switch? If we break
 binary compatibility we need to make sure we bump the library version
 appropriately to avoid breaking existing apps. If we set the switch to
 default to the new behavior we should make sure the compiler barfs on
 old code and it's fairly obvious what to do about it.


I've never actually defined feature test macros before, just used them, so
I'm just making this up as we go along. What I imagine could work though is
to define pn_xxx_encode2 in the .c file to provide the new behavior. By
default it would not be visible from the header files unless the
appropriate feature macro was defined upon inclusion, in which case we
would alias pn_xxx_encode to pn_xxx_encode2, e.g.:

--
#include <codec.h>
#include <message.h>
...
int err = pn_message_encode(msg, buf, in_out_size); // we could make this spew a deprecated warning by default
--

--
#define PN_STANDARD_ENCODE_SIGNATURE
#include <codec.h>
#include <message.h>
...
ssize_t encoded_size = pn_message_encode(msg, buf, bufsize);
--

This would allow people to incrementally upgrade and would not break binary
compatibility. We could of course at some later point break binary
compatibility if we want to and clean up the encode2 symbols, especially if
we have other reasons to change ABI.

One thing I'm not sure of is the granularity of the feature test, e.g.
should the macro be specific to the change (PN_STANDARD_ENCODE_SIGNATURE)
or should it be related to the version somehow, e.g. (PN_0_10_SOURCE) or
something like that. I guess both could be an option too, e.g.
PN_0_10_SOURCE could be an alias for all the new features in 0.10,
presumably just this and possibly the sasl stuff if there is a way to make
that work.

--Rafael


Re: codec changes

2015-05-11 Thread Alan Conway
On Thu, 2015-05-07 at 15:53 -0400, Rafael Schloming wrote:
 I believe where we ended up was standardizing on a single snprintf-style
 signature for all encode functions, i.e.:
 
 // always returns encoded size, never overwrites limit, pass in NULL, 0
 // to compute size in advance
 // returns < 0 if there was an actual error of some sort, e.g. xxx_t
 // cannot be validly encoded for some reason
 ssize_t pn_xxx_encode(xxx_t, char *buf, size_t limit);
 
 And transitioning to this with a feature macro.

I'm good with sprintf-style.

I'm not sure what you mean by a feature macro. Does that address
binary compatibility or is it just a source-level switch? If we break
binary compatibility we need to make sure we bump the library version
appropriately to avoid breaking existing apps. If we set the switch to
default to the new behavior we should make sure the compiler barfs on
old code and it's fairly obvious what to do about it.

Cheers,
Alan.

 
 --Rafael
 
 On Thu, May 7, 2015 at 3:28 PM, Andrew Stitcher astitc...@redhat.com
 wrote:
 
  On Wed, 2015-05-06 at 14:03 -0400, Rafael Schloming wrote:
   We seem to have reached consensus here, but I haven't seen any commits on
   this. We should probably fix this before 0.10 so we don't end up putting
   out a new API and then deprecating it in the next release. Is anyone
   actually working on this?
 
  Could you articulate the consensus then.
 
  Asserting we have reached consensus without explicitly stating what
  you think the consensus to be is remarkably likely to be proven wrong by
  subsequent events!
 
  Andrew
 
 
 




Re: codec changes

2015-05-07 Thread Andrew Stitcher
On Wed, 2015-05-06 at 14:03 -0400, Rafael Schloming wrote:
 We seem to have reached consensus here, but I haven't seen any commits on
 this. We should probably fix this before 0.10 so we don't end up putting
 out a new API and then deprecating it in the next release. Is anyone
 actually working on this?

Could you articulate the consensus then.

Asserting we have reached consensus without explicitly stating what
you think the consensus to be is remarkably likely to be proven wrong by
subsequent events!

Andrew




Re: codec changes

2015-05-07 Thread Rafael Schloming
I believe where we ended up was standardizing on a single snprintf-style
signature for all encode functions, i.e.:

// always returns encoded size, never overwrites limit, pass in NULL, 0
// to compute size in advance
// returns < 0 if there was an actual error of some sort, e.g. xxx_t
// cannot be validly encoded for some reason
ssize_t pn_xxx_encode(xxx_t, char *buf, size_t limit);

And transitioning to this with a feature macro.

--Rafael

On Thu, May 7, 2015 at 3:28 PM, Andrew Stitcher astitc...@redhat.com
wrote:

 On Wed, 2015-05-06 at 14:03 -0400, Rafael Schloming wrote:
  We seem to have reached consensus here, but I haven't seen any commits on
  this. We should probably fix this before 0.10 so we don't end up putting
  out a new API and then deprecating it in the next release. Is anyone
  actually working on this?

 Could you articulate the consensus then.

 Asserting we have reached consensus without explicitly stating what
 you think the consensus to be is remarkably likely to be proven wrong by
 subsequent events!

 Andrew





Re: codec changes

2015-04-15 Thread Dominic Evans
-Rafael Schloming r...@alum.mit.edu wrote: -
 On Tue, Apr 14, 2015 at 1:27 PM, Alan Conway acon...@redhat.com
 wrote:
 That works for me, now how do we manage the transition? I don't think
 we can afford to inflict "yum update proton; all proton apps crash"
 on our users. That means we cannot change the behavior of existing
 function names. I don't much like adding encode_foo2 for every
 encode_foo but I don't see what else to do.

 
 We could mark the old ones as deprecated and add the new ones as
 pn_xxx_encode2 and provide a feature macro that #defines pn_xxx_encode
 to pn_xxx_encode2. Then after a sufficient transition period we can
 get rid of the old versions.
 
 --Rafael


+1 



Re: codec changes

2015-04-15 Thread Andrew Stitcher
On Wed, 2015-04-15 at 07:13 -0400, Rafael Schloming wrote:
 On Tue, Apr 14, 2015 at 1:27 PM, Alan Conway acon...@redhat.com wrote:
 
  That works for me, now how do we manage the transition? I don't think we
  can afford to inflict "yum update proton; all proton apps crash" on our
  users. That means we cannot change the behavior of existing function
  names. I don't much like adding encode_foo2 for every encode_foo but I
  don't see what else to do.
 
 
 We could mark the old ones as deprecated and add the new ones as
 pn_xxx_encode2 and provide a feature macro that #defines pn_xxx_encode to
 pn_xxx_encode2. Then after a sufficient transition period we can get rid of
 the old versions.

Along these lines, it is also possible to keep the ABI (binary library
compatibility) by using versioned library symbols. There is a level of
linker magic needed, but at least on Unix it's well understood, if a
little arcane.
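For reference, the "linker magic" mentioned above is typically a GNU ld version script combined with .symver directives; a rough sketch follows, where the version nodes and the _v09/_v10 symbol names are invented for illustration:

```
/* proton.map -- linker version script (-Wl,--version-script=proton.map) */
PROTON_0_9  { global: pn_data_encode; local: *; };
PROTON_0_10 { global: pn_data_encode; } PROTON_0_9;

/* in codec.c: bind the old and new implementations to their versions */
__asm__(".symver pn_data_encode_v09,pn_data_encode@PROTON_0_9");   /* old ABI */
__asm__(".symver pn_data_encode_v10,pn_data_encode@@PROTON_0_10"); /* default */
```

Existing binaries keep resolving against PROTON_0_9 while newly linked code picks up the default PROTON_0_10 symbol.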

Another perfectly reasonable approach would be to create a new name for
the new API and deprecate the old name.

So, for example, deprecate pn_data_t in favour of pn_value_t (or whatever
better but new name), where pn_value_t has all the old functionality of
pn_data_t, and indeed may be the same internal structure initially, but
with a different interface.

Incidentally C++ does make this easier because it allows function
overloading.

Andrew




Re: codec changes

2015-04-15 Thread Rafael Schloming
On Tue, Apr 14, 2015 at 1:27 PM, Alan Conway acon...@redhat.com wrote:

 That works for me, now how do we manage the transition? I don't think we
  can afford to inflict "yum update proton; all proton apps crash" on our
 users. That means we cannot change the behavior of existing function
 names. I don't much like adding encode_foo2 for every encode_foo but I
 don't see what else to do.


We could mark the old ones as deprecated and add the new ones as
pn_xxx_encode2 and provide a feature macro that #defines pn_xxx_encode to
pn_xxx_encode2. Then after a sufficient transition period we can get rid of
the old versions.

--Rafael


Re: codec changes

2015-04-14 Thread Alan Conway
On Sun, 2015-04-12 at 13:12 -0400, Rafael Schloming wrote:
 On Fri, Apr 10, 2015 at 11:37 AM, Alan Conway acon...@redhat.com wrote:
 
 
  On Wed, 2015-04-08 at 14:50 -0400, Andrew Stitcher wrote:
   On Wed, 2015-04-08 at 10:48 -0400, Alan Conway wrote:
...
The issue is that we need buffer growth AND control over allocation.
pn_buffer_t forces use of malloc/free/realloc. That won't help if
  you're
trying to get the data into a buffer allocated by Go for example. I
agree a growable buffer type is a nicer API than raw function pointers,
but the buffer type itself would need to use function pointers so we
  can
replace the functions used for alloc/free/realloc.
  
   I think this is a pretty general issue actually that also crops up in
   embedded systems, where you want some different control over memory
   allocation.
  
   I think it might be better addressed by making malloc/free/realloc
   replaceable for the whole of proton - would that solve your go issue?
 
  Sadly no. A proton realloc surrogate is just going to be passed void*
  and return void*. There's no memory safe way to implement that in Go,
  unless someone keeps a Go pointer to the returned buffer, it gets garbage
  collected. There's no way to communicate a Go pointer through a C proton
  interface.
 
  Given all that, I think I prefer the sprintf-style solution for growing
  buffers. It keeps the memory management strictly on the caller side of
  the pn_ interface which is simpler.
 
  If there is a big pn_data overhaul in the works then maybe we should not
  be talking about pn_data_encode and instead make sure we fix it in the
  overhaul.
 
  I notice we have these other APIs:
 
  codec.h:499:PN_EXTERN int pn_data_format(pn_data_t *data, char *bytes,
  size_t *size);
  message.h:739:PN_EXTERN int pn_message_encode(pn_message_t *msg, char
  *bytes, size_t *size);
  ssl.h:320:PN_EXTERN int pn_ssl_get_peer_hostname( pn_ssl_t *ssl, char
  *hostname, size_t *bufsize );
 
  That is a better signature IMO, the new (backward compatible) syntax for
  such functions would be:
 
  return 0: *size is updated to be the actual size.
  return PN_OVERFLOW: *size is updated to be the required size.
  return < 0: *size is undefined.
  don't return > 0
 
  Perhaps for the overhaul we should ensure all such functions (including
  the replacement for pn_data_encode) follow this pattern?
 
 
 I'm a big +1 on standardizing on a single snprintf style signature for all
 encode functions.
 
 The signatures you mention above (similar to sprintf but using an IN/OUT
 parameter) were my first attempt at a standard encode/decode signature. I
 agree with you that at first it looks a bit more appealing for some reason,
 however after using it for a while I actually found it quite cumbersome
 relative to the traditional snprintf style. In the common case it takes
 almost twice as much code to use it since you always have to declare,
 initialize, and check an extra variable in separate steps due to the IN/OUT
 parameter. It also doesn't map that nicely when you swig it since most
 languages don't do the IN/OUT parameter thing well, so you have to have
 more cumbersome/custom typemaps and/or wrapper code to deal with the
 impedance mismatch.
 
 For those reasons I propose that we standardize on the traditional snprintf
 rather than using the IN/OUT variant.

That works for me, now how do we manage the transition? I don't think we
can afford to inflict "yum update proton; all proton apps crash" on our
users. That means we cannot change the behavior of existing function
names. I don't much like adding encode_foo2 for every encode_foo but I
don't see what else to do.




Re: codec changes

2015-04-08 Thread Alan Conway
On Tue, 2015-04-07 at 17:57 -0400, Rafael Schloming wrote:
 Maybe I'm not following something, but I don't see how passing around
 allocation functions actually solves any ownership problems. I would think
 the ownership problems come from pn_data_t holding onto a pointer
 regardless of whether that pointer was gotten from malloc or from a
 callback. I'm assuming you want to be able to do something like use a
 pn_data_t that is owned by one thread to encode into a buffer and pass that
 buffer over to another thread. How does passing in a bunch of allocators
 help with this? If the pn_data_t holds a pointer to whatever those
 allocators return, then aren't you going to have ownership issues no matter
 what?
 
 To answer Bozzo's original question, I think that it's good to keep the
 encoder/decoder decoupled from the buffer for a number of reasons. In
 addition to the ownership issues that Alan points out, the encoded data may
 have a different lifecycle from the pn_data_t that created it for non
 thread related reasons, or you may simply want to encode directly into a
 frame buffer and avoid an extra copy.
 
 If the goal here is simply to provide a convenient way to avoid having to
 repeat the resizing loop then I would suggest simply providing a
 convenience API that accepts a growable buffer of some sort. This provides
 both convenience and avoids ownership issues with holding pointers. I'm
 thinking something along the lines of:
 
 pn_data_t data = ...;
 pn_string/buffer_t buf = ...;
 pn_data_encode_conveniently(data, buf); // obviously needs a better name
 
 It is perhaps not *quite* as convenient in the minimal case as pn_data_t
 holding the buffer internally, but it is an improvement in general and
 probably simpler than having to mess with function pointers in the case
 where the buffer's lifecycle is independent from the pn_data_t. (Not that I
 really understand how that would work anyways.)

The issue is that we need buffer growth AND control over allocation.
pn_buffer_t forces use of malloc/free/realloc. That won't help if you're
trying to get the data into a buffer allocated by Go for example. I
agree a growable buffer type is a nicer API than raw function pointers,
but the buffer type itself would need to use function pointers so we can
replace the functions used for alloc/free/realloc.




Re: codec changes

2015-04-08 Thread Andrew Stitcher
On Wed, 2015-04-08 at 10:48 -0400, Alan Conway wrote:
 ...
 The issue is that we need buffer growth AND control over allocation.
 pn_buffer_t forces use of malloc/free/realloc. That won't help if you're
 trying to get the data into a buffer allocated by Go for example. I
 agree a growable buffer type is a nicer API than raw function pointers,
 but the buffer type itself would need to use function pointers so we can
 replace the functions used for alloc/free/realloc.

I think this is a pretty general issue actually that also crops up in
embedded systems, where you want some different control over memory
allocation.

I think it might be better addressed by making malloc/free/realloc
replaceable for the whole of proton - would that solve your go issue?

Andrew




Re: codec changes

2015-04-08 Thread Rafael Schloming
On Wed, Apr 8, 2015 at 10:48 AM, Alan Conway acon...@redhat.com wrote:

 On Tue, 2015-04-07 at 17:57 -0400, Rafael Schloming wrote:
  Maybe I'm not following something, but I don't see how passing around
  allocation functions actually solves any ownership problems. I would
 think
  the ownership problems come from pn_data_t holding onto a pointer
  regardless of whether that pointer was gotten from malloc or from a
  callback. I'm assuming you want to be able to do something like use a
  pn_data_t that is owned by one thread to encode into a buffer and pass
 that
  buffer over to another thread. How does passing in a bunch of allocators
  help with this? If the pn_data_t holds a pointer to whatever those
  allocators return, then aren't you going to have ownership issues no
 matter
  what?
 
  To answer Bozzo's original question, I think that it's good to keep the
  encoder/decoder decoupled from the buffer for a number of reasons. In
  addition to the ownership issues that Alan points out, the encoded data
 may
  have a different lifecycle from the pn_data_t that created it for non
  thread related reasons, or you may simply want to encode directly into a
  frame buffer and avoid an extra copy.
 
  If the goal here is simply to provide a convenient way to avoid having to
  repeat the resizing loop then I would suggest simply providing a
  convenience API that accepts a growable buffer of some sort. This
 provides
  both convenience and avoids ownership issues with holding pointers. I'm
  thinking something along the lines of:
 
  pn_data_t data = ...;
  pn_string/buffer_t buf = ...;
  pn_data_encode_conveniently(data, buf); // obviously needs a better
 name
 
  It is perhaps not *quite* as convenient in the minimal case as pn_data_t
  holding the buffer internally, but it is an improvement in general and
  probably simpler than having to mess with function pointers in the case
  where the buffer's lifecycle is independent from the pn_data_t. (Not
 that I
  really understand how that would work anyways.)

 The issue is that we need buffer growth AND control over allocation.
 pn_buffer_t forces use of malloc/free/realloc. That won't help if you're
 trying to get the data into a buffer allocated by Go for example. I
 agree a growable buffer type is a nicer API than raw function pointers,
 but the buffer type itself would need to use function pointers so we can
 replace the functions used for alloc/free/realloc.


I'm skeptical that passing around a set of function pointers, whether
directly to pn_data_t or to pn_buffer_t, is actually fewer lines of code
than simply writing the analogous convenience encoding function for that
language. For python it would likely be fewer lines of code to simply
define something like this:

ssize_t pn_data_encode2bytearray(pn_data_t *data, PyObject *bytearray) {
  ssize_t size = pn_data_encode(data, PyByteArray_AsString(bytearray),
                                PyByteArray_Size(bytearray));
  if (size < 0) { return size; }
  if (size > PyByteArray_Size(bytearray)) {
    PyByteArray_Resize(bytearray, size);
    return pn_data_encode(data, PyByteArray_AsString(bytearray),
                          PyByteArray_Size(bytearray));
  }
  return size;  // already fits; return the encoded size
}

This would also be more flexible since you could pass in any bytearray
object from python, reuse it as much as you like, and have full control
over the lifecycle of that object, whereas I suspect given the signature of
the function pointers you are defining you would need to choose some sort
of predefined allocation strategy for creating whatever go object you are
encoding into. This sounds to me like it is both less convenient and less
flexible, but it's entirely possible I'm just not understanding how you
intend to use this. Perhaps it would clarify things if you could post an
actual example usage of how you imagine using this that hopefully
illustrates why it is fewer lines of code than the alternatives?

--Rafael


Re: codec changes

2015-04-07 Thread Alan Conway
On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
 Given the memory overhead of a pn_data_t before encoding, why not have it
 own an encode buffer? it could get by with exactly that grow_buffer()
 callback if ownership is the issue .

I like this idea a lot. Ownership is an issue for me in Go so I would
want an alloc/free callback but we could have a non-callback version as
well (simple wrapper for the callback version with default callbacks)
for users that don't care about ownership - what do others think?
Something like:

// If allocfn or freefn is NULL, or allocfn returns NULL, return EOS on out
// of data.
ssize_t pn_data_encode_alloc(pn_data_t *data, char *bytes, size_t size,
                             void *(*allocfn)(size_t), void (*freefn)(void *));

// Compatibility
ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size) {
  return pn_data_encode_alloc(data, bytes, size, NULL, NULL);
}

// Convenience when the caller would like encode to do buffer management.
// If *bytes is not NULL it is the initial buffer; if NULL, encode allocates.
// *bytes is updated to point to the final buffer, which must be freed with
// free().
ssize_t pn_data_encode_grow(pn_data_t *data, char **bytes, size_t size) {
  return pn_data_encode_alloc(data, *bytes, size, pni_realloc, free);
}

Note:
- alloc called 0 or 1 times, and is passed the exact size needed to encode.
- user is free to allocate more (e.g. doubling strategies)
- free called after alloc, if and only if alloc is called.
- separate free instead of single realloc call: encode impl decides
if/how much data to copy from old to new buffer.

This can probably be made a bit simpler.

 Bozzo
 On Mar 31, 2015 6:10 PM, Rafael Schloming r...@alum.mit.edu wrote:
 
  Hi Alan,
 
  Sorry I didn't comment on this sooner, I didn't have time to comment on
  your original review request during my travels, however I do have some
  thoughts on the changes you made to the codec interface. I noticed you
  added a separate accessor for the size:
 
  ssize_t pn_data_encoded_size(pn_data_t *data);
 
  This is alongside the original encode method:
 
  ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);
 
  I think this API choice while nice in that it is backwards compatible is
  also going to result in code that is roughly twice as slow as it needs to
  be in the most common case. Based on my experience implementing and
  profiling codec in C, Python, and Java, computing the size of the encoded
  data seems to usually be roughly the same amount of work as actually
  encoding it regardless of the implementation language. Therefore code like
  this:
 
   if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
  pn_data_encode(data, buffer, buffer_size());
 
  Can end up being roughly twice as slow as code like this:
 
  ssize_t err;
  while ((err = pn_data_encode(data, buffer, buffer_size())) ==
  PN_OVERFLOW) {
  grow_buffer();
  }
 
  Admittedly the latter form is much more awkward in those cases where you
  don't care about performance, so I'm all for providing something nicer, but
  I think a better API change would be to steal a page from the C stdio.h
  APIs and have pn_data_encode always return the number of bytes that would
  have been written had there been enough space. This allows you to write the
  simplified encode as above:
 
   if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
  pn_data_encode(data, buffer, buffer_size());
 
  Or use a more optimal form:
 
 ssize_t n = pn_data_encode(data, buffer, buffer_size());
  if (n > buffer_size()) {
 grow_buffer();
 pn_data_encode(data, buffer, buffer_size());
 }
 
  This makes the slow/convenient form possible, and provides some options
  that are a bit less awkward than the loop, but it also makes it very clear
  that when you use the slow/convenient form you are incurring roughly twice
  the cost of the alternative.
 
  Normally I wouldn't be overly fussed by something like this, and I realize
  what I'm suggesting is a breaking change relative to what you provided, but
  based on what profiling we've done in the past, codec is probably the most
  significant source of overhead that we add to an application, and exactly
  this sort of double encode effect is almost always one of the first things
  you hit when you try to optimize. Given this, I think it would be a good
  thing if the API accurately reflects the relative cost of the different
  styles of use.
 
  Thoughts?
 
  --Rafael
 




Re: codec changes

2015-04-07 Thread Andrew Stitcher
On Tue, 2015-04-07 at 09:38 -0400, Alan Conway wrote:
 On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
  Given the memory overhead of a pn_data_t before encoding, why not have it
  own an encode buffer? it could get by with exactly that grow_buffer()
  callback if ownership is the issue .
 

I think the best way to do this would be to introduce a new class to sit
on top of the existing pn_data_t which does this, rather than extending
the current pn_data_t.

So I think the below is fine, but I'd prefer to avoid stuffing it all
into pn_data_t, especially as I think the class is, long term, going
away.

Andrew

 I like this idea a lot. Ownership is an issue for me in Go so I would
 want an alloc/free callback but we could have a non-callback version as
 well (simple wrapper for the callback version with default callbacks)
 for users that don't care about ownership - what do others think?
 Something like:
 
 // If allocfn or freefn is NULL, or allocfn returns NULL, return EOS on out
 // of data.
 ssize_t pn_data_encode_alloc(pn_data_t *data, char *bytes, size_t size,
                              void *(*allocfn)(size_t), void (*freefn)(void *));
 
 // Compatibility
 ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size) {
   return pn_data_encode_alloc(data, bytes, size, NULL, NULL);
 }
 
 // Convenience when the caller would like encode to do buffer management.
 // If *bytes is not NULL it is the initial buffer; if NULL, encode allocates.
 // *bytes is updated to point to the final buffer, which must be freed with
 // free().
 ssize_t pn_data_encode_grow(pn_data_t *data, char **bytes, size_t size) {
   return pn_data_encode_alloc(data, *bytes, size, pni_realloc, free);
 }
 
 Note:
 - alloc called 0 or 1 times, and is passed the exact size needed to encode.
 - user is free to allocate more (e.g. doubling strategies)
 - free called after alloc, if and only if alloc is called.
 - separate free instead of single realloc call: encode impl decides
 if/how much data to copy from old to new buffer.
 
 This can probably be made a bit simpler.
 
  Bozzo
  On Mar 31, 2015 6:10 PM, Rafael Schloming r...@alum.mit.edu wrote:
  
   Hi Alan,
  
   Sorry I didn't comment on this sooner, I didn't have time to comment on
   your original review request during my travels, however I do have some
   thoughts on the changes you made to the codec interface. I noticed you
   added a separate accessor for the size:
  
   ssize_t pn_data_encoded_size(pn_data_t *data);
  
   This is alongside the original encode method:
  
   ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);
  
   I think this API choice while nice in that it is backwards compatible is
   also going to result in code that is roughly twice as slow as it needs to
   be in the most common case. Based on my experience implementing and
   profiling codec in C, Python, and Java, computing the size of the encoded
   data seems to usually be roughly the same amount of work as actually
   encoding it regardless of the implementation language. Therefore code like
   this:
  
    if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
   pn_data_encode(data, buffer, buffer_size());
  
   Can end up being roughly twice as slow as code like this:
  
   ssize_t err;
   while ((err = pn_data_encode(data, buffer, buffer_size())) ==
   PN_OVERFLOW) {
   grow_buffer();
   }
  
   Admittedly the latter form is much more awkward in those cases where you
   don't care about performance, so I'm all for providing something nicer, 
   but
   I think a better API change would be to steal a page from the C stdio.h
   APIs and have pn_data_encode always return the number of bytes that would
   have been written had there been enough space. This allows you to write 
   the
   simplified encode as above:
  
    if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
   pn_data_encode(data, buffer, buffer_size());
  
   Or use a more optimal form:
  
  ssize_t n = pn_data_encode(data, buffer, buffer_size());
   if (n > buffer_size()) {
  grow_buffer();
  pn_data_encode(data, buffer, buffer_size());
  }
  
   This makes the slow/convenient form possible, and provides some options
   that are a bit less awkward than the loop, but it also makes it very clear
   that when you use the slow/convenient form you are incurring roughly twice
   the cost of the alternative.
  
   Normally I wouldn't be overly fussed by something like this, and I realize
   what I'm suggesting is a breaking change relative to what you provided, 
   but
   based on what profiling we've done in the past, codec is probably the most
   significant source of overhead that we add to an application, and exactly
   this sort of double encode effect is almost always one of the first things
   you hit when you try to optimize. Given this, I think it would be a good
   thing if the API accurately reflects the relative cost of the different
   styles of use.

Re: codec changes

2015-04-07 Thread Bozo Dragojevic
On 7. 04. 15 15.38, Alan Conway wrote:
 On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
 Given the memory overhead of a pn_data_t before encoding, why not have it
 own an encode buffer? it could get by with exactly that grow_buffer()
 callback if ownership is the issue .
 I like this idea a lot. Ownership is an issue for me in Go so I would
 want an alloc/free callback but we could have a non-callback version as
 well (simple wrapper for the callback version with default callbacks)
 for users that don't care about ownership - what do others think?


[snip]

 Note:
 - alloc called 0 or 1 times, and is passed the exact size needed to encode.

I would hate to have the encoder know a priori the exact size of the
allocation, especially if we want, at some point, to be able to generate
{LIST|MAP|ARRAY}8 encodings.
I think we can aim for an upper bound, but leave out exactness.

Calculating the encoding size for non-composite atoms is easy and cheap,
so it could be moved from encoding to the pn_data_put_*() mutators. As
long as uuid is part of the atom union, adding another int will not
hurt :P
Calculating a conservative size for composites is also cheap.

Maybe pn_data_t could even be made to encode as it goes; it would thus
not need the extra buffer *and* copying for variable-sized atoms (strings
and binary), and each atom could be reduced to an offset into the encoded
buffer.


 - user is free to allocate more (e.g. doubling strategies)
 - free called after alloc, if and only if alloc is called.
 - separate free instead of single realloc call: encode impl decides
 if/how much data to copy from old to new buffer.

Partial copying of encoded buffers is also problematic, because the
encoder has to backtrack to fill out byte sizes for structured types.
At best we could do a scatter-gather thing, which quickly becomes another
mess, as pn_data_t would then need much better arithmetic for offset and
size calculations. I really think we should just offer exactly realloc()
for now.


Bozzo




Re: codec changes

2015-04-07 Thread Alan Conway
On Tue, 2015-04-07 at 11:00 -0400, Andrew Stitcher wrote:
 On Tue, 2015-04-07 at 09:38 -0400, Alan Conway wrote:
  On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
   Given the memory overhead of a pn_data_t before encoding, why not have it
   own an encode buffer? it could get by with exactly that grow_buffer()
   callback if ownership is the issue .
  
 
 I think the best way to do this would be to introduce a new class to sit
 on top of the existing pn_data_t which does this, rather than extending
 the current pn_data_t.
 
 So I think the below is fine, but I'd prefer to avoid stuffing it all
 into pn_data_t especially as I think the class is, long term, going
 away.

Who's replacing it ;) ? Are they taking notes?

Cheers,
Alan.



Re: codec changes

2015-04-07 Thread Rafael Schloming
Maybe I'm not following something, but I don't see how passing around
allocation functions actually solves any ownership problems. I would think
the ownership problems come from pn_data_t holding onto a pointer
regardless of whether that pointer was gotten from malloc or from a
callback. I'm assuming you want to be able to do something like use a
pn_data_t that is owned by one thread to encode into a buffer and pass that
buffer over to another thread. How does passing in a bunch of allocators
help with this? If the pn_data_t holds a pointer to whatever those
allocators return, then aren't you going to have ownership issues no matter
what?

To answer Bozzo's original question, I think that it's good to keep the
encoder/decoder decoupled from the buffer for a number of reasons. In
addition to the ownership issues that Alan points out, the encoded data may
have a different lifecycle from the pn_data_t that created it for non
thread related reasons, or you may simply want to encode directly into a
frame buffer and avoid an extra copy.

If the goal here is simply to provide a convenient way to avoid having to
repeat the resizing loop then I would suggest simply providing a
convenience API that accepts a growable buffer of some sort. This provides
both convenience and avoids ownership issues with holding pointers. I'm
thinking something along the lines of:

pn_data_t data = ...;
pn_string/buffer_t buf = ...;
pn_data_encode_conveniently(data, buf); // obviously needs a better name

It is perhaps not *quite* as convenient in the minimal case as pn_data_t
holding the buffer internally, but it is an improvement in general and
probably simpler than having to mess with function pointers in the case
where the buffer's lifecycle is independent from the pn_data_t. (Not that I
really understand how that would work anyways.)

--Rafael


On Tue, Apr 7, 2015 at 1:21 PM, Alan Conway acon...@redhat.com wrote:

 On Tue, 2015-04-07 at 11:00 -0400, Andrew Stitcher wrote:
  On Tue, 2015-04-07 at 09:38 -0400, Alan Conway wrote:
   On Tue, 2015-03-31 at 19:17 +0200, Božo Dragojevič wrote:
Given the memory overhead of a pn_data_t before encoding, why not
 have it
own an encode buffer? it could get by with exactly that grow_buffer()
callback if ownership is the issue .
  
 
  I think the best way to do this would be to introduce a new class to sit
  on top of the existing pn_data_t which does this, rather than extending
  the current pn_data_t.
 
  So I think the below is fine, but I'd prefer to avoid stuffing it all
  into pn_data_t especially as I think the class is, long term, going
  away.

 Who's replacing it ;) ? Are they taking notes?

 Cheers,
 Alan.




codec changes

2015-03-31 Thread Rafael Schloming
Hi Alan,

Sorry I didn't comment on this sooner, I didn't have time to comment on
your original review request during my travels, however I do have some
thoughts on the changes you made to the codec interface. I noticed you
added a separate accessor for the size:

ssize_t pn_data_encoded_size(pn_data_t *data);

This is alongside the original encode method:

ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);

I think this API choice, while nice in that it is backwards compatible, is
also going to result in code that is roughly twice as slow as it needs to
be in the most common case. Based on my experience implementing and
profiling codec in C, Python, and Java, computing the size of the encoded
data is usually roughly the same amount of work as actually encoding it,
regardless of the implementation language. Therefore code like this:

if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
pn_data_encode(data, buffer, buffer_size());

Can end up being roughly twice as slow as code like this:

ssize_t err;
while ((err = pn_data_encode(data, buffer, buffer_size())) == PN_OVERFLOW) {
    grow_buffer();
}

Admittedly the latter form is much more awkward in those cases where you
don't care about performance, so I'm all for providing something nicer, but
I think a better API change would be to steal a page from the C stdio.h
APIs and have pn_data_encode always return the number of bytes that would
have been written had there been enough space. This allows you to write the
simplified encode as above:

if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
pn_data_encode(data, buffer, buffer_size());

Or use a more optimal form:

    ssize_t n = pn_data_encode(data, buffer, buffer_size());
    if (n > buffer_size()) {
        grow_buffer();
        pn_data_encode(data, buffer, buffer_size());
    }

This makes the slow/convenient form possible, and provides some options
that are a bit less awkward than the loop, but it also makes it very clear
that when you use the slow/convenient form you are incurring roughly twice
the cost of the alternative.

Normally I wouldn't be overly fussed by something like this, and I realize
what I'm suggesting is a breaking change relative to what you provided, but
based on what profiling we've done in the past, codec is probably the most
significant source of overhead that we add to an application, and exactly
this sort of double encode effect is almost always one of the first things
you hit when you try to optimize. Given this, I think it would be a good
thing if the API accurately reflects the relative cost of the different
styles of use.

Thoughts?

--Rafael


Re: codec changes

2015-03-31 Thread Božo Dragojevič
Given the memory overhead of a pn_data_t before encoding, why not have it
own an encode buffer? it could get by with exactly that grow_buffer()
callback if ownership is the issue .

Bozzo
On Mar 31, 2015 6:10 PM, Rafael Schloming r...@alum.mit.edu wrote:

 Hi Alan,

 Sorry I didn't comment on this sooner, I didn't have time to comment on
 your original review request during my travels, however I do have some
 thoughts on the changes you made to the codec interface. I noticed you
 added a separate accessor for the size:

 ssize_t pn_data_encoded_size(pn_data_t *data);

 This is alongside the original encode method:

 ssize_t pn_data_encode(pn_data_t *data, char *bytes, size_t size);

 I think this API choice while nice in that it is backwards compatible is
 also going to result in code that is roughly twice as slow as it needs to
 be in the most common case. Based on my experience implementing and
 profiling codec in C, Python, and Java, computing the size of the encoded
 data seems to usually be roughly the same amount of work as actually
 encoding it regardless of the implementation language. Therefore code like
 this:

 if (buffer_size() < pn_data_encoded_size(data)) grow_buffer();
 pn_data_encode(data, buffer, buffer_size());

 Can end up being roughly twice as slow as code like this:

 ssize_t err;
 while ((err = pn_data_encode(data, buffer, buffer_size())) == PN_OVERFLOW) {
     grow_buffer();
 }

 Admittedly the latter form is much more awkward in those cases where you
 don't care about performance, so I'm all for providing something nicer, but
 I think a better API change would be to steal a page from the C stdio.h
 APIs and have pn_data_encode always return the number of bytes that would
 have been written had there been enough space. This allows you to write the
 simplified encode as above:

 if (buffer_size() < pn_data_encode(data, NULL, 0)) grow_buffer();
 pn_data_encode(data, buffer, buffer_size());

 Or use a more optimal form:

 ssize_t n = pn_data_encode(data, buffer, buffer_size());
 if (n > buffer_size()) {
     grow_buffer();
     pn_data_encode(data, buffer, buffer_size());
 }

 This makes the slow/convenient form possible, and provides some options
 that are a bit less awkward than the loop, but it also makes it very clear
 that when you use the slow/convenient form you are incurring roughly twice
 the cost of the alternative.

 Normally I wouldn't be overly fussed by something like this, and I realize
 what I'm suggesting is a breaking change relative to what you provided, but
 based on what profiling we've done in the past, codec is probably the most
 significant source of overhead that we add to an application, and exactly
 this sort of double encode effect is almost always one of the first things
 you hit when you try to optimize. Given this, I think it would be a good
 thing if the API accurately reflects the relative cost of the different
 styles of use.

 Thoughts?

 --Rafael