If this turns out to be the problem, then please file a JIRA for it. That code is some of the oldest code in proton, and there has been a nice, efficient circular buffer implementation available in the codebase for ages now.
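For anyone unfamiliar with the distinction: a circular (ring) buffer consumes data by advancing an index instead of moving bytes, so draining is O(1) regardless of how large the buffer has grown. The following is a minimal sketch in C; the names and layout are illustrative only, not proton's actual buffer API.

```c
/* Minimal circular (ring) buffer sketch -- illustrative only, not
 * proton's actual implementation. Consuming data just advances an
 * index; no bytes are ever memmove()'d to the front. */
#include <stddef.h>
#include <string.h>

typedef struct {
    char *data;
    size_t capacity;
    size_t start;  /* index of the first unread byte */
    size_t size;   /* bytes currently stored */
} ring_t;

/* Append n bytes; returns bytes actually written (may be < n if full). */
static size_t ring_write(ring_t *r, const char *src, size_t n) {
    size_t free_space = r->capacity - r->size;
    if (n > free_space) n = free_space;
    size_t end = (r->start + r->size) % r->capacity;
    size_t first = r->capacity - end;          /* contiguous room at the tail */
    if (first > n) first = n;
    memcpy(r->data + end, src, first);
    memcpy(r->data, src + first, n - first);   /* wrapped part, if any */
    r->size += n;
    return n;
}

/* Drop n bytes (e.g. after a successful socket write): O(1),
 * no data movement at all. */
static void ring_consume(ring_t *r, size_t n) {
    if (n > r->size) n = r->size;
    r->start = (r->start + n) % r->capacity;
    r->size -= n;
}
```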
--Rafael

On Mon, Oct 27, 2014 at 1:35 PM, Gordon Sim <[email protected]> wrote:
> On 10/27/2014 04:39 PM, Bozo Dragojevic wrote:
>> On 27. 10. 14 13:16, Michael Goulish wrote:
>>> You know, I thought of something along those lines, but I can't
>>> see how it makes the receiver actually use less CPU permanently.
>>> It seems like it ought to simply get a backlog, but go back
>>> to normal CPU usage.
>>
>> My guess is that the sender creates for itself some kind of internal
>> backlog that it never recovers from, the consequence of which is a
>> bored receiver.
>
> There is an internal buffer to which proton writes encoded frames. These
> are then written to the socket, and any remaining data is moved up to
> the front of the buffer. If there is not enough space when encoding a
> frame, the buffer is expanded (I think it may be doubled?). The longer
> the buffer gets, the more expensive (and the more likely) the moving of
> data becomes.
>
> This may not be possible with your example (I don't know how the driver
> and connectors work, as I don't use them myself), but if the top half
> can ever generate frames faster than they get written out to the wire,
> the buffer will expand and may cause an inefficiency that will then
> never go away, even if the circumstance that caused the build-up and
> expansion ceases to be relevant.
>
>> Sender should be more or less CPU bound. So, for example, if the
>> sender is for some reason forced to get another delivery on one of the
>> linked lists, it will always from there on have to traverse two list
>> items to send one delivery (wildly assuming that it takes more than
>> one round through driver_wait and all the rest to clean things up).
>> So if the sender rate-limits itself to create a new delivery only
>> after the previous one is sent, it can clean up its backlog.
>>
>>> Can you think of any way that a backlog would cause receiver to
>>> stay at low CPU?
>>
>> Yeah, a sender that needs more CPU cycles per message.
>>> Now that I can make this happen easily instead of waiting
>>> forever-and-a-day, I will get callgrind snapshots of both programs
>>> when the test is fast and slow. It seems like that just must show me
>>> something.
>
> If it is the internal buffer, it should be fairly obvious. (You'll see
> lots more time spent in ::memmove().)
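For concreteness, the compacting-buffer pattern described in the quoted text (append frames, drain to the socket, memmove the unsent remainder to the front, double on overflow) looks roughly like the sketch below. This is illustrative only, not proton's actual code; the point is that every drain touches every remaining byte, so a sustained backlog makes every flush expensive, which is exactly what would show up as ::memmove() time in callgrind.

```c
/* Sketch of a compacting (flat) output buffer -- illustrative only,
 * not proton's actual code. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t capacity;
    size_t size;   /* bytes pending */
} flatbuf_t;

/* Append a frame, doubling capacity whenever it doesn't fit. Once a
 * backlog has forced the buffer to grow, the capacity never shrinks. */
static void flatbuf_append(flatbuf_t *b, const char *frame, size_t n) {
    while (b->size + n > b->capacity) {
        b->capacity = b->capacity ? b->capacity * 2 : 64;
        b->data = realloc(b->data, b->capacity);
    }
    memcpy(b->data + b->size, frame, n);
    b->size += n;
}

/* Drop the first n bytes (those written to the socket) and compact.
 * This memmove copies every remaining byte, every time, so the cost
 * of each flush is proportional to the size of the backlog. */
static void flatbuf_consume(flatbuf_t *b, size_t n) {
    if (n > b->size) n = b->size;
    memmove(b->data, b->data + n, b->size - n);
    b->size -= n;
}
```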
