Re: Observations on the performance of the proton event model
On Thu, Dec 11, 2014 at 6:34 PM, Ken Giusti kgiu...@redhat.com wrote:

> - Original Message -
> From: Rafael Schloming r...@alum.mit.edu
> To: proton@qpid.apache.org
> Sent: Wednesday, December 10, 2014 5:41:20 PM
> Subject: Re: Observations on the performance of the proton event model
>
> [...]
>
>> The event shouldn't be generated for every message though, just once for
>> each batch of messages. In other words if I stuff a bunch of messages into
>> the engine at the same time, there should be only one flow event produced
>> once the engine has finished writing messages to the wire. If you're
>> actually observing one flow per message then either there is a bug in the
>> logic that elides duplicate events, or you're stuffing one message into
>> the engine at a time.
>
> Hmmm... AFAICT, it would seem that the existing broker code iterates over
> each consuming link, and issues one message per link. Total conjecture
> here, but I thought that was desired, esp. in the case of multiple
> consumers on a shared queue. Gordon could possibly shed some light on this.
>
> In any case, if an application is load balancing messages across several
> outgoing links, won't each send result in a FLOW event? IOW - certain
> legitimate messaging patterns will result in that 1 Flow for every send.
> Or am I missing something?

I think you're right, in that particular case the automatic duplicate
detection that the collector does wouldn't eliminate the extraneous flow
events. I believe we could fix this though by adjusting the logic that
produces those flow events. I.e. first write all outstanding messages onto
the wire and then produce flow events for only those links that were
touched in that process. That strategy should be robust to the usage
pattern you describe.

--Rafael
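A driver can get much of that batching effect today, without any engine
change, by draining the collector first and only then servicing the links
that saw flow. A minimal sketch against the proton 0.8 C API; service_sender()
is a stand-in for qpidd's existing per-link output path, not a real qpidd
function, and the fixed-size dirty list is only for illustration:

    #include <stdbool.h>
    #include <proton/engine.h>

    #define MAX_DIRTY 64

    /* Stand-in for qpidd's existing "check credit and write" path. */
    static void service_sender(pn_link_t *sender) { (void)sender; }

    static void process_events(pn_collector_t *collector)
    {
        pn_link_t *dirty[MAX_DIRTY];
        int ndirty = 0;
        pn_event_t *event;

        /* Drain the collector, remembering which senders saw PN_LINK_FLOW. */
        while ((event = pn_collector_peek(collector)) != NULL) {
            if (pn_event_type(event) == PN_LINK_FLOW) {
                pn_link_t *link = pn_event_link(event);
                if (link && pn_link_is_sender(link)) {
                    bool seen = false;
                    for (int i = 0; i < ndirty; i++) {
                        if (dirty[i] == link) { seen = true; break; }
                    }
                    if (!seen && ndirty < MAX_DIRTY) dirty[ndirty++] = link;
                }
            }
            pn_collector_pop(collector);
        }

        /* Service each flow-touched sender once per batch, not once per event. */
        for (int i = 0; i < ndirty; i++) service_sender(dirty[i]);
    }

The per-event cost here is a type check and, for sender flow events, a short
linear scan, so duplicate flow events within one batch collapse into a single
call to the existing output path.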
Re: Observations on the performance of the proton event model
- Original Message -
From: Rafael Schloming r...@alum.mit.edu
To: proton@qpid.apache.org
Sent: Friday, December 12, 2014 10:21:04 AM
Subject: Re: Observations on the performance of the proton event model

> On Thu, Dec 11, 2014 at 6:34 PM, Ken Giusti kgiu...@redhat.com wrote:
>
>> [...]
>>
>> In any case, if an application is load balancing messages across several
>> outgoing links, won't each send result in a FLOW event? IOW - certain
>> legitimate messaging patterns will result in that 1 Flow for every send.
>> Or am I missing something?
>
> I think you're right, in that particular case the automatic duplicate
> detection that the collector does wouldn't eliminate the extraneous flow
> events. I believe we could fix this though by adjusting the logic that
> produces those flow events. I.e. first write all outstanding messages onto
> the wire and then produce flow events for only those links that were
> touched in that process. That strategy should be robust to the usage
> pattern you describe.

+1 - that would be ideal. I'll open a JIRA for that.

Since the original proton work loop in qpidd didn't check for flow changes
(it manually checks each sender link when it wants to generate output),
I'll have the patch simply discard the flow events for now and use the
existing link handling code. It scales back the patch a bit, but leaves the
door open for further optimization.

One last question: you had mentioned that ignoring the flow events
generated when credit is consumed can lead to a deadlock. Can you give a
little more detail on how that happens? I want to make sure that qpidd
won't trip up on that.

Thanks,

> --Rafael

--
-K
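The dispatch that a scaled-back patch like this ends up doing can be
sketched roughly as follows, assuming the proton 0.8 C event API;
handle_delivery() and handle_link_open() are placeholder names for the
existing qpidd handlers, not actual qpidd functions:

    #include <proton/engine.h>

    /* Placeholders for the existing qpidd code paths. */
    static void handle_delivery(pn_delivery_t *d) { (void)d; }
    static void handle_link_open(pn_link_t *l)    { (void)l; }

    static void dispatch(pn_event_t *event)
    {
        switch (pn_event_type(event)) {
        case PN_LINK_FLOW:
        case PN_TRANSPORT:
            /* Discarded: credit and pending output are checked later by the
             * existing per-link polling code when the driver generates output. */
            break;
        case PN_DELIVERY:
            handle_delivery(pn_event_delivery(event));
            break;
        case PN_LINK_REMOTE_OPEN:
            handle_link_open(pn_event_link(event));
            break;
        default:
            break;
        }
    }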
Re: Observations on the performance of the proton event model
On Fri, Dec 12, 2014 at 10:56 AM, Ken Giusti kgiu...@redhat.com wrote:

> [...]
>
> Since the original proton work loop in qpidd didn't check for flow changes
> (it manually checks each sender link when it wants to generate output),
> I'll have the patch simply discard the flow events for now and use the
> existing link handling code. It scales back the patch a bit, but leaves
> the door open for further optimization.
>
> One last question: you had mentioned that ignoring the flow events
> generated when credit is consumed can lead to a deadlock. Can you give a
> little more detail on how that happens? I want to make sure that qpidd
> won't trip up on that.
Re: Observations on the performance of the proton event model
- Original Message -
From: Rafael Schloming r...@alum.mit.edu
To: proton@qpid.apache.org
Sent: Wednesday, December 10, 2014 5:41:20 PM
Subject: Re: Observations on the performance of the proton event model

> On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:
>
>> [...]
>>
>> 2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message
>> transfer! Turns out that PN_LINK_FLOW is being issued for two different
>> events (IMHO): when a flow frame is received (yay) and each time a
>> transfer is done and credit is consumed (ugh).
>>
>> Item #2 seems like a bug - these two events have different semantic
>> meaning and would likely result in different processing paths in the
>> driver (in the case of qpidd, the credit consumed case would be ignored).
>
> It's not a bug, it was added intentionally since there are circumstances
> where you get stalls if you don't have it. Each time a message is sent you
> are actually updating the credit values on the link, and so if you don't
> generate the event at that point, you need some fairly complex/subtle code
> in your handlers to compensate. (You may actually have a bug yourself now
> depending on how you've integrated the engine.)

Yes, I can see your point - processing a series of credit added/credit
removed events doesn't really make sense. In the end you're really just
concerned with the credit level at the point where you're servicing the
link.

> The event shouldn't be generated for every message though, just once for
> each batch of messages. In other words if I stuff a bunch of messages into
> the engine at the same time, there should be only one flow event produced
> once the engine has finished writing messages to the wire. If you're
> actually observing one flow per message then either there is a bug in the
> logic that elides duplicate events, or you're stuffing one message into
> the engine at a time.

Hmmm... AFAICT, it would seem that the existing broker code iterates over
each consuming link, and issues one message per link. Total conjecture
here, but I thought that was desired, esp. in the case of multiple
consumers on a shared queue. Gordon could possibly shed some light on this.

In any case, if an application is load balancing messages across several
outgoing links, won't each send result in a FLOW event? IOW - certain
legitimate messaging patterns will result in that 1 Flow for every send.
Or am I missing something?

> [...]
Observations on the performance of the proton event model
Hi,

I've been working on a simple patch to qpidd that ports the AMQP 1.0 module
to the new event interface provided by proton 0.8. See
https://issues.apache.org/jira/browse/QPID-6255 for the patch.

With the above patch, I've noticed a pretty consistent small drop in
overall qpidd performance as gauged by qpid-cpp-benchmark (see comments in
above jira). Turns out the event loop is doing a lot more work when
compared to the polled approach for the same message load.

Digging around a bit, there are a couple of issues that result in qpidd
doing a lot of unnecessary work:

1) the PN_TRANSPORT event isn't needed by this driver - pending output is
manually checked at a later point. In my test, approximately 25% of the
total events are PN_TRANSPORT events, which the driver simply discards.

2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message
transfer! Turns out that PN_LINK_FLOW is being issued for two different
events (IMHO): when a flow frame is received (yay) and each time a transfer
is done and credit is consumed (ugh).

Item #2 seems like a bug - these two events have different semantic meaning
and would likely result in different processing paths in the driver (in the
case of qpidd, the credit consumed case would be ignored).

I propose we fix #2 by breaking up that event into two separate events,
something like PN_LINK_REMOTE_FLOW for when flow is granted, and
PN_LINK_LOCAL_FLOW when credit is consumed (not in love with these names
btw, they seem consistent with the endpoint states).

Furthermore, I think the event API would benefit from a way to 'opt-in' to
specific events. For example, for qpidd we would not want to receive
PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.

I've hacked my proton library to avoid generating PN_TRANSPORT and
PN_LINK_FLOW on local credit consumption and that results in performance
parity with the existing polled approach.

Does this make sense? Other ideas?

--
-K
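The "approximately 25% of the total events" figure above can be reproduced
with very simple instrumentation: tally event types while draining the
collector. A rough sketch against the proton 0.8 C API; the fixed-size
counter array and the printed subset of types are illustrative only:

    #include <stdio.h>
    #include <proton/engine.h>

    #define MAX_EVENT_TYPES 64   /* comfortably larger than the 0.8 event enum */

    /* Drain the collector, tallying how many events of each type were
     * produced so their relative frequency can be inspected. */
    static void count_events(pn_collector_t *collector,
                             long counts[MAX_EVENT_TYPES])
    {
        pn_event_t *event;
        while ((event = pn_collector_peek(collector)) != NULL) {
            int type = (int)pn_event_type(event);
            if (type >= 0 && type < MAX_EVENT_TYPES)
                counts[type]++;
            pn_collector_pop(collector);
        }
        printf("PN_TRANSPORT: %ld  PN_LINK_FLOW: %ld  PN_DELIVERY: %ld\n",
               counts[PN_TRANSPORT], counts[PN_LINK_FLOW], counts[PN_DELIVERY]);
    }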
Re: Observations on the performance of the proton event model
An off the cuff thought - worth what you paid for it -

On Wed, 2014-12-10 at 11:23 -0500, Ken Giusti wrote:
> ...
> Furthermore, I think the event API would benefit from a way to 'opt-in' to
> specific events. For example, for qpidd we would not want to receive
> PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.

As far as I can see it should be hardly more expensive to do this filtering
in the listening code than in the proton code itself. So I'm a little
suspicious that the large number of ignored events is causing the problem.
If they are causing a lot of app work to happen then perhaps they can be
filtered out closer to reception.

I think it would be difficult to reconcile filtering the events out inside
proton with allowing multiple independent layers, each ignorant of the
others, to coexist.

Andrew
Re: Observations on the performance of the proton event model
On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:

> [...]
>
> 2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message
> transfer! Turns out that PN_LINK_FLOW is being issued for two different
> events (IMHO): when a flow frame is received (yay) and each time a
> transfer is done and credit is consumed (ugh).
>
> Item #2 seems like a bug - these two events have different semantic
> meaning and would likely result in different processing paths in the
> driver (in the case of qpidd, the credit consumed case would be ignored).

It's not a bug, it was added intentionally since there are circumstances
where you get stalls if you don't have it. Each time a message is sent you
are actually updating the credit values on the link, and so if you don't
generate the event at that point, you need some fairly complex/subtle code
in your handlers to compensate. (You may actually have a bug yourself now
depending on how you've integrated the engine.)

The event shouldn't be generated for every message though, just once for
each batch of messages. In other words if I stuff a bunch of messages into
the engine at the same time, there should be only one flow event produced
once the engine has finished writing messages to the wire. If you're
actually observing one flow per message then either there is a bug in the
logic that elides duplicate events, or you're stuffing one message into the
engine at a time.

> I propose we fix #2 by breaking up that event into two separate events,
> something like PN_LINK_REMOTE_FLOW for when flow is granted, and
> PN_LINK_LOCAL_FLOW when credit is consumed (not in love with these names
> btw, they seem consistent with the endpoint states).
>
> Furthermore, I think the event API would benefit from a way to 'opt-in' to
> specific events. For example, for qpidd we would not want to receive
> PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.
>
> I've hacked my proton library to avoid generating PN_TRANSPORT and
> PN_LINK_FLOW on local credit consumption and that results in performance
> parity with the existing polled approach.
>
> Does this make sense? Other ideas?

I don't think it makes sense to introduce a new event type. The two events
mean very much the same thing, i.e. credit levels have changed. A distinct
event type would only make sense if there were common cases where you
wanted to react to the two different events differently, and having them be
the same made that difficult or cumbersome. Of course that doesn't preclude
us from adding some configuration to the engine that lets us control when
the flow event is produced, however as Andrew points out that isn't very
compatible with the whole event bus model of engine use.

I think it's worth investigating exactly why you're getting so many of them
since I wouldn't expect there to be one per message unless you are somehow
forcing only one message to go through the engine at a time. It would also
probably be good to look at exactly why ignored events are so expensive,
perhaps isolate/measure how expensive they actually are as I'm not entirely
convinced there aren't other factors in your case. I'm also curious how
your usage of the engine compares to mick's examples as he seems to be
getting pretty decent numbers now.

--Rafael
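The "stuff a bunch of messages into the engine at the same time" pattern
looks roughly like the following with the 0.8 sender API. This is a sketch
under assumptions: 'encoded' and 'size' stand for an already-encoded
message body, and the delivery-tag scheme is purely illustrative:

    #include <stdio.h>
    #include <proton/engine.h>

    /* Queue as many messages as credit allows (up to 'count') on a sender
     * before the transport is pumped, so the whole batch is written to the
     * wire in one pass rather than one message at a time. */
    static int send_batch(pn_link_t *sender, const char *encoded, size_t size,
                          int count)
    {
        static long tag_counter = 0;
        int sent = 0;

        while (sent < count && pn_link_credit(sender) > 0) {
            char tag[32];
            int len = snprintf(tag, sizeof(tag), "%ld", ++tag_counter);
            pn_delivery(sender, pn_dtag(tag, (size_t)len));  /* new outgoing delivery */
            pn_link_send(sender, encoded, size);             /* message payload */
            pn_link_advance(sender);                         /* done with this delivery */
            ++sent;
        }
        return sent;
    }

With this usage the transport is pumped once after the loop, so the engine
has the whole batch in hand when it writes to the wire, which is the
situation in which a single flow event per link would be expected.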
Re: Observations on the performance of the proton event model
On Wed, Dec 10, 2014 at 5:10 PM, Andrew Stitcher astitc...@redhat.com wrote:

> An off the cuff thought - worth what you paid for it -
>
> On Wed, 2014-12-10 at 11:23 -0500, Ken Giusti wrote:
>> ...
>> Furthermore, I think the event API would benefit from a way to 'opt-in'
>> to specific events. For example, for qpidd we would not want to receive
>> PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.
>
> As far as I can see it should be hardly more expensive to do this
> filtering in the listening code than in the proton code itself. So I'm a
> little suspicious that the large number of ignored events is causing the
> problem. If they are causing a lot of app work to happen then perhaps they
> can be filtered out closer to reception.

+1

> I think it would be difficult to reconcile filtering the events out inside
> proton with allowing multiple independent layers, each ignorant of the
> others, to coexist.

We could probably get something to work, i.e. essentially build a tiny
little pub/sub model for events so we only produce them if someone is
listening for them, but there's no guarantee that would be faster than just
ignoring the events you don't care about.

--Rafael
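For scale, the "just ignoring the events you don't care about" baseline can
itself be written as a tiny application-level subscription table, so that an
unwanted event costs one array lookup. This is only an illustration of the
dispatch cost on the consuming side, not the engine-side pub/sub change
being discussed; it assumes the proton 0.8 C collector API:

    #include <proton/engine.h>

    #define MAX_EVENT_TYPES 64   /* comfortably larger than the 0.8 event enum */

    typedef void (*event_handler_t)(pn_event_t *event);

    /* One optional handler per event type; unsubscribed types are popped
     * and dropped immediately. */
    static event_handler_t handlers[MAX_EVENT_TYPES];

    static void subscribe(pn_event_type_t type, event_handler_t handler)
    {
        if ((int)type >= 0 && (int)type < MAX_EVENT_TYPES)
            handlers[type] = handler;
    }

    static void dispatch_all(pn_collector_t *collector)
    {
        pn_event_t *event;
        while ((event = pn_collector_peek(collector)) != NULL) {
            int type = (int)pn_event_type(event);
            if (type >= 0 && type < MAX_EVENT_TYPES && handlers[type])
                handlers[type](event);
            pn_collector_pop(collector);
        }
    }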