Re: Observations on the performance of the proton event model

2014-12-12 Thread Rafael Schloming
On Thu, Dec 11, 2014 at 6:34 PM, Ken Giusti kgiu...@redhat.com wrote:



 - Original Message -
  From: Rafael Schloming r...@alum.mit.edu
  To: proton@qpid.apache.org
  Sent: Wednesday, December 10, 2014 5:41:20 PM
  Subject: Re: Observations on the performance of the proton event model
 
  On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:
 
   Hi,
  
   I've been working on a simple patch to qpidd that ports the AMQP 1.0
   module to the new event interface provided by proton 0.8.  See
   https://issues.apache.org/jira/browse/QPID-6255 for the patch.
  
   With the above patch, I've noticed a pretty consistent small drop in
   overall qpidd performance as gauged by qpid-cpp-benchmark (see
 comments in
   above jira).  Turns out the event loop is doing a lot more work when
   compared to the polled approach for the same message load.
  
   Digging around a bit, there are a couple of issues that result in qpidd
   doing a lot of unnecessary work:
  
   1) the PN_TRANSPORT event isn't needed by this driver - pending output
 is
   manually checked at a later point.  In my test, approximately 25% of
 the
   total events are PN_TRANSPORT events, which the driver simply discards
  
   2) A more serious issue - I get a PN_LINK_FLOW event for _every_
 message
   transfer!  Turns out that PN_LINK_FLOW is being issued for two
 different
   events (IMHO): when a flow frame is received (yay) and each time a
 transfer
   is done and credit is consumed (ugh).
  
   Item #2 seems like a bug - these two events have different semantic
   meaning and would likely result in different processing paths in the
 driver
   (in the case of qpidd, the credit consumed case would be ignored).
  
 
  It's not a bug, it was added intentionally since there are circumstances
  where you get stalls if you don't have it. Each time a message is sent
 you
  are actually updating the credit values on the link, and so if you don't
  generate the event at that point, you need some fairly complex/subtle
 code
  in your handlers to compensate. (You may actually have a bug yourself now
  depending on how you've integrated the engine.)
 

 Yes, I can see your point - processing a series of credit added/credit
 removed events doesn't really make sense.  In the end you're really just
 concerned with the credit level at the point where you're servicing the
 link.


  The event shouldn't be generated for every message though, just once for
  each batch of messages. In other words if I stuff a bunch of messages
 into
  the engine at the same time, there should be only one flow event produced
  once the engine has finished writing messages to the wire. If you're
  actually observing one flow per message then either there is a bug in the
  logic that elides duplicate events, or you're stuffing one message into
 the
  engine at a time.
 

 Hmmm... AFAICT, it would seem that the existing broker code iterates over
 each consuming link, and issues one message per link.  Total conjecture
 here, but I thought that was desired, esp. in the case of multiple
 consumers on a shared queue.  Gordon could possibly shed some light on
 this.  In any case, if an application is load balancing messages across
 several outgoing links, won't each send result in a FLOW event?

 IOW - certain legitimate messaging patterns will result in that 1 Flow for
 every send.  Or am I missing something?


I think you're right, in that particular case the automatic duplicate
detection that the collector does wouldn't eliminate the extraneous flow
events. I believe we could fix this though by adjusting the logic that
produces those flow events. I.e. first write all outstanding messages onto
the wire and then produce flow events for only those links that were
touched in that process. That strategy should be robust to the usage
pattern you describe.
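
For illustration, a handler-side version of that coalescing is possible even
today: drain the collector in one pass, remember which links saw a
PN_LINK_FLOW, and then service each touched link once.  A rough sketch (the
service_sender() helper and the fixed-size table are placeholders, not proton
or qpidd code):

#include <stdbool.h>
#include <stddef.h>
#include <proton/engine.h>
#include <proton/event.h>

/* Hypothetical helper supplied by the driver. */
void service_sender(pn_link_t *sender);

/* Drain the collector, note which links were touched by PN_LINK_FLOW,
 * then service each of those links exactly once. */
void drain_and_service(pn_collector_t *collector)
{
    enum { MAX_TOUCHED = 64 };          /* arbitrary bound for the sketch */
    pn_link_t *touched[MAX_TOUCHED];
    size_t ntouched = 0;

    pn_event_t *event;
    while ((event = pn_collector_peek(collector)) != NULL) {
        if (pn_event_type(event) == PN_LINK_FLOW) {
            pn_link_t *link = pn_event_link(event);
            bool seen = false;
            for (size_t i = 0; i < ntouched; i++) {
                if (touched[i] == link) { seen = true; break; }
            }
            if (!seen && ntouched < MAX_TOUCHED)
                touched[ntouched++] = link;
        }
        /* ... other event types dispatched here ... */
        pn_collector_pop(collector);
    }

    for (size_t i = 0; i < ntouched; i++)
        service_sender(touched[i]);
}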

--Rafael


Re: Observations on the performance of the proton event model

2014-12-12 Thread Ken Giusti


- Original Message -
 From: Rafael Schloming r...@alum.mit.edu
 To: proton@qpid.apache.org
 Sent: Friday, December 12, 2014 10:21:04 AM
 Subject: Re: Observations on the performance of the proton event model
 
 On Thu, Dec 11, 2014 at 6:34 PM, Ken Giusti kgiu...@redhat.com wrote:
 
 
 
  - Original Message -
   From: Rafael Schloming r...@alum.mit.edu
   To: proton@qpid.apache.org
   Sent: Wednesday, December 10, 2014 5:41:20 PM
   Subject: Re: Observations on the performance of the proton event model
  
   On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:
  
Hi,
   
I've been working on a simple patch to qpidd that ports the AMQP 1.0
module to the new event interface provided by proton 0.8.  See
https://issues.apache.org/jira/browse/QPID-6255 for the patch.
   
With the above patch, I've noticed a pretty consistent small drop in
overall qpidd performance as gauged by qpid-cpp-benchmark (see
  comments in
above jira).  Turns out the event loop is doing a lot more work when
compared to the polled approach for the same message load.
   
Digging around a bit, there are a couple of issues that result in qpidd
doing a lot of unnecessary work:
   
1) the PN_TRANSPORT event isn't needed by this driver - pending output
  is
manually checked at a later point.  In my test, approximately 25% of
  the
total events are PN_TRANSPORT events, which the driver simply discards
   
2) A more serious issue - I get a PN_LINK_FLOW event for _every_
  message
transfer!  Turns out that PN_LINK_FLOW is being issued for two
  different
events (IMHO): when a flow frame is received (yay) and each time a
  transfer
is done and credit is consumed (ugh).
   
Item #2 seems like a bug - these two events have different semantic
meaning and would likely result in different processing paths in the
  driver
(in the case of qpidd, the credit consumed case would be ignored).
   
  
   It's not a bug, it was added intentionally since there are circumstances
   where you get stalls if you don't have it. Each time a message is sent
  you
   are actually updating the credit values on the link, and so if you don't
   generate the event at that point, you need some fairly complex/subtle
  code
   in your handlers to compensate. (You may actually have a bug yourself now
   depending on how you've integrated the engine.)
  
 
  Yes, I can see your point - processing a series of credit added/credit
  removed events doesn't really make sense.  In the end you're really just
  concerned with the credit level at the point where you're servicing the
  link.
 
 
   The event shouldn't be generated for every message though, just once for
   each batch of messages. In other words if I stuff a bunch of messages
  into
   the engine at the same time, there should be only one flow event produced
   once the engine has finished writing messages to the wire. If you're
   actually observing one flow per message then either there is a bug in the
   logic that elides duplicate events, or you're stuffing one message into
  the
   engine at a time.
  
 
  Hmmm... AFAICT, it would seem that the existing broker code iterates over
  each consuming link, and issues one message per link.  Total conjecture
  here, but I thought that was desired, esp. in the case of multiple
  consumers on a shared queue.  Gordon could possibly shed some light on
  this.  In any case, if an application is load balancing messages across
  several outgoing links, won't each send result in a FLOW event?
 
  IOW - certain legitimate messaging patterns will result in that 1 Flow for
  every send.  Or am I missing something?
 
 
 I think you're right, in that particular case the automatic duplicate
 detection that the collector does wouldn't eliminate the extraneous flow
 events. I believe we could fix this though by adjusting the logic that
 produces those flow events. I.e. first write all outstanding messages onto
 the wire and then produce flow events for only those links that were
 touched in that process. That strategy should be robust to the usage
 pattern you describe.
 

+1 - that would be ideal.  I'll open a JIRA for that.  Since the original 
proton work loop in qpidd didn't check for flow changes (it manually checks 
each sender link when it wants to generate output), I'll have the patch simply 
discard the flow events for now and use the existing link handling code.  It 
scales back the patch a bit, but leaves the door open for further optimization.
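
As a sketch of what that looks like (sender_has_messages() and
sender_send_one() stand in for the broker's existing queue logic; they are not
real qpidd functions): the event loop just drops PN_LINK_FLOW and PN_TRANSPORT,
and at service time the only thing consulted is pn_link_credit().

#include <stdbool.h>
#include <proton/engine.h>
#include <proton/event.h>

/* Stand-ins for the broker's existing queue/link bookkeeping. */
bool sender_has_messages(pn_link_t *sender);
void sender_send_one(pn_link_t *sender);

/* Event dispatch that simply drops the events this driver doesn't use. */
void dispatch_events(pn_collector_t *collector)
{
    pn_event_t *event;
    while ((event = pn_collector_peek(collector)) != NULL) {
        switch (pn_event_type(event)) {
        case PN_LINK_FLOW:   /* ignored: credit is polled at service time */
        case PN_TRANSPORT:   /* ignored: pending output is checked later  */
            break;
        default:
            /* ... the driver's real event handling goes here ... */
            break;
        }
        pn_collector_pop(collector);
    }
}

/* At service time the only thing that matters is the current credit. */
void service_sender_link(pn_link_t *sender)
{
    while (pn_link_credit(sender) > 0 && sender_has_messages(sender))
        sender_send_one(sender);
}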

One last question: you had mentioned that ignoring the flow events generated 
when credit is consumed can lead to a deadlock.  Can you give a little more 
detail on how that happens?  I want to make sure that qpidd won't trip up on 
that.

Thanks,


 --Rafael
 

-- 
-K


Re: Observations on the performance of the proton event model

2014-12-12 Thread Rafael Schloming
On Fri, Dec 12, 2014 at 10:56 AM, Ken Giusti kgiu...@redhat.com wrote:



 - Original Message -
  From: Rafael Schloming r...@alum.mit.edu
  To: proton@qpid.apache.org
  Sent: Friday, December 12, 2014 10:21:04 AM
  Subject: Re: Observations on the performance of the proton event model
 
  On Thu, Dec 11, 2014 at 6:34 PM, Ken Giusti kgiu...@redhat.com wrote:
  
  
  
   - Original Message -
From: Rafael Schloming r...@alum.mit.edu
To: proton@qpid.apache.org
Sent: Wednesday, December 10, 2014 5:41:20 PM
Subject: Re: Observations on the performance of the proton event
 model
   
On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com
 wrote:
   
 Hi,

 I've been working on a simple patch to qpidd that ports the AMQP
 1.0
 module to the new event interface provided by proton 0.8.  See
 https://issues.apache.org/jira/browse/QPID-6255 for the patch.

 With the above patch, I've noticed a pretty consistent small drop
 in
 overall qpidd performance as gauged by qpid-cpp-benchmark (see
   comments in
 above jira).  Turns out the event loop is doing a lot more work
 when
 compared to the polled approach for the same message load.

 Digging around a bit, there are a couple of issues that result in
 qpidd
 doing a lot of unnecessary work:

 1) the PN_TRANSPORT event isn't needed by this driver - pending
 output
   is
 manually checked at a later point.  In my test, approximately 25%
 of
   the
 total events are PN_TRANSPORT events, which the driver simply
 discards

 2) A more serious issue - I get a PN_LINK_FLOW event for _every_
   message
 transfer!  Turns out that PN_LINK_FLOW is being issued for two
   different
 events (IMHO): when a flow frame is received (yay) and each time a
   transfer
 is done and credit is consumed (ugh).

 Item #2 seems like a bug - these two events have different semantic
 meaning and would likely result in different processing paths in
 the
   driver
 (in the case of qpidd, the credit consumed case would be ignored).

   
It's not a bug, it was added intentionally since there are
 circumstances
where you get stalls if you don't have it. Each time a message is
 sent
   you
are actually updating the credit values on the link, and so if you
 don't
generate the event at that point, you need some fairly complex/subtle
   code
in your handlers to compensate. (You may actually have a bug
 yourself now
depending on how you've integrated the engine.)
   
  
   Yes, I can see your point - processing a series of credit
 added/credit
   removed events doesn't really make sense.  In the end you're really
 just
   concerned with the credit level at the point where you're servicing the
   link.
  
  
The event shouldn't be generated for every message though, just once
 for
each batch of messages. In other words if I stuff a bunch of messages
   into
the engine at the same time, there should be only one flow event
 produced
once the engine has finished writing messages to the wire. If you're
actually observing one flow per message then either there is a bug
 in the
logic that elides duplicate events, or you're stuffing one message
 into
   the
engine at a time.
   
  
   Hmmm... AFAICT, it would seem that the existing broker code iterates
 over
   each consuming link, and issues one message per link.  Total conjecture
   here, but I thought that was desired, esp. in the case of multiple
   consumers on a shared queue.  Gordon could possibly shed some light on
   this.  In any case, if an application is load balancing messages across
   several outgoing links, won't each send result in a FLOW event?
  
   IOW - certain legitimate messaging patterns will result in that 1 Flow
 for
   every send.  Or am I missing something?
  
  
  I think you're right, in that particular case the automatic duplicate
  detection that the collector does wouldn't eliminate the extraneous flow
  events. I believe we could fix this though by adjusting the logic that
  produces those flow events. I.e. first write all outstanding messages
 onto
  the wire and then produce flow events for only those links that were
  touched in that process. That strategy should be robust to the usage
  pattern you describe.
 

 +1 - that would be ideal.  I'll open a JIRA for that.  Since the original
 proton work loop in qpidd didn't check for flow changes (it manually checks
 each sender link when it wants to generate output) I'll have the patch
 simply discard the flow events for now and use the existing link handling
 code.  It scales back the patch a bit, but leaves the door open for further
 optimization.

 One last question: you had mentioned that ignoring the flow events
 generated when credit is consumed can lead to a deadlock.  Can you give a
 little more detail on how that happens?  I want to make sure that qpidd
 won't trip up on that.

Re: Observations on the performance of the proton event model

2014-12-11 Thread Ken Giusti


- Original Message -
 From: Rafael Schloming r...@alum.mit.edu
 To: proton@qpid.apache.org
 Sent: Wednesday, December 10, 2014 5:41:20 PM
 Subject: Re: Observations on the performance of the proton event model
 
 On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:
 
  Hi,
 
  I've been working on a simple patch to qpidd that ports the AMQP 1.0
  module to the new event interface provided by proton 0.8.  See
  https://issues.apache.org/jira/browse/QPID-6255 for the patch.
 
  With the above patch, I've noticed a pretty consistent small drop in
  overall qpidd performance as gauged by qpid-cpp-benchmark (see comments in
  above jira).  Turns out the event loop is doing a lot more work when
  compared to the polled approach for the same message load.
 
  Digging around a bit, there are a couple of issues that result in qpidd
  doing a lot of unnecessary work:
 
  1) the PN_TRANSPORT event isn't needed by this driver - pending output is
  manually checked at a later point.  In my test, approximately 25% of the
  total events are PN_TRANSPORT events, which the driver simply discards
 
  2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message
  transfer!  Turns out that PN_LINK_FLOW is being issued for two different
  events (IMHO): when a flow frame is received (yay) and each time a transfer
  is done and credit is consumed (ugh).
 
  Item #2 seems like a bug - these two events have different semantic
  meaning and would likely result in different processing paths in the driver
  (in the case of qpidd, the credit consumed case would be ignored).
 
 
 It's not a bug, it was added intentionally since there are circumstances
 where you get stalls if you don't have it. Each time a message is sent you
 are actually updating the credit values on the link, and so if you don't
 generate the event at that point, you need some fairly complex/subtle code
 in your handlers to compensate. (You may actually have a bug yourself now
 depending on how you've integrated the engine.)
 

Yes, I can see your point - processing a series of credit added/credit 
removed events doesn't really make sense.  In the end you're really just 
concerned with the credit level at the point where you're servicing the link.


 The event shouldn't be generated for every message though, just once for
 each batch of messages. In other words if I stuff a bunch of messages into
 the engine at the same time, there should be only one flow event produced
 once the engine has finished writing messages to the wire. If you're
 actually observing one flow per message then either there is a bug in the
 logic that elides duplicate events, or you're stuffing one message into the
 engine at a time.
 

Hmmm... AFAICT, it would seem that the existing broker code iterates over each 
consuming link, and issues one message per link.  Total conjecture here, but I 
thought that was desired, esp. in the case of multiple consumers on a shared 
queue.  Gordon could possibly shed some light on this.  In any case, if an 
application is load balancing messages across several outgoing links, won't 
each send result in a FLOW event?

IOW - certain legitimate messaging patterns will result in that 1 Flow for 
every send.  Or am I missing something?

 
 
  I propose we fix #2 by breaking up that event into two separate events,
  something like PN_LINK_REMOTE_FLOW for when flow is granted, and
  PN_LINK_LOCAL_FLOW when credit is consumed (not in love with these names
  btw, they seem consistent with the endpoint states)
 
  Furthermore, I think the event API would benefit from a way to 'opt-in' to
  specific events.  For example, for qpidd we would not want to receive
  PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.
 
  I've hacked my proton library to avoid generating PN_TRANSPORT and
  PN_LINK_FLOW on local credit consumption and that results in performance
  parity with the existing polled approach.
 
  Does this make sense?  Other ideas?
 
 
 I don't think it makes sense to introduce a new event type. The two events
 mean very much the same thing, i.e. credit levels have changed. A distinct
 event type would only make sense if there were common cases where you
 wanted to react to the two different events differently, and having them be
 the same made that difficult or cumbersome. Of course that doesn't preclude
 us from adding some configuration to the engine that lets us control when
 the flow event is produced, however as Andrew points out that isn't very
 compatible with the whole event bus model of engine use.
 
 I think it's worth investigating exactly why you're getting so many of them
 since I wouldn't expect there to be one per message unless you are somehow
 forcing only one message to go through the engine at a time. It would also
 probably be good to look at exactly why ignored events are so expensive,
 perhaps isolate/measure how expensive they actually are as I'm not entirely
 convinced there aren't other factors in your case.

Observations on the performance of the proton event model

2014-12-10 Thread Ken Giusti
Hi,

I've been working on a simple patch to qpidd that ports the AMQP 1.0 module to 
the new event interface provided by proton 0.8.  See 
https://issues.apache.org/jira/browse/QPID-6255 for the patch.

With the above patch, I've noticed a pretty consistent small drop in overall 
qpidd performance as gauged by qpid-cpp-benchmark (see comments in the above 
JIRA).  Turns out the event loop is doing a lot more work when compared to the 
polled approach for the same message load.

Digging around a bit, there are a couple of issues that result in qpidd doing a 
lot of unnecessary work:

1) the PN_TRANSPORT event isn't needed by this driver - pending output is 
manually checked at a later point.  In my test, approximately 25% of the total 
events are PN_TRANSPORT events, which the driver simply discards.

2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message 
transfer!  Turns out that PN_LINK_FLOW is being issued for two different events 
(IMHO): when a flow frame is received (yay) and each time a transfer is done 
and credit is consumed (ugh).

Item #2 seems like a bug - these two events have different semantic meaning and 
would likely result in different processing paths in the driver (in the case of 
qpidd, the credit consumed case would be ignored).
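
A crude way to get numbers like that is to tally event types while draining
the collector; something along these lines (handle_event() is a placeholder
for the driver's real dispatch):

#include <stdio.h>
#include <proton/engine.h>
#include <proton/event.h>

void handle_event(pn_event_t *event);   /* placeholder for real dispatch */

/* Count how much of the event stream is PN_TRANSPORT / PN_LINK_FLOW
 * versus everything else, accumulating across calls. */
void drain_with_tally(pn_collector_t *collector)
{
    static unsigned long transports, flows, other;
    pn_event_t *event;

    while ((event = pn_collector_peek(collector)) != NULL) {
        switch (pn_event_type(event)) {
        case PN_TRANSPORT: transports++; break;
        case PN_LINK_FLOW: flows++;      break;
        default:           other++;      break;
        }
        handle_event(event);
        pn_collector_pop(collector);
    }

    fprintf(stderr, "transport=%lu flow=%lu other=%lu\n",
            transports, flows, other);
}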

I propose we fix #2 by breaking up that event into two separate events, 
something like PN_LINK_REMOTE_FLOW for when flow is granted, and 
PN_LINK_LOCAL_FLOW for when credit is consumed (not in love with these names 
btw, but they do seem consistent with the endpoint states).

Furthermore, I think the event API would benefit from a way to 'opt-in' to 
specific events.  For example, for qpidd we would not want to receive 
PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.

I've hacked my proton library to avoid generating PN_TRANSPORT and PN_LINK_FLOW 
on local credit consumption and that results in performance parity with the 
existing polled approach.

Does this make sense?  Other ideas?


-- 
-K


Re: Observations on the performance of the proton event model

2014-12-10 Thread Andrew Stitcher
An off-the-cuff thought - worth what you paid for it -

On Wed, 2014-12-10 at 11:23 -0500, Ken Giusti wrote:
 ...
 Furthermore, I think the event API would benefit from a way to 'opt-in' to 
 specific events.  For example, for qpidd we would not want to receive 
 PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.
 

As far as I can see it should be hardly more expensive to do this
filtering in the listening code than in the proton code itself. So I'm a
little skeptical that the large number of ignored events is really what's
causing the problem. If they are causing a lot of app work to happen, then
perhaps they can be filtered out closer to reception.
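
For instance, the application can skip over uninteresting events as it pops
them, at the cost of one enum comparison each.  A minimal sketch (the interest
table is application-owned and sized by the caller to cover every
pn_event_type_t value; it is not a proton API):

#include <stdbool.h>
#include <proton/engine.h>
#include <proton/event.h>

/* Return the next event the application cares about, silently popping and
 * discarding everything else. */
pn_event_t *next_interesting(pn_collector_t *collector, const bool *interested)
{
    pn_event_t *event;
    while ((event = pn_collector_peek(collector)) != NULL) {
        if (interested[pn_event_type(event)])
            return event;              /* caller handles it, then pops  */
        pn_collector_pop(collector);   /* cheap drop: one comparison    */
    }
    return NULL;
}

The table would just have the handful of types the driver acts on set to true
(say PN_DELIVERY and the remote open/close events) and everything else false.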

I think it would be difficult to reconcile filtering the events out in
proton with allowing multiple independent layers, each ignorant of the
others, to coexist.

Andrew




Re: Observations on the performance of the proton event model

2014-12-10 Thread Rafael Schloming
On Wed, Dec 10, 2014 at 11:23 AM, Ken Giusti kgiu...@redhat.com wrote:

 Hi,

 I've been working on a simple patch to qpidd that ports the AMQP 1.0
 module to the new event interface provided by proton 0.8.  See
 https://issues.apache.org/jira/browse/QPID-6255 for the patch.

 With the above patch, I've noticed a pretty consistent small drop in
 overall qpidd performance as gauged by qpid-cpp-benchmark (see comments in
 above jira).  Turns out the event loop is doing a lot more work when
 compared to the polled approach for the same message load.

 Digging around a bit, there are a couple of issues that result in qpidd
 doing a lot of unnecessary work:

 1) the PN_TRANSPORT event isn't needed by this driver - pending output is
 manually checked at a later point.  In my test, approximately 25% of the
 total events are PN_TRANSPORT events, which the driver simply discards

 2) A more serious issue - I get a PN_LINK_FLOW event for _every_ message
 transfer!  Turns out that PN_LINK_FLOW is being issued for two different
 events (IMHO): when a flow frame is received (yay) and each time a transfer
 is done and credit is consumed (ugh).

 Item #2 seems like a bug - these two events have different semantic
 meaning and would likely result in different processing paths in the driver
 (in the case of qpidd, the credit consumed case would be ignored).


It's not a bug, it was added intentionally since there are circumstances
where you get stalls if you don't have it. Each time a message is sent you
are actually updating the credit values on the link, and so if you don't
generate the event at that point, you need some fairly complex/subtle code
in your handlers to compensate. (You may actually have a bug yourself now
depending on how you've integrated the engine.)

The event shouldn't be generated for every message though, just once for
each batch of messages. In other words if I stuff a bunch of messages into
the engine at the same time, there should be only one flow event produced
once the engine has finished writing messages to the wire. If you're
actually observing one flow per message then either there is a bug in the
logic that elides duplicate events, or you're stuffing one message into the
engine at a time.
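
In other words, the intended usage looks roughly like the sketch below: a
whole batch is handed to the engine before the transport writes anything, so
only one PN_LINK_FLOW should show up for the batch.  This is only a sketch:
`sender` is assumed to already be attached with credit, the payloads are
pre-encoded message bodies, and pn_dtag() is used just to fabricate tags.

#include <stdio.h>
#include <stddef.h>
#include <proton/engine.h>

/* Stuff a batch of pre-encoded messages into the sender before letting the
 * transport write to the wire; the flow event for the consumed credit should
 * then be produced once for the batch rather than once per message. */
void send_batch(pn_link_t *sender, const char **payloads,
                const size_t *sizes, size_t count)
{
    for (size_t i = 0; i < count && pn_link_credit(sender) > 0; i++) {
        char tag[32];
        int n = snprintf(tag, sizeof(tag), "batch-%zu", i);
        pn_delivery(sender, pn_dtag(tag, (size_t)n));   /* new delivery  */
        pn_link_send(sender, payloads[i], sizes[i]);    /* message body  */
        pn_link_advance(sender);                        /* complete it   */
    }
    /* Driving the transport (the driver's normal output path) happens
     * after the whole batch has been queued. */
}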



 I propose we fix #2 by breaking up that event into two separate events,
 something like PN_LINK_REMOTE_FLOW for when flow is granted, and
 PN_LINK_LOCAL_FLOW when credit is consumed (not in love with these names
 btw, they seem consistent with the endpoint states)

 Furthermore, I think the event API would benefit from a way to 'opt-in' to
 specific events.  For example, for qpidd we would not want to receive
 PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.

 I've hacked my proton library to avoid generating PN_TRANSPORT and
 PN_LINK_FLOW on local credit consumption and that results in performance
 parity with the existing polled approach.

 Does this make sense?  Other ideas?


I don't think it makes sense to introduce a new event type. The two events
mean very much the same thing, i.e. credit levels have changed. A distinct
event type would only make sense if there were common cases where you
wanted to react to the two different events differently, and having them be
the same made that difficult or cumbersome. Of course that doesn't preclude
us from adding some configuration to the engine that lets us control when
the flow event is produced, however as Andrew points out that isn't very
compatible with the whole event bus model of engine use.

I think it's worth investigating exactly why you're getting so many of them
since I wouldn't expect there to be one per message unless you are somehow
forcing only one message to go through the engine at a time. It would also
probably be good to look at exactly why ignored events are so expensive,
perhaps isolate/measure how expensive they actually are as I'm not entirely
convinced there aren't other factors in your case.

I'm also curious how your usage of the engine compares to mick's examples
as he seems to be getting pretty decent numbers now.

--Rafael


Re: Observations on the performance of the proton event model

2014-12-10 Thread Rafael Schloming
On Wed, Dec 10, 2014 at 5:10 PM, Andrew Stitcher astitc...@redhat.com
wrote:

 An off-the-cuff thought - worth what you paid for it -

 On Wed, 2014-12-10 at 11:23 -0500, Ken Giusti wrote:
  ...
  Furthermore, I think the event API would benefit from a way to 'opt-in'
 to specific events.  For example, for qpidd we would not want to receive
 PN_TRANSPORT nor PN_LINK_LOCAL_FLOW events.
 

 As far as I can see it should be hardly more expensive to do this
 filtering in the listening code than in the proton code itself. So I'm a
 little skeptical that the large number of ignored events is really what's
 causing the problem. If they are causing a lot of app work to happen, then
 perhaps they can be filtered out closer to reception.


+1


 I think it would be difficult to reconcile filtering the events out in
 proton with allowing multiple independent layers, each ignorant of the
 others, to coexist.


We could probably get something to work, i.e. essentially build a tiny
little pub/sub model for events so we only produce them if someone is
listening for them, but there's no guarantee that would be faster than just
ignoring the events you don't care about.
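
At the application layer the same idea is just a dispatch table keyed by event
type, something like the toy sketch below (purely illustrative; there is no
such registration API in proton):

#include <stddef.h>
#include <proton/engine.h>
#include <proton/event.h>

typedef void (*event_handler_fn)(pn_event_t *event, void *context);

/* A toy "pub/sub" table: only event types with a registered handler do any
 * work; everything else is popped and dropped. */
#define MAX_EVENT_TYPES 64    /* comfortably above any pn_event_type_t value */

struct event_table {
    event_handler_fn handlers[MAX_EVENT_TYPES];
    void *context;
};

void dispatch_subscribed(struct event_table *table, pn_collector_t *collector)
{
    pn_event_t *event;
    while ((event = pn_collector_peek(collector)) != NULL) {
        int type = (int)pn_event_type(event);
        event_handler_fn handler =
            (type >= 0 && type < MAX_EVENT_TYPES) ? table->handlers[type] : NULL;
        if (handler)
            handler(event, table->context);  /* someone subscribed */
        pn_collector_pop(collector);         /* otherwise: drop    */
    }
}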

--Rafael