Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-03 Thread Gordon Sim

On 01/02/2014 10:46 PM, Herndon, John Luke wrote:



On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:


On 12/20/2013 09:26 PM, Herndon, John Luke wrote:


On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:


On 12/20/2013 05:27 PM, Herndon, John Luke wrote:


Other protocols may support bulk consumption. My one concern with
this approach is error handling. Currently the executors treat
each notification individually. So let’s say the broker hands
100 messages at a time. When the client is done processing the
messages, the broker needs to know if message 25 had an error or
not. We would somehow need to communicate back to the broker
which messages failed. I think this may take some refactoring of
executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages?
What sort of things might fail? Is it related to the message
content itself? Or is it failures suspected to be of a temporal
nature?


There will be situations where the message can’t be parsed, and those
messages can’t just be thrown away. My current thought is that
ceilometer could provide some sort of mechanism for sending messages
that are invalid to an external data store (like a file, or a
different topic on the amqp server) where a living, breathing human
can look at them and try to parse out any meaningful information.


Right, in those cases simply requeueing probably is not the right thing
and you really want it dead-lettered in some way. I guess the first
question is whether that is part of the notification system’s function,
or if it is done by the application itself (e.g. by storing it or
republishing it). If it is the latter you may not need any explicit
negative acknowledgement.


Exactly, I’m thinking this is something we’d build into ceilometer and not
oslo, since ceilometer is where the event parsing knowledge lives. From an
oslo point of view, the message would be 'acked'.




Other errors might be “database not available”, in which case
re-queueing the message is probably the right way to go.


That does mean however that the backlog of messages starts to grow on
the broker, so some scheme for dealing with this if the database outage
goes on for a bit is probably important. It also means that the messages
will keep being retried without any 'backoff' waiting for the database
to be restored which could increase the load.


This is a problem we already have :(


Agreed, it is a property of reliable (i.e. acknowledged) transfer from 
the broker, rather than batching. And of course, some degree of 
buffering here is exactly what message queues are supposed to provide. 
The point is simply to provide some way of configuring things so that 
this can be bounded, or prevented from taking down the entire broker. 
(And perhaps some way of alerting the unfortunate someone!)



https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158
Since notifications cannot be lost, overflow needs to be detected and the
messages need to be saved. I’m thinking the database being down is a rare
occurrence that will be worthy of waking someone up in the middle of the
night. One possible solution: flip the collector into an emergency mode
and save notifications to disc until the issue is resolved. Once the db is
up and running, the collector inserts all of these saved messages (as one
big batch!). Thoughts?

I’m not sure I understand what you are saying about retrying without a
backoff. Can you explain?


I mean that if the messages are explicitly requeued and the original 
subscription is still active, they will be immediately redelivered and 
will thus keep cycling from broker to client, back to broker, back to 
client etc etc until the database is available again.


Pulling messages off continually like this without actually being able 
to dequeue them may reduce the broker’s effectiveness at e.g. paging out,
and in any event involves some unnecessary load on top of the expanding 
queue.


It might be better, just as an example, to abort the connection to the 
broker (implicitly requeueing all unacked messages), and only reconnect 
when the database becomes available (and that can be tried after 1 
second, then 2, then 4 etc up to some maximum retry interval).
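
Just to illustrate, a minimal sketch of that reconnect-with-backoff loop
(database_available() and connect() are hypothetical helpers, not
oslo.messaging API):

    import time

    def reconnect_when_db_ready(max_interval=64):
        delay = 1
        while True:
            if database_available():      # hypothetical health check
                return connect()          # hypothetical broker (re)connect
            time.sleep(delay)             # wait 1s, 2s, 4s, ... capped
            delay = min(delay * 2, max_interval)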


Or another alternative would be to leave the connection to the broker, 
but by not requeing or acking ensure that once the prefetch has been 
reached, no further messages will be delivered. Then locally, on the 
client, retry the processing for the prefetched messages until the 
database is back again.


The basic point I'm trying to make is that it seems to me there is 
little value in simply handing the messages back to the broker for 
immediate redelivery back to the client. It delays the retry certainly, 
but at unnecessary expense.


More generally I wonder whether an explicit negative acknowledgement is 
actually needed in the notify API at all. If it isn't, that may simplify 
things for 

Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-02 Thread Herndon, John Luke


On 12/20/13, 11:57 PM, Jay Pipes jaypi...@gmail.com wrote:

On 12/20/2013 04:43 PM, Julien Danjou wrote:
 On Fri, Dec 20 2013, Herndon, John Luke wrote:

 I think there is probably a tolerance for duplicates but you’re right,
 missing a notification is unacceptable. Can anyone weigh in on how big of a
 deal duplicates are for meters? Duplicates aren’t really unique to the
 batching approach, though. If a consumer dies after it’s inserted a message
 into the data store but before the message is acked, the message will be
 requeued and handled by another consumer resulting in a duplicate.

 Duplicates can be a problem for metering, as if you see twice the same
 event it's possible you will think it happened twice.

 As for event storage, it won't be a problem if you use a good storage
 driver that can have unique constraint; you'll just drop it and log the
 fact that this should not have happened, or something like that.

The above brings up a point related to the implementation of the
existing SQL driver code that will need to be re-thought with the
introduction of batch notification processing.

Currently, the SQL driver's record_events() method [1] is written in a
way that forces a new INSERT transaction for every record supplied to
the method. If the record_events() method is called with 10K events,
then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against
the server.

Suffice to say, this isn't efficient. :)

Ostensibly, from looking at the code, the reason that this approach was
taken was to allow for the collection of duplicate event IDs, and return
those duplicate event IDs to the caller.

Because of this code:

 for event_model in event_models:
     event = None
     try:
         with session.begin():
             event = self._record_event(session, event_model)
     except dbexc.DBDuplicateEntry:
         problem_events.append((api_models.Event.DUPLICATE,
                                event_model))
The session object will be commit()'d after the session.begin() context
manager exits, which will cause the aforementioned BEGIN; INSERT;
COMMIT; transaction to be executed against the server for each event
record.

If we want to actually take advantage of the performance benefits of
batching notification messages, the above code will need to be rewritten
so that a single transaction is executed against the database for the
entire batch of events.
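
As a rough sketch of that rewrite (assuming the existing _record_event()
helper and an assumed session accessor; note that a single DBDuplicateEntry
would now roll back the whole batch unless duplicates are handled
differently):

    def record_events(self, event_models):
        session = self._engine_facade.get_session()   # assumed session accessor
        # one BEGIN/COMMIT pair for the whole batch instead of one per event
        with session.begin():
            for event_model in event_models:
                self._record_event(session, event_model)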

Yeah, this makes sense. Working on this driver is definitely on the to-do
list (we also need to cache the event and trait types so several queries to
the db are not incurred for each event). In the above code, we still have
to deal with the DBDuplicateEntry error, but it gets much harder. The options I
can think of are: 1) comb through the batch of events, remove the
duplicates and try again, or 2) allow the duplicates to be inserted and deal
with them later.

-john


Best,
-jay

[1] 
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L932




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-02 Thread Gordon Sim

On 12/20/2013 09:26 PM, Herndon, John Luke wrote:


On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:


On 12/20/2013 05:27 PM, Herndon, John Luke wrote:


Other protocols may support bulk consumption. My one concern with
this approach is error handling. Currently the executors treat
each notification individually. So let’s say the broker hands
100 messages at a time. When client is done processing the
messages, the broker needs to know if message 25 had an error or
not. We would somehow need to communicate back to the broker
which messages failed. I think this may take some refactoring of
executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages?
What sort of things might fail? Is it related to the message
content itself? Or is it failures suspected to be of a temporal
nature?



There will be situations where the message can’t be parsed, and those
messages can’t just be thrown away. My current thought is that
ceilometer could provide some sort of mechanism for sending messages
that are invalid to an external data store (like a file, or a
different topic on the amqp server) where a living, breathing human
can look at them and try to parse out any meaningful information.


Right, in those cases simply requeueing probably is not the right thing 
and you really want it dead-lettered in some way. I guess the first 
question is whether that is part of the notification system’s function,
or if it is done by the application itself (e.g. by storing it or 
republishing it). If it is the latter you may not need any explicit 
negative acknowledgement.



Other errors might be “database not available”, in which case
re-queueing the message is probably the right way to go.


That does mean however that the backlog of messages starts to grow on 
the broker, so some scheme for dealing with this if the database outage 
goes on for a bit is probably important. It also means that the messages 
will keep being retried without any 'backoff' waiting for the database 
to be restored which could increase the load.







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-02 Thread Herndon, John Luke


On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:

On 12/20/2013 09:26 PM, Herndon, John Luke wrote:

 On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:

 On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

 Other protocols may support bulk consumption. My one concern with
 this approach is error handling. Currently the executors treat
 each notification individually. So let’s say the broker hands
 100 messages at a time. When the client is done processing the
 messages, the broker needs to know if message 25 had an error or
 not. We would somehow need to communicate back to the broker
 which messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
[...]
 (2) What would you want the broker to do with the failed messages?
 What sort of things might fail? Is it related to the message
 content itself? Or is it failures suspected to be of a temporal
 nature?
 
 There will be situations where the message can’t be parsed, and those
 messages can’t just be thrown away. My current thought is that
 ceilometer could provide some sort of mechanism for sending messages
 that are invalid to an external data store (like a file, or a
 different topic on the amqp server) where a living, breathing human
 can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing
and you really want it dead-lettered in some way. I guess the first
question is whether that is part of the notification system’s function,
or if it is done by the application itself (e.g. by storing it or
republishing it). If it is the latter you may not need any explicit
negative acknowledgement.

Exactly, I’m thinking this is something we’d build into ceilometer and not
oslo, since ceilometer is where the event parsing knowledge lives. From an
oslo point of view, the message would be 'acked'.


 Other errors might be “database not available”, in which case
 re-queueing the message is probably the right way to go.

That does mean however that the backlog of messages starts to grow on
the broker, so some scheme for dealing with this if the database outage
goes on for a bit is probably important. It also means that the messages
 
will keep being retried without any 'backoff' waiting for the database
to be restored which could increase the load.

This is a problem we already have :(
https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158
Since notifications cannot be lost, overflow needs to be detected and the
messages need to be saved. I’m thinking the database being down is a rare
occurrence that will be worthy of waking someone up in the middle of the
night. One possible solution: flip the collector into an emergency mode
and save notifications to disc until the issue is resolved. Once the db is
up and running, the collector inserts all of these saved messages (as one
big batch!). Thoughts?
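
One possible shape for that emergency mode, spooling failed batches to disk
as JSON lines and replaying them once the database is back (all names below
are hypothetical, not existing ceilometer code):

    import json

    SPOOL_PATH = '/var/lib/ceilometer/notification-spool.jsonl'   # hypothetical

    def spool(notifications):
        # called when the database insert fails
        with open(SPOOL_PATH, 'a') as spool_file:
            for notification in notifications:
                spool_file.write(json.dumps(notification) + '\n')

    def replay(storage):
        # called once the database is back: insert everything as one big batch
        with open(SPOOL_PATH) as spool_file:
            batch = [json.loads(line) for line in spool_file]
        storage.record_events(batch)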

I’m not sure I understand what you are saying about retrying without a
backoff. Can you explain?

-john






Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-21 Thread Jay Pipes

On 12/21/2013 04:51 AM, Boris Pavlovic wrote:

Jay,

The session object will be commit()'d after the session.begin() context
manager exits, which will cause the aforementioned BEGIN; INSERT;
COMMIT; transaction to be executed against the server for each event record.

It is just half of the problem. We should use SQL bulk inserts as well.
And we should mention that declarative_base won't make this work for us
transparently. (Even if we do it all in one transaction, there will be N
INSERTs.)


Well, the performance benefit will show up if there is a single 
transaction with multiple INSERT statements in it. The slowdown in 
performance is due to the multiple COMMITs, which each typically cause 
an fsync() (or fdatasync()), which is the slow part of the operation. 
Having a transaction containing thousands of INSERT statements with one 
COMMIT performs much better, since there is only a single call to 
fsync() for the log records.
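
For illustration, a SQLAlchemy Core sketch of the combined idea (the table
definition is a simplified, hypothetical stand-in for the real event table):
a single executemany-style INSERT inside one transaction gives one COMMIT,
and so one fsync(), for the whole batch.

    from sqlalchemy import (Column, DateTime, MetaData, String, Table,
                            create_engine)

    engine = create_engine('mysql://ceilometer:secret@localhost/ceilometer')
    metadata = MetaData()
    event = Table('event', metadata,
                  Column('message_id', String(50)),
                  Column('event_type', String(255)),
                  Column('generated', DateTime))

    def record_events_bulk(event_dicts):
        # executemany-style bulk insert: BEGIN; INSERT ... (all rows); COMMIT
        with engine.begin() as conn:
            conn.execute(event.insert(), event_dicts)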


Not quite sure what you mean about the declarative_base not working for 
this. Would you mind elaborating a bit more?


Thanks!
-jay




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Nadya Privalova
Hi John,

Your ideas look very interesting to me. As I understand it, notification
messages will be kept in the MQ for some time (while the batch basket is being
filled), right? I'm concerned about the additional load that this will put on
the MQ (Rabbit).

Thanks,
Nadya


On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com wrote:

 Hi Folks,

 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.

 I’d like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:

 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].

 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.

 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?

 Is this the right approach to take? I’m not an oslo.messaging expert, so
 if there is a proper way to implement this change, I’m all ears!

 Thanks, happy holidays!
 -john

 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification



Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke
Hi Nadya,

Yep, that’s right, the notifications stick around on the server until they
are acknowledged so there is extra overhead involved. I only have experience
with rabbitmq, so I can’t speak for other transports, but we have used this
strategy internally for other purposes, and have reached  10k
messages/second on a single consumer using batch message consumption (i.e.,
consume N messages, process them, then ack all N at once). We’ve found that
being able to acknowledge the entire batch of messages at a time leads to a
huge performance increase. This is another motivating factor for moving
towards batches. But to your point, making this configurable is the right
way to go just in case other transports don’t react as well.
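
For reference, a rough pika-based illustration of that consume-then-bulk-ack
pattern (process_batch() is hypothetical, and this is not the oslo.messaging
rabbit driver):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=100)    # let the broker push a full batch

    batch = []
    for method, properties, body in channel.consume('notifications.info',
                                                    inactivity_timeout=1.0):
        if method is not None:
            batch.append((method.delivery_tag, body))
        if batch and (method is None or len(batch) >= 100):
            process_batch([b for _, b in batch])      # hypothetical handler
            # one ack with multiple=True acknowledges every message up to and
            # including this delivery tag, which is where the big win comes from
            channel.basic_ack(delivery_tag=batch[-1][0], multiple=True)
            batch = []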

Thanks,
-john


From:  Nadya Privalova nprival...@mirantis.com
Reply-To:  OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date:  Fri, 20 Dec 2013 15:25:55 +0400
To:  OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject:  Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in
Batches

Hi John,

Your ideas look very interesting to me. As I understand it, notification
messages will be kept in the MQ for some time (while the batch basket is being
filled), right? I'm concerned about the additional load that this will put on
the MQ (Rabbit).

Thanks,
Nadya


On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com
wrote:
 Hi Folks,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.
 
 I’d like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:
 
 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].
 
 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.
 
 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?
 
 Is this the right approach to take? I’m not an oslo.messaging expert, so
 if there is a proper way to implement this change, I’m all ears!
 
 Thanks, happy holidays!
 -john
 
 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
 


Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Doug Hellmann
On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com wrote:

 Hi Folks,

 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.

 I’d like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:

 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].


IIRC, the executor is meant to differentiate between threading, eventlet,
other async implementations, or other methods for dealing with the I/O. It
might be better to implement the batching at the dispatcher level instead.
That way no matter what I/O processing is in place, the batching will occur.


 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.

 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?


Which handler do you mean?

Doug




 Is this the right approach to take? I’m not an oslo.messaging expert, so
 if there is a proper way to implement this change, I’m all ears!

 Thanks, happy holidays!
 -john

 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification



Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Julien Danjou
On Thu, Dec 19 2013, Herndon, John Luke wrote:

Hi John,

 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver. 

I think that is overall a good idea. And in my mind it could also have
bigger consequences than you would think. When we start using
notifications instead of RPC calls for sending the samples, we may be
able to leverage that too.

Anyway, my main concern here is that I am not very enthusiastic about
using the executor to do that. I wonder if there is not a way to ask the
broker to get as many messages as it has, up to a limit?

You would have 100 messages waiting in the notifications.info queue, and
you would be able to tell oslo.messaging that you want to read up to
10 messages at a time. If the underlying protocol (e.g. AMQP) can
support that too, it would be more efficient too.

-- 
Julien Danjou
/* Free Software hacker * independent consultant
   http://julien.danjou.info */




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote:

 
 
 
 On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com 
 wrote:
 Hi Folks,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.
 
 I’d like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:
 
 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].
 
 IIRC, the executor is meant to differentiate between threading, eventlet, 
 other async implementations, or other methods for dealing with the I/O. It 
 might be better to implement the batching at the dispatcher level instead. 
 That way no matter what I/O processing is in place, the batching will occur.
 

I thought about doing it in the dispatcher. One problem I see is handling 
message acks. It looks like the current executors are built around single 
messages and re-queueing single messages if problems occur. If we build up a 
batch in the dispatcher, either the executor has to wait for the whole batch to 
be committed (which wouldn’t work in the case of the blocking executor, or 
would leave a lot of green threads hanging around in the case of the eventlet 
executor), or the executor has to be modified to allow acking to be handled out 
of band. So, I was thinking it would be cleaner to write a new executor that is 
responsible for acking/requeueing the entire batch. Maybe I’m missing something?

 
 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.
 
 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?
 
 Which handler do you mean?

Ah, sorry - handler is whichever method is registered to receive the batch from 
the dispatcher. In ceilometer’s case, this would be process_notifications I 
think.

 Doug
 
  
 
 Is this the right approach to take? I’m not an oslo.messaging expert, so
 if there is a proper way to implement this change, I’m all ears!
 
 Thanks, happy holidays!
 -john
 
 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
 

-
John Herndon
HP Cloud
john.hern...@hp.com







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote:

 On Thu, Dec 19 2013, Herndon, John Luke wrote:
 
 Hi John,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver. 
 
 I think that is overall a good idea. And in my mind it could also have
 bigger consequences than you would think. When we start using
 notifications instead of RPC calls for sending the samples, we may be
 able to leverage that too.
Cool, glad to hear it!

 Anyway, my main concern here is that I am not very enthusiastic about
 using the executor to do that. I wonder if there is not a way to ask the
 broker to get as many messages as it has, up to a limit?
 
 You would have 100 messages waiting in the notifications.info queue, and
 you would be able to tell to oslo.messaging that you want to read up to
 10 messages at a time. If the underlying protocol (e.g. AMQP) can
 support that too, it would be more efficient too.

Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing 
more than a single message at a time, but we could definitely have the broker 
store up the batch before sending it along. Other protocols may support bulk 
consumption. My one concern with this approach is error handling. Currently the 
executors treat each notification individually. So let’s say the broker hands 
100 messages at a time. When the client is done processing the messages, the broker 
needs to know if message 25 had an error or not. We would somehow need to 
communicate back to the broker which messages failed. I think this may take 
some refactoring of executors/dispatchers. What do you think?

 
 -- 
 Julien Danjou
 /* Free Software hacker * independent consultant
   http://julien.danjou.info */

-
John Herndon
HP Cloud
john.hern...@hp.com







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Julien Danjou
On Fri, Dec 20 2013, Herndon, John Luke wrote:

 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing
 more than a single message at a time, but we could definitely have the
 broker store up the batch before sending it along. Other protocols may
 support bulk consumption. My one concern with this approach is error
 handling. Currently the executors treat each notification individually. So
 let’s say the broker hands 100 messages at a time. When client is done
 processing the messages, the broker needs to know if message 25 had an error
 or not. We would somehow need to communicate back to the broker which
 messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?

Yeah, it definitely needs to change the messaging API a bit to handle
such a case. But in the end it will be a good thing to support such a
case, whether it is natively supported by the broker or not.

For brokers where it's not possible, it may be simple enough to have a
get_one_notification_nb() method that would either return a
notification or None if there's none to read, and would
consequently have to be _non-blocking_.

So if the transport is smart we write:

  # Return up to max_number_of_notifications_to_read
  notifications = transport.get_notifications(
      conf.max_number_of_notifications_to_read)
  storage.record(notifications)

Otherwise we do:

  notifications = []
  for i in range(conf.max_number_of_notifications_to_read):
      notification = transport.get_one_notification_nb()
      if notification:
          notifications.append(notification)
      else:
          break
  storage.record(notifications)

So it's just about having the right primitive in oslo.messaging, we can
then build on top of that wherever that is.

-- 
Julien Danjou
/* Free Software hacker * independent consultant
   http://julien.danjou.info */




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:

 On Fri, Dec 20 2013, Herndon, John Luke wrote:
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing
 more than a single message at a time, but we could definitely have the
 broker store up the batch before sending it along. Other protocols may
 support bulk consumption. My one concern with this approach is error
 handling. Currently the executors treat each notification individually. So
 let’s say the broker hands 100 messages at a time. When client is done
 processing the messages, the broker needs to know if message 25 had an error
 or not. We would somehow need to communicate back to the broker which
 messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
 
 Yeah, it definitely needs to change the messaging API a bit to handle
 such a case. But in the end that will be a good thing to support such a
 case, it being natively supported by the broker or not.
 
 For brokers where it's not possible, it may be simple enough to have a
 get_one_notification_nb() method that would either return a
 notification or None if there's none to read, and would that
 consequently have to be _non-blocking_.
 
 So if the transport is smart we write:
 
  # Return up to max_number_of_notifications_to_read
  notifications = transport.get_notifications(
      conf.max_number_of_notifications_to_read)
  storage.record(notifications)

 Otherwise we do:

  notifications = []
  for i in range(conf.max_number_of_notifications_to_read):
      notification = transport.get_one_notification_nb()
      if notification:
          notifications.append(notification)
      else:
          break
  storage.record(notifications)
 
 So it's just about having the right primitive in oslo.messaging, we can
 then build on top of that wherever that is.
 

I think this will work. I was considering putting in a timeout so the broker 
would not send off all of the messages immediately, and implement using 
blocking calls. If the consumer consumes faster than the publishers are 
publishing, this just becomes single-notification batches. So it may be 
beneficial to wait for more messages to arrive before sending off the batch. If 
the batch is full before the timeout is reached, then the batch would be sent 
off.
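
A sketch of that timeout behaviour (get_one_notification() and
dispatch_batch() are hypothetical placeholders for whatever primitive
oslo.messaging ends up exposing):

    import time

    def consume_batches(batch_size=100, batch_timeout=5.0):
        while True:
            batch = []
            deadline = time.time() + batch_timeout
            # fill the batch until it is full or the deadline passes
            while len(batch) < batch_size:
                remaining = deadline - time.time()
                if remaining <= 0:
                    break
                notification = get_one_notification(timeout=remaining)
                if notification is not None:
                    batch.append(notification)
            if batch:
                dispatch_batch(batch)    # ack/requeue is then decided per batch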

 -- 
 Julien Danjou
 /* Free Software hacker * independent consultant
   http://julien.danjou.info */

-
John Herndon
HP Cloud
john.hern...@hp.com







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Gordon Sim

On 12/20/2013 05:27 PM, Herndon, John Luke wrote:


On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info
wrote:

Anyway, my main concern here is that I am not very enthusiastic
about using the executor to do that. I wonder if there is not a way
to ask the broker to get as many messages as it has, up to a
limit?

You would have 100 messages waiting in the notifications.info
queue, and you would be able to tell to oslo.messaging that you
want to read up to 10 messages at a time. If the underlying
protocol (e.g. AMQP) can support that too, it would be more
efficient too.


Yeah, I like this idea. As far as I can tell, AMQP doesn’t support
grabbing more than a single message at a time, but we could
definitely have the broker store up the batch before sending it
along.


AMQP (in all its versions) allows for a subscription with a 
configurable amount of 'prefetch', which means the broker can send lots 
of messages without waiting for the client to request them one at a time.


That's not quite the same as the batching I think you are looking for, 
but it does allow the broker to do its own batching. My guess is the 
rabbit driver is already using basic.consume rather than basic.get 
anyway(?), so the broker is free to batch as it sees fit.  (I haven't 
actually dug into the kombu code to verify that however, perhaps someone 
else here can confirm?)


However you still need the client to have some way of batching up the 
messages and then processing them together.
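
For what it's worth, a small kombu sketch of that combination, broker-side
prefetch plus client-side accumulation (handle_batch() is hypothetical, and
this is not the actual rabbit driver code):

    import socket

    from kombu import Connection, Consumer, Queue

    def consume(batch_size=100):
        batch = []
        with Connection('amqp://guest:guest@localhost//') as conn:
            channel = conn.channel()
            consumer = Consumer(channel, queues=[Queue('notifications.info')],
                                callbacks=[lambda body, msg: batch.append((body, msg))])
            consumer.qos(prefetch_count=batch_size)   # broker-side prefetch
            consumer.consume()                        # basic.consume, not basic.get
            while True:
                try:
                    conn.drain_events(timeout=1)
                except socket.timeout:
                    pass                              # no traffic, flush what we have
                if batch:
                    handle_batch([body for body, _ in batch])   # hypothetical
                    for _, message in batch:
                        message.ack()
                    del batch[:]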



Other protocols may support bulk consumption. My one concern
with this approach is error handling. Currently the executors treat
each notification individually. So let’s say the broker hands 100
messages at a time. When client is done processing the messages, the
broker needs to know if message 25 had an error or not. We would
somehow need to communicate back to the broker which messages failed.
I think this may take some refactoring of executors/dispatchers. What
do you think?


I have some related questions that I haven't yet satisfactorily 
answered. The extra context here may be useful in doing so.


(1) What are the expectations around message delivery guarantees for 
insertion into a store? I.e. if there is a failure, is it ok to get 
duplicate entries for notifications? (I'm assuming losing notifications 
is not acceptable).


(2) What would you want the broker to do with the failed messages? What 
sort of things might fail? Is it related to the message content itself? 
Or is it failures suspected to be of a temporal nature?


(3) How important is ordering ? If a failure causes some notifications 
to be inserted out of order is that a problem at all?




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Gordon Sim

On 12/20/2013 07:13 PM, Gordon Sim wrote:

AMQP (in all its versions) allows for a subscription with a
configurable amount of 'prefetch', which means the broker can send lots
of messages without waiting for the client to request them one at a time.


Just as an aside, the impl_qpid.py driver currently explicitly restricts 
the broker to sending one at a time. Probably not what we want for the 
notifications at any rate (more justifiable perhaps for the 'invoke on 
one of a group of servers' case).




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Dan Dyer

On 12/20/2013 11:18 AM, Herndon, John Luke wrote:

On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:


On Fri, Dec 20 2013, Herndon, John Luke wrote:


Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing
more than a single message at a time, but we could definitely have the
broker store up the batch before sending it along. Other protocols may
support bulk consumption. My one concern with this approach is error
handling. Currently the executors treat each notification individually. So
let's say the broker hands 100 messages at a time. When client is done
processing the messages, the broker needs to know if message 25 had an error
or not. We would somehow need to communicate back to the broker which
messages failed. I think this may take some refactoring of
executors/dispatchers. What do you think?

Yeah, it definitely needs to change the messaging API a bit to handle
such a case. But in the end that will be a good thing to support such a
case, it being natively supported by the broker or not.

For brokers where it's not possible, it may be simple enough to have a
get_one_notification_nb() method that would either return a
notification or None if there's none to read, and would that
consequently have to be _non-blocking_.

So if the transport is smart we write:

   # Return up to max_number_of_notifications_to_read
   notifications = transport.get_notifications(
       conf.max_number_of_notifications_to_read)
   storage.record(notifications)

Otherwise we do:

   notifications = []
   for i in range(conf.max_number_of_notifications_to_read):
       notification = transport.get_one_notification_nb()
       if notification:
           notifications.append(notification)
       else:
           break
   storage.record(notifications)

So it's just about having the right primitive in oslo.messaging, we can
then build on top of that wherever that is.


I think this will work. I was considering putting in a timeout so the broker 
would not send off all of the messages immediately, and implement using 
blocking calls. If the consumer consumes faster than the publishers are 
publishing, this just becomes single-notification batches. So it may be 
beneficial to wait for more messages to arrive before sending off the batch. If 
the batch is full before the timeout is reached, then the batch would be sent 
off.


--
Julien Danjou
/* Free Software hacker * independent consultant
   http://julien.danjou.info */

-
John Herndon
HP Cloud
john.hern...@hp.com






A couple of things that I think need to be emphasized here:
1. the mechanism needs to be configurable, so if you are more worried 
about reliability than performance you would be able to turn off bulk 
loading
2. the caching size should also be configurable, so that we can limit 
your exposure to lost messages
3. while you can have the message queue hold the messages until you 
acknowledge them, it seems like this adds a lot of complexity to the 
interaction. you will need to be able to propagate this information all 
the way back from the storage driver.
4. any integration that is dependent on a specific configuration on the 
rabbit server is brittle, since we have seen a lot of variation between 
services on this. I would prefer to control the behavior on the 
collection side.


So in general, I would prefer a mechanism that pulls the data in a 
default manner, caches on the collection side based on configuration 
that allows you to determine your own risk level and then manages 
retries in the storage driver or at the cache controller level.


Dan Dyer
HP cloud
dan.d...@hp.com



Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Doug Hellmann
On Fri, Dec 20, 2013 at 12:15 PM, Herndon, John Luke john.hern...@hp.com wrote:


 On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com
 wrote:




 On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke 
 john.hern...@hp.com wrote:

 Hi Folks,

 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.

 I’d like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:

 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].


 IIRC, the executor is meant to differentiate between threading, eventlet,
 other async implementations, or other methods for dealing with the I/O. It
 might be better to implement the batching at the dispatcher level instead.
 That way no matter what I/O processing is in place, the batching will occur.


 I thought about doing it in the dispatcher. One problem I see is handling
 message acks. It looks like the current executors are built around single
 messages and re-queueing single messages if problems occur. If we build up a
 batch in the dispatcher, either the executor has to wait for the whole
 batch to be committed (which wouldn’t work in the case of the blocking
 executor, or would leave a lot of green threads hanging around in the case
 of the eventlet executor), or the executor has to be modified to allow
 acking to be handled out of band. So, I was thinking it would be cleaner to
 write a new executor that is responsible for acking/requeueing the entire
 batch. Maybe I’m missing something?


No, you're right. Were you going to use eventlet again for the new
executor?





 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.

 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?


 Which handler do you mean?


 Ah, sorry - handler is whichever method is registered to receive the batch
 from the dispatcher. In ceilometer’s case, this would be
 process_notifications I think.


 Doug




 Is this the right approach to take? I’m not an oslo.messaging expert, so
 if there is a proper way to implement this change, I’m all ears!

 Thanks, happy holidays!
 -john

 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:

 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2:
 https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification



 -
 John Herndon
 HP Cloud
 john.hern...@hp.com






Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:

 On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
 
 On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info
 wrote:
 Anyway, my main concern here is that I am not very enthusiastic
 about using the executor to do that. I wonder if there is not a way
 to ask the broker to get as many messages as it has, up to a
 limit?
 
 You would have 100 messages waiting in the notifications.info
 queue, and you would be able to tell to oslo.messaging that you
 want to read up to 10 messages at a time. If the underlying
 protocol (e.g. AMQP) can support that too, it would be more
 efficient too.
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support
 grabbing more than a single message at a time, but we could
 definitely have the broker store up the batch before sending it
 along.
 
 AMQP (in all its versions) allows for a subscription with a configurable 
 amount of 'prefetch', which means the broker can send lots of messages 
 without waiting for the client to request them one at a time.
 
 That's not quite the same as the batching I think you are looking for, but it 
 does allow the broker to do its own batching. My guess is the rabbit driver 
 is already using basic.consume rather than basic.get anyway(?), so the broker 
 is free to batch as it sees fit.  (I haven't actually dug into the kombu code 
 to verify that however, perhaps someone else here can confirm?)
 
Yeah, that should help out the performance a bit, but we will still need to 
work out the batching logic. I think basic.consume is likely the best way to 
go; I think it will be straightforward to implement the timeout mechanism I’m 
looking for in this case. Thanks for the tip :).

 However you still need the client to have some way of batching up the 
 messages and then processing them together.
 
 Other protocols may support bulk consumption. My one concern
 with this approach is error handling. Currently the executors treat
 each notification individually. So let’s say the broker hands 100
 messages at a time. When client is done processing the messages, the
 broker needs to know if message 25 had an error or not. We would
 somehow need to communicate back to the broker which messages failed.
 I think this may take some refactoring of executors/dispatchers. What
 do you think?
 
 I have some related questions that I haven't yet satisfactorily answered.
 The extra context here may be useful in doing so.
 
 (1) What are the expectations around message delivery guarantees for 
 insertion into a store? I.e. if there is a failure, is it ok to get duplicate 
 entries for notifications? (I'm assuming losing notifications is not 
 acceptable).
I think there is probably a tolerance for duplicates but you’re right, missing 
a notification is unacceptable. Can anyone weigh in on how big of a deal 
duplicates are for meters? Duplicates aren’t really unique to the batching 
approach, though. If a consumer dies after it’s inserted a message into the 
data store but before the message is acked, the message will be requeued and 
handled by another consumer resulting in a duplicate. 

 (2) What would you want the broker to do with the failed messages? What sort 
 of things might fail? Is it related to the message content itself? Or is it 
 failures suspected to be of a temporal nature?
There will be situations where the message can’t be parsed, and those messages 
can’t just be thrown away. My current thought is that ceilometer could provide 
some sort of mechanism for sending messages that are invalid to an external 
data store (like a file, or a different topic on the amqp server) where a 
living, breathing human can look at them and try to parse out any meaningful 
information. Other errors might be “database not available”, in which case 
re-queueing the message is probably the right way to go. If the consumer process 
crashes, all of the unacked messages need to be requeued and handled by a 
different consumer. Any other error cases?

 (3) How important is ordering ? If a failure causes some notifications to be 
 inserted out of order is that a problem at all?
From an event point of view, I don’t think this is a problem since the events 
have a generated timestamp.


-
John Herndon
HP Cloud
john.hern...@hp.com







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Julien Danjou
On Fri, Dec 20 2013, Herndon, John Luke wrote:

 I think there is probably a tolerance for duplicates but you’re right,
 missing a notification is unacceptable. Can anyone weigh in on how big of a
 deal duplicates are for meters? Duplicates aren’t really unique to the
 batching approach, though. If a consumer dies after it’s inserted a message
 into the data store but before the message is acked, the message will be
 requeued and handled by another consumer resulting in a duplicate.

Duplicates can be a problem for metering, as if you see the same event
twice it's possible you will think it happened twice.

As for event storage, it won't be a problem if you use a good storage
driver that can have a unique constraint; you'll just drop it and log the
fact that this should not have happened, or something like that.
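
A small sketch of that unique-constraint approach (SQLAlchemy declarative;
table and column names are hypothetical, not the real ceilometer schema):

    import logging

    from sqlalchemy import Column, Integer, String, UniqueConstraint
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()
    LOG = logging.getLogger(__name__)

    class Event(Base):
        __tablename__ = 'event'
        id = Column(Integer, primary_key=True)
        message_id = Column(String(50), nullable=False)
        __table_args__ = (UniqueConstraint('message_id',
                                           name='uniq_event_message_id'),)

    def record(session, event):
        try:
            with session.begin():
                session.add(event)
        except IntegrityError:
            # duplicate delivery: drop it and log, rather than store it twice
            LOG.warning('duplicate event %s dropped', event.message_id)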

 There will be situations where the message can’t be parsed, and those
 messages can’t just be thrown away. My current thought is that ceilometer
 could provide some sort of mechanism for sending messages that are invalid
 to an external data store (like a file, or a different topic on the amqp
 server) where a living, breathing human can look at them and try to parse
 out any meaningful information. Other errors might be “database not
 available”, in which case re-queueing the message is probably the right way to
 go. If the consumer process crashes, all of the unacked messages need to be
 requeued and handled by a different consumer. Any other error cases?

Sounds good to me.

-- 
Julien Danjou
# Free Software hacker # independent consultant
# http://julien.danjou.info




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 1:12 PM, Dan Dyer dan.dye...@gmail.com wrote:

 On 12/20/2013 11:18 AM, Herndon, John Luke wrote:
 On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:
 
 On Fri, Dec 20 2013, Herndon, John Luke wrote:
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing
 more than a single message at a time, but we could definitely have the
 broker store up the batch before sending it along. Other protocols may
 support bulk consumption. My one concern with this approach is error
 handling. Currently the executors treat each notification individually. So
 let’s say the broker hands 100 messages at a time. When the client is done
 processing the messages, the broker needs to know if message 25 had an error
 or not. We would somehow need to communicate back to the broker which
 messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
 Yeah, it definitely needs to change the messaging API a bit to handle
 such a case. But in the end that will be a good thing to support such a
 case, it being natively supported by the broker or not.
 
 For brokers where it's not possible, it may be simple enough to have a
 get_one_notification_nb() method that would either return a
 notification or None if there's none to read, and that would
 consequently have to be _non-blocking_.
 
 So if the transport is smart we write:
 
  # Return up to max_number_of_notifications_to_read
  notifications = transport.get_notifications(
      conf.max_number_of_notifications_to_read)
  storage.record(notifications)
 
 Otherwise we do:
 
  notifications = []
  for i in range(conf.max_number_of_notifications_to_read):
      notification = transport.get_one_notification_nb()
      if notification:
          notifications.append(notification)
      else:
          break
  storage.record(notifications)
 
 So it's just about having the right primitive in oslo.messaging, we can
 then build on top of that wherever that is.
 
 I think this will work. I was considering putting in a timeout so the broker 
 would not send off all of the messages immediately, and implement using 
 blocking calls. If the consumer consumes faster than the publishers are 
 publishing, this just becomes single-notification batches. So it may be 
 beneficial to wait for more messages to arrive before sending off the batch. 
 If the batch is full before the timeout is reached, then the batch would be 
 sent off.
 
 A couple of things that I think need to be emphasized here:
 1. the mechanism needs to be configurable, so if you are more worried about 
 reliability than performance you would be able to turn off bulk loading
It will definitely be configurable, but I don’t think batching is going to be any 
less reliable than individual inserts. Can you expand on what the concern is?
 2. the caching size should also be configurable, so that we can limit your 
 exposure to lost messages
Agreed.
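
For example, something along these lines (the option names and defaults are made 
up, nothing is registered anywhere yet):

    from oslo.config import cfg

    batch_opts = [
        cfg.IntOpt('batch_size', default=100,
                   help='Maximum number of notifications to dispatch at once. '
                        'Set to 1 to effectively disable batching.'),
        cfg.FloatOpt('batch_timeout', default=5.0,
                     help='Dispatch a partial batch after this many seconds.'),
    ]
    cfg.CONF.register_opts(batch_opts, group='notification')
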
 3. while you can have the message queue hold the messages until you 
 acknowledge them, it seems like this adds a lot of complexity to the 
 interaction. you will need to be able to propagate this information all the 
 way back from the storage driver.
This is actually a pretty standard use case for AMQP; we have done it several 
times on in-house projects. The basic.ack call lets you acknowledge a whole 
batch of messages at once. Yes, we do have to figure out how to propagate the 
error cases back up to the broker, but I don’t think it will be so complicated 
that it’s not worth doing.
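
For illustration, with a bare AMQP client such as pika (this is only to show the 
protocol feature; the real integration would go through the oslo.messaging driver, 
and the queue name and handler below are made up):

    import pika

    def process(bodies):
        # Placeholder for handing the batch to the dispatcher/storage driver.
        pass

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    batch = []
    for method, properties, body in channel.consume('notifications.info'):
        batch.append((method.delivery_tag, body))
        if len(batch) >= 100:
            process([b for _, b in batch])
            # A single ack with multiple=True acknowledges every message
            # up to and including this delivery tag.
            channel.basic_ack(delivery_tag=batch[-1][0], multiple=True)
            batch = []
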
 4. any integration that is dependent on a specific configuration on the 
 rabbit server is brittle, since we have seen a lot of variation between 
 services on this. I would prefer to control the behavior on the collection 
 side
Hm, I don’t understand…?
 So in general, I would prefer a mechanism that pulls the data in a default 
 manner, caches on the collection side based on configuration that allows you 
 to determine your own risk level and then manages retries in the storage 
 driver or at the cache controller level.
If you’re caching on the collector and the collector dies, then you’ve lost the 
whole batch of messages.  Then you have to invent some way of persisting the 
messages to disk until they have been committed to the db and removing them 
afterwards. We originally talked about implementing a batching layer in the 
storage driver, but dragondm pointed out that the message queue is already 
hanging on to the messages and ensuring delivery, so it’s better to not 
reinvent that piece of the pipeline. This is a huge motivating factor for 
pursuing batching in oslo in my opinion.

 

Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Julien Danjou
On Fri, Dec 20 2013, Herndon, John Luke wrote:

 I think this will work. I was considering putting in a timeout so the broker
 would not send off all of the messages immediately, and implement using
 blocking calls. If the consumer consumes faster than the publishers are
 publishing, this just becomes single-notification batches. So it may be
 beneficial to wait for more messages to arrive before sending off the batch.
 If the batch is full before the timeout is reached, then the batch would be
 sent off.

I don't think you want to wait for other messages if you only picked one,
even with a timeout. It's better to record this one right away; while you
do that, messages will potentially queue up on the broker, so on your next
call you'll pick up more than one anyway.

Otherwise, yeah that should work fine.

-- 
Julien Danjou
# Free Software hacker # independent consultant
# http://julien.danjou.info




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Jay Pipes

On 12/20/2013 04:43 PM, Julien Danjou wrote:

On Fri, Dec 20 2013, Herndon, John Luke wrote:


I think there is probably a tolerance for duplicates but you’re right,
missing a notification is unacceptable. Can anyone weigh in on how big of a
deal duplicates are for meters? Duplicates aren’t really unique to the
batching approach, though. If a consumer dies after it’s inserted a message
into the data store but before the message is acked, the message will be
requeued and handled by another consumer resulting in a duplicate.


Duplicates can be a problem for metering: if you see the same event twice,
it's possible you will think it happened twice.

As for event storage, it won't be a problem if you use a good storage
driver that can enforce a unique constraint; you'll just drop the duplicate
and log the fact that this should not have happened, or something like that.


The above brings up a point related to the implementation of the 
existing SQL driver code that will need to be re-thought with the 
introduction of batch notification processing.


Currently, the SQL driver's record_events() method [1] is written in a 
way that forces a new INSERT transaction for every record supplied to 
the method. If the record_events() method is called with 10K events, 
then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against 
the server.


Suffice to say, this isn't efficient. :)

Ostensibly, from looking at the code, the reason that this approach was 
taken was to allow for the collection of duplicate event IDs, and return 
those duplicate event IDs to the caller.


Because of this code:

for event_model in event_models:
    event = None
    try:
        with session.begin():
            event = self._record_event(session, event_model)
    except dbexc.DBDuplicateEntry:
        problem_events.append((api_models.Event.DUPLICATE,
                               event_model))

The session object will be commit()'d after the session.begin() context 
manager exits, which will cause the aforementioned BEGIN; INSERT; 
COMMIT; transaction to be executed against the server for each event record.


If we want to actually take advantage of the performance benefits of 
batching notification messages, the above code will need to be rewritten 
so that a single transaction is executed against the database for the 
entire batch of events.
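
For example, a rough sketch of a batched version (self._get_session() stands in 
for the driver's real session setup, and the fallback path is just one option; 
this is not the actual driver code):

    def record_events(self, event_models):
        problem_events = []
        session = self._get_session()    # placeholder for the driver's session setup
        try:
            with session.begin():
                # One BEGIN/COMMIT for the whole batch instead of one per event.
                for event_model in event_models:
                    self._record_event(session, event_model)
        except dbexc.DBDuplicateEntry:
            # The whole batch rolled back because at least one event was a
            # duplicate; fall back to the existing per-event path so the
            # duplicates can be identified and reported as before.
            for event_model in event_models:
                try:
                    with session.begin():
                        self._record_event(session, event_model)
                except dbexc.DBDuplicateEntry:
                    problem_events.append((api_models.Event.DUPLICATE,
                                           event_model))
        return problem_events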


Best,
-jay

[1] 
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L932





[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-19 Thread Herndon, John Luke
Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance
testing event collection in the ceilometer storage drivers[0]. Based on
some results of this testing, we would like to support batch consumption
of notifications, as it will greatly improve insertion performance. Batch
consumption in this case means waiting for a certain number of
notifications to arrive before sending to the storage
driver.

I’d like to get feedback from the community about this feature, and how we
are planning to implement it. Here is what I’m currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature
that other projects will find useful. After reviewing the changes that
sileht has been working on in oslo.messaging, I think the right way to
start off is to create a new executor that builds up a batch of
notifications, and sends the batch to the dispatcher. We’d also add a
timeout, so if a certain amount of time passes and the batch isn’t filled
up, the notifications will be dispatched anyway. I’ve started a
blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch
executor instead of the eventlet executor (this should probably be
configurable)[2]. We can then send the entire batch of notifications to
the storage driver to be processed as events, while maintaining the
current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if
any of the notifications should be requeued. I think the right way to
solve this is to return a list of notifications to requeue from the
handler. Any better ideas?
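
To illustrate 1) and 3) together, here is a very rough sketch of the executor
loop I have in mind (listener.poll(), dispatcher.dispatch() and the per-message
acknowledge()/requeue() calls are stand-ins for the real oslo.messaging plumbing):

    import time

    def run_batch_executor(listener, dispatcher, batch_size=100, timeout=5.0):
        while True:
            batch = []
            deadline = time.time() + timeout
            # Build up a batch until it is full or the timeout expires.
            while len(batch) < batch_size:
                remaining = deadline - time.time()
                if remaining <= 0:
                    break
                msg = listener.poll(timeout=remaining)
                if msg is not None:
                    batch.append(msg)
            if not batch:
                continue
            # The dispatcher returns the notifications that need redelivery (3).
            to_requeue = dispatcher.dispatch(batch)
            for msg in batch:
                if msg in to_requeue:
                    msg.requeue()
                else:
                    msg.acknowledge()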

Is this the right approach to take? I’m not an oslo.messaging expert, so
if there is a proper way to implement this change, I’m all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: 
https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification

