Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 01/02/2014 10:46 PM, Herndon, John Luke wrote:
On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or is it failures suspected to be of a temporal nature?

There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending invalid messages to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing and you really want it dead-lettered in some way. I guess the first question is whether that is part of the notification system's function, or if it is done by the application itself (e.g. by storing it or republishing it). If it is the latter, you may not need any explicit negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and not oslo, since ceilometer is where the event parsing knowledge lives. From an oslo point of view, the message would be 'acked'.
Other errors might be "database not available", in which case requeuing the message is probably the right way to go. That does mean, however, that the backlog of messages starts to grow on the broker, so some scheme for dealing with this if the database outage goes on for a while is probably important. It also means that the messages will keep being retried, without any 'backoff' waiting for the database to be restored, which could increase the load.

This is a problem we already have :(

Agreed, it is a property of reliable (i.e. acknowledged) transfer from the broker, rather than of batching. And of course, some degree of buffering here is exactly what message queues are supposed to provide. The point is simply to provide some way of configuring things so that this can be bounded, or prevented from taking down the entire broker. (And perhaps some way of alerting the unfortunate someone!)

https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158

Since notifications cannot be lost, overflow needs to be detected and the messages need to be saved. I'm thinking the database being down is a rare occurrence that will be worthy of waking someone up in the middle of the night. One possible solution: flip the collector into an emergency mode and save notifications to disk until the issue is resolved. Once the db is up and running, the collector inserts all of these saved messages (as one big batch!). Thoughts?

I'm not sure I understand what you are saying about retrying without a backoff. Can you explain?

I mean that if the messages are explicitly requeued and the original subscription is still active, they will be immediately redelivered, and will thus keep cycling from broker to client, back to broker, back to client, and so on until the database is available again. Pulling messages off continually like this without actually being able to dequeue them may reduce the broker's effectiveness at e.g. paging out, and in any event involves some unnecessary load on top of the expanding queue.

It might be better, just as an example, to abort the connection to the broker (implicitly requeueing all unacked messages), and only reconnect when the database becomes available (and that can be tried after 1 second, then 2, then 4, etc., up to some maximum retry interval). Another alternative would be to keep the connection to the broker open but, by neither requeuing nor acking, ensure that once the prefetch limit has been reached no further messages will be delivered; then locally, on the client, retry the processing of the prefetched messages until the database is back again.

The basic point I'm trying to make is that it seems to me there is little value in simply handing the messages back to the broker for immediate redelivery back to the client. It delays the retry, certainly, but at unnecessary expense.

More generally, I wonder whether an explicit negative acknowledgement is actually needed in the notify API at all. If it isn't, that may simplify things for
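The reconnect-on-recovery idea above (try after 1 second, then 2, then 4, up to some maximum) can be sketched in a few lines. This is an illustration only, not oslo.messaging code; `connect_to_broker` and `database_is_available` are hypothetical callables supplied by the caller:

```python
import time

def reconnect_with_backoff(connect_to_broker, database_is_available,
                           base=1.0, maximum=60.0, sleep=time.sleep):
    """Wait for the database to come back, sleeping 1s, 2s, 4s... between
    checks (capped at `maximum`), then re-establish the broker connection.

    Aborting the old connection beforehand implicitly requeues all
    unacknowledged messages on the broker, so nothing is lost while waiting.
    """
    delay = base
    while not database_is_available():
        sleep(delay)
        delay = min(delay * 2, maximum)  # exponential backoff with a cap
    return connect_to_broker()
```

Injecting `sleep` makes the backoff schedule testable without actually waiting.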
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/13, 11:57 PM, Jay Pipes jaypi...@gmail.com wrote:
On 12/20/2013 04:43 PM, Julien Danjou wrote:
On Fri, Dec 20 2013, Herndon, John Luke wrote:

I think there is probably a tolerance for duplicates but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate.

Duplicates can be a problem for metering: if you see the same event twice, it's possible you will think it happened twice. As for event storage, it won't be a problem if you use a good storage driver that can have a unique constraint; you'll just drop it and log the fact that this should not have happened, or something like that.

The above brings up a point related to the implementation of the existing SQL driver code that will need to be re-thought with the introduction of batch notification processing. Currently, the SQL driver's record_events() method [1] is written in a way that forces a new INSERT transaction for every record supplied to the method. If the record_events() method is called with 10K events, then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against the server. Suffice to say, this isn't efficient. :)

Ostensibly, from looking at the code, the reason that this approach was taken was to allow for the collection of duplicate event IDs, and return those duplicate event IDs to the caller.
Because of this code:

    for event_model in event_models:
        event = None
        try:
            with session.begin():
                event = self._record_event(session, event_model)
        except dbexc.DBDuplicateEntry:
            problem_events.append((api_models.Event.DUPLICATE, event_model))

The session object will be commit()'d after the session.begin() context manager exits, which will cause the aforementioned BEGIN; INSERT; COMMIT; transaction to be executed against the server for each event record. If we want to actually take advantage of the performance benefits of batching notification messages, the above code will need to be rewritten so that a single transaction is executed against the database for the entire batch of events.

Yeah, this makes sense. Working on this driver is definitely on the to-do list (we also need to cache the event and trait types so several queries to the db are not incurred for each event). In the above code, we still have to deal with the DBDuplicateEntry error, but it gets much harder. The options I can think of are: 1) comb through the batch of events, remove the duplicates and try again, or 2) allow the duplicates to be inserted and deal with them later.

-john

Best,
-jay

[1] https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L932

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
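Jay's single-transaction point, combined with option (2) for duplicates above, might look roughly like the following sketch. It uses sqlite for self-containment rather than the real SQLAlchemy driver, and the `event` schema is invented for the example; `INSERT OR IGNORE` stands in for whatever duplicate-tolerant insert the actual backend would use (e.g. ON CONFLICT DO NOTHING on PostgreSQL):

```python
import sqlite3

def record_events_batch(conn, events):
    """Insert a whole batch of events inside a single transaction (one
    COMMIT, hence one fsync) instead of one transaction per event.

    Duplicates are swallowed by the unique constraint via INSERT OR IGNORE
    and reported back to the caller, so the batch never has to be combed
    through and retried.
    """
    duplicates = []
    with conn:  # one BEGIN ... COMMIT around the entire batch
        for message_id, generated in events:
            cur = conn.execute(
                "INSERT OR IGNORE INTO event (message_id, generated) "
                "VALUES (?, ?)", (message_id, generated))
            if cur.rowcount == 0:  # constraint dropped the row: a duplicate
                duplicates.append(message_id)
    return duplicates
```

The caller can then log or dead-letter the returned duplicate IDs, mirroring what record_events() reports today.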
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or is it failures suspected to be of a temporal nature?

There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending invalid messages to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing and you really want it dead-lettered in some way. I guess the first question is whether that is part of the notification system's function, or if it is done by the application itself (e.g. by storing it or republishing it). If it is the latter, you may not need any explicit negative acknowledgement.

Other errors might be "database not available", in which case requeuing the message is probably the right way to go. That does mean, however, that the backlog of messages starts to grow on the broker, so some scheme for dealing with this if the database outage goes on for a while is probably important.
It also means that the messages will keep being retried, without any 'backoff' waiting for the database to be restored, which could increase the load.
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or is it failures suspected to be of a temporal nature?

There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending invalid messages to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing and you really want it dead-lettered in some way. I guess the first question is whether that is part of the notification system's function, or if it is done by the application itself (e.g. by storing it or republishing it). If it is the latter, you may not need any explicit negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and not oslo, since ceilometer is where the event parsing knowledge lives. From an oslo point of view, the message would be 'acked'.

Other errors might be "database not available", in which case requeuing the message is probably the right way to go.
That does mean, however, that the backlog of messages starts to grow on the broker, so some scheme for dealing with this if the database outage goes on for a while is probably important. It also means that the messages will keep being retried, without any 'backoff' waiting for the database to be restored, which could increase the load.

This is a problem we already have :(

https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158

Since notifications cannot be lost, overflow needs to be detected and the messages need to be saved. I'm thinking the database being down is a rare occurrence that will be worthy of waking someone up in the middle of the night. One possible solution: flip the collector into an emergency mode and save notifications to disk until the issue is resolved. Once the db is up and running, the collector inserts all of these saved messages (as one big batch!). Thoughts?

I'm not sure I understand what you are saying about retrying without a backoff. Can you explain?

-john
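The "emergency mode" floated above (spool notifications to disk while the database is down, replay them as one big batch afterwards) could be prototyped along these lines. Everything here is hypothetical, including the JSON-lines file format and the class name; ceilometer implements none of it:

```python
import json
import os
import tempfile

class EmergencySpool:
    """Sketch of a collector emergency mode: while the database is down,
    append notifications to a local file; once it is back, hand all of
    them to the storage layer as a single batch."""

    def __init__(self, path):
        self.path = path

    def spool(self, notification):
        # Append-only writes, so already-spooled messages survive a crash.
        with open(self.path, "a") as f:
            f.write(json.dumps(notification) + "\n")

    def replay(self, record_batch):
        """Feed every spooled notification to record_batch, then remove
        the spool file. Returns the number of replayed notifications."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            batch = [json.loads(line) for line in f]
        record_batch(batch)  # one big insert, as suggested in the thread
        os.remove(self.path)
        return len(batch)
```

A real implementation would also need fsync discipline and dedup on replay (messages acked to the broker but not yet committed could otherwise be lost or doubled), which this sketch ignores.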
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/21/2013 04:51 AM, Boris Pavlovic wrote:

Jay,

The session object will be commit()'d after the session.begin() context manager exits, which will cause the aforementioned BEGIN; INSERT; COMMIT; transaction to be executed against the server for each event record.

That is just half of the problem. We should use SQL bulk inserts as well. And we should mention that declarative_base won't make this work transparently for us. (Even if we do it all in one transaction, there will be N INSERTs.)

Well, the performance benefit will show up if there is a single transaction with multiple INSERT statements in it. The slowdown in performance is due to the multiple COMMITs, which each typically cause an fsync() (or fdatasync()), which is the slow part of the operation. Having a transaction containing thousands of INSERT statements with one COMMIT is much better performing, since there is only a single call to fsync() for the log records.

Not quite sure what you mean about declarative_base not working for this. Would you mind elaborating a bit more?

Thanks!
-jay
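To make the two halves of the problem concrete: one COMMIT per row costs an fsync per row, and even inside a single transaction an ORM layer built on declarative_base still emits one INSERT statement per object, which is Boris's point. An executemany-style bulk insert under a single COMMIT avoids both. A minimal sqlite illustration (the schema is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (message_id TEXT PRIMARY KEY)")

rows = [("id-%d" % i,) for i in range(1000)]

# One transaction, one executemany: a single COMMIT (and thus a single
# fsync on a durable backend) covers all 1000 rows, and the statement is
# prepared once instead of 1000 times.
with conn:
    conn.executemany("INSERT INTO event (message_id) VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
```

On a real server-backed database the difference between this and 1000 separate BEGIN/INSERT/COMMIT round trips is typically orders of magnitude.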
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Hi John,

Your ideas look very interesting to me. As I understand it, notification messages will be kept in the MQ for some time (while the batch is being filled), right? I'm concerned about the additional load that this will put on the MQ (Rabbit).

Thanks,
Nadya

On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued.
I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
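The proposed executor behaviour, accumulate until the batch is full or a timeout since the first message expires and then dispatch the whole list, can be sketched independently of oslo.messaging. All names here are illustrative, not the real executor API:

```python
import time

class BatchAccumulator:
    """Hold notifications until either `batch_size` have arrived or
    `timeout` seconds have passed since the first pending one, then hand
    the whole list to `dispatch` at once.

    The clock is injectable so the timeout path can be tested without
    sleeping; a real executor would also need a timer to flush a stale
    partial batch when no new message arrives to trigger the check.
    """

    def __init__(self, dispatch, batch_size=100, timeout=5.0,
                 clock=time.monotonic):
        self.dispatch = dispatch
        self.batch_size = batch_size
        self.timeout = timeout
        self.clock = clock
        self.pending = []
        self.first_at = None

    def add(self, notification):
        if not self.pending:
            self.first_at = self.clock()  # timeout runs from first message
        self.pending.append(notification)
        full = len(self.pending) >= self.batch_size
        stale = self.clock() - self.first_at >= self.timeout
        if full or stale:
            batch, self.pending = self.pending, []
            self.dispatch(batch)
```

Requeue handling would then operate on the dispatched list as a unit, matching point 3) above.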
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Hi Nadya,

Yep, that's right: the notifications stick around on the server until they are acknowledged, so there is extra overhead involved. I only have experience with rabbitmq, so I can't speak for other transports, but we have used this strategy internally for other purposes, and have reached 10k messages/second on a single consumer using batch message consumption (i.e., consume N messages, process them, then ack all N at once). We've found that being able to acknowledge the entire batch of messages at a time leads to a huge performance increase. This is another motivating factor for moving towards batches. But to your point, making this configurable is the right way to go, just in case other transports don't react as well.

Thanks,
-john

From: Nadya Privalova nprival...@mirantis.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Fri, 20 Dec 2013 15:25:55 +0400
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

Hi John,

Your ideas look very interesting to me. As I understand it, notification messages will be kept in the MQ for some time (while the batch is being filled), right? I'm concerned about the additional load that this will put on the MQ (Rabbit).

Thanks,
Nadya

On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it.
Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
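The "ack all N at once" strategy above corresponds to AMQP's cumulative acknowledgement: acknowledging one delivery tag with the multiple flag set acknowledges every earlier unacked delivery on the channel (in pika, `channel.basic_ack(delivery_tag, multiple=True)`). Here is a broker-free toy model of just the bookkeeping, to show that a single ack frame can cover a whole batch:

```python
class BatchConsumer:
    """Toy model of cumulative acks: buffer incoming messages, process
    them as one batch, then record a single acknowledgement for the
    highest delivery tag with multiple=True semantics, one ack frame
    instead of N. Not real broker client code."""

    def __init__(self, process_batch, batch_size):
        self.process_batch = process_batch
        self.batch_size = batch_size
        self.buffer = []   # (delivery_tag, body) pairs awaiting a full batch
        self.acked = []    # (tag, multiple) "frames" this model has emitted

    def on_message(self, delivery_tag, body):
        self.buffer.append((delivery_tag, body))
        if len(self.buffer) >= self.batch_size:
            self.process_batch([body for _, body in self.buffer])
            last_tag = self.buffer[-1][0]
            # One cumulative ack covers every message in the batch.
            self.acked.append((last_tag, True))
            self.buffer = []
```

With a real channel, a failure while processing would instead nack or requeue the whole unacked window, which is what makes batch-level error handling the hard part of the proposal.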
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

IIRC, the executor is meant to differentiate between threading, eventlet, other async implementations, or other methods for dealing with the I/O. It might be better to implement the batching at the dispatcher level instead. That way, no matter what I/O processing is in place, the batching will occur.

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued.
I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Which handler do you mean?

Doug

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Thu, Dec 19 2013, Herndon, John Luke wrote:

Hi John,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver.

I think that is overall a good idea. And to my mind it could also have bigger consequences than you would think: when we start using notifications instead of RPC calls for sending the samples, we may be able to leverage this too.

Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that too, it would be more efficient.

--
Julien Danjou
/* Free Software hacker * independent consultant http://julien.danjou.info */
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote:
On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

IIRC, the executor is meant to differentiate between threading, eventlet, other async implementations, or other methods for dealing with the I/O. It might be better to implement the batching at the dispatcher level instead. That way, no matter what I/O processing is in place, the batching will occur.

I thought about doing it in the dispatcher. One problem I see is handling message acks. It looks like the current executors are built around single messages, and around re-queueing single messages if problems occur.
If we build up a batch in the dispatcher, either the executor has to wait for the whole batch to be committed (which wouldn't work in the case of the blocking executor, or would leave a lot of green threads hanging around in the case of the eventlet executor), or the executor has to be modified to allow acking to be handled out of band. So, I was thinking it would be cleaner to write a new executor that is responsible for acking/requeueing the entire batch. Maybe I'm missing something?

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Which handler do you mean?

Ah, sorry - the handler is whichever method is registered to receive the batch from the dispatcher. In ceilometer's case, this would be process_notifications, I think.

Doug

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification

-
John Herndon
HP Cloud
john.hern...@hp.com
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote:
On Thu, Dec 19 2013, Herndon, John Luke wrote:

Hi John,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver.

I think that is overall a good idea. And to my mind it could also have bigger consequences than you would think: when we start using notifications instead of RPC calls for sending the samples, we may be able to leverage this too.

Cool, glad to hear it!

Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that too, it would be more efficient.

Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?
-- Julien Danjou /* Free Software hacker * independent consultant http://julien.danjou.info */ - John Herndon HP Cloud john.hern...@hp.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Fri, Dec 20 2013, Herndon, John Luke wrote: Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? Yeah, it definitely needs the messaging API to change a bit to handle such a case. But in the end that will be a good thing, to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_. So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever we need it. -- Julien Danjou /* Free Software hacker * independent consultant http://julien.danjou.info */
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote: On Fri, Dec 20 2013, Herndon, John Luke wrote: Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? Yeah, it definitely needs the messaging API to change a bit to handle such a case. But in the end that will be a good thing, to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_. So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever we need it. I think this will work. I was considering putting in a timeout so the broker would not send off all of the messages immediately, implemented using blocking calls. 
If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off. -- Julien Danjou /* Free Software hacker * independent consultant http://julien.danjou.info */ - John Herndon HP Cloud john.hern...@hp.com
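The timeout-plus-size batching John describes here can be sketched as below. This is a hypothetical illustration, not the oslo.messaging API: `queue.Queue` stands in for the broker, and the function and parameter names are invented.

```python
import queue
import time

def consume_batch(source, max_batch_size, batch_timeout):
    """Collect up to max_batch_size messages, waiting at most
    batch_timeout seconds overall; return whatever arrived."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: dispatch a partial batch
        try:
            batch.append(source.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch
```

If publishers outpace the consumer, the batch fills up before the deadline and is sent off full; otherwise the timeout bounds how long a partial batch waits, which is exactly the trade-off discussed above.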
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/2013 05:27 PM, Herndon, John Luke wrote: On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote: Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that, it would be more efficient too. Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. AMQP (in all its versions) allows for a subscription with a configurable amount of 'prefetch', which means the broker can send lots of messages without waiting for the client to request them one at a time. That's not quite the same as the batching I think you are looking for, but it does allow the broker to do its own batching. My guess is the rabbit driver is already using basic.consume rather than basic.get anyway(?), so the broker is free to batch as it sees fit. (I haven't actually dug into the kombu code to verify that however; perhaps someone else here can confirm?) However, you still need the client to have some way of batching up the messages and then processing them together. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? 
I have some related questions that I haven't yet satisfactorily answered. The extra context here may be useful in doing so. (1) What are the expectations around message delivery guarantees for insertion into a store? I.e. if there is a failure, is it ok to get duplicate entries for notifications? (I'm assuming losing notifications is not acceptable.) (2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or are the failures suspected to be of a temporal nature? (3) How important is ordering? If a failure causes some notifications to be inserted out of order, is that a problem at all?
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/2013 07:13 PM, Gordon Sim wrote: AMQP (in all its versions) allows for a subscription with a configurable amount of 'prefetch', which means the broker can send lots of messages without waiting for the client to request them one at a time. Just as an aside, the impl_qpid.py driver currently explicitly restricts the broker to sending one message at a time. Probably not what we want for notifications at any rate (more justifiable perhaps for the 'invoke on one of a group of servers' case).
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/2013 11:18 AM, Herndon, John Luke wrote: On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote: On Fri, Dec 20 2013, Herndon, John Luke wrote: Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? Yeah, it definitely needs the messaging API to change a bit to handle such a case. But in the end that will be a good thing, to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_. So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever we need it. I think this will work. I was considering putting in a timeout so the broker would not send off all of the messages immediately, implemented using blocking calls. 
If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off. -- Julien Danjou /* Free Software hacker * independent consultant http://julien.danjou.info */ - John Herndon HP Cloud john.hern...@hp.com A couple of things that I think need to be emphasized here:
1. the mechanism needs to be configurable, so if you are more worried about reliability than performance you can turn off bulk loading
2. the cache size should also be configurable, so that you can limit your exposure to lost messages
3. while you can have the message queue hold the messages until you acknowledge them, it seems like this adds a lot of complexity to the interaction; you will need to be able to propagate this information all the way back from the storage driver
4. any integration that is dependent on a specific configuration on the rabbit server is brittle, since we have seen a lot of variation between services on this. I would prefer to control the behavior on the collection side.
So in general, I would prefer a mechanism that pulls the data in a default manner, caches on the collection side based on configuration that allows you to determine your own risk level, and then manages retries in the storage driver or at the cache controller level. Dan Dyer HP cloud dan.d...@hp.com
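Dan's first two points amount to a pair of configuration knobs. A minimal sketch of what that configuration surface might look like (the names are hypothetical; in the real projects these would be oslo.config options):

```python
from dataclasses import dataclass

@dataclass
class BatchingConfig:
    # Point 1: a batch size of 1 effectively turns bulk loading off,
    # trading throughput for simple per-message handling.
    batch_size: int = 1
    # Point 2: bounds how long messages sit unrecorded waiting for a
    # full batch, limiting exposure to loss if the consumer crashes.
    batch_timeout: float = 5.0

def batching_enabled(conf: BatchingConfig) -> bool:
    """Bulk loading is on only when more than one message is batched."""
    return conf.batch_size > 1
```

With the defaults above, batching is off, matching Dan's preference that the reliable behavior be the one you get without tuning.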
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Fri, Dec 20, 2013 at 12:15 PM, Herndon, John Luke john.hern...@hp.com wrote: On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote: On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com wrote: Hi Folks, The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking: 1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1]. IIRC, the executor is meant to differentiate between threading, eventlet, other async implementations, or other methods for dealing with the I/O. It might be better to implement the batching at the dispatcher level instead. That way no matter what I/O processing is in place, the batching will occur. I thought about doing it in the dispatcher. One problem I see is handling message acks. It looks like the current executors are built around single messages, and re-queueing single messages if problems occur. 
If we build up a batch in the dispatcher, either the executor has to wait for the whole batch to be committed (which wouldn't work in the case of the blocking executor, or would leave a lot of green threads hanging around in the case of the eventlet executor), or the executor has to be modified to allow acking to be handled out of band. So, I was thinking it would be cleaner to write a new executor that is responsible for acking/requeueing the entire batch. Maybe I'm missing something? No, you're right. Were you going to use eventlet again for the new executor? 2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples. 3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas? Which handler do you mean? Ah, sorry - handler is whichever method is registered to receive the batch from the dispatcher. In ceilometer's case, this would be process_notifications, I think. Doug Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears! Thanks, happy holidays! 
-john 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing 1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification - John Herndon HP Cloud john.hern...@hp.com
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote: On 12/20/2013 05:27 PM, Herndon, John Luke wrote: On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote: Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that, it would be more efficient too. Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. AMQP (in all its versions) allows for a subscription with a configurable amount of 'prefetch', which means the broker can send lots of messages without waiting for the client to request them one at a time. That's not quite the same as the batching I think you are looking for, but it does allow the broker to do its own batching. My guess is the rabbit driver is already using basic.consume rather than basic.get anyway(?), so the broker is free to batch as it sees fit. (I haven't actually dug into the kombu code to verify that however; perhaps someone else here can confirm?) Yeah, that should help out the performance a bit, but we will still need to work out the batching logic. I think basic.consume is likely the best way to go; it should be straightforward to implement the timeout mechanism I'm looking for in this case. Thanks for the tip :). However, you still need the client to have some way of batching up the messages and then processing them together. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. 
So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? I have some related questions that I haven't yet satisfactorily answered. The extra context here may be useful in doing so. (1) What are the expectations around message delivery guarantees for insertion into a store? I.e. if there is a failure, is it ok to get duplicate entries for notifications? (I'm assuming losing notifications is not acceptable.) I think there is probably a tolerance for duplicates, but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate. (2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or are the failures suspected to be of a temporal nature? There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending messages that are invalid to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information. Other errors might be "database not available", in which case re-queueing the message is probably the right way to go. If the consumer process crashes, all of the unacked messages need to be requeued and handled by a different consumer. Any other error cases? 
(3) How important is ordering? If a failure causes some notifications to be inserted out of order, is that a problem at all? From an event point of view, I don't think this is a problem, since the events have a generated timestamp. - John Herndon HP Cloud john.hern...@hp.com
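John's dead-lettering idea for unparseable messages can be sketched with a simple routing function. The names are hypothetical, and plain lists stand in for the real sinks, which per his suggestion could be a file or a separate AMQP topic:

```python
import json

def handle_notification(raw, store, dead_letters):
    """Try to parse a raw notification; unparseable payloads are
    preserved in a dead-letter sink for human inspection rather
    than being thrown away or endlessly requeued."""
    try:
        event = json.loads(raw)
    except ValueError:
        # Can't be parsed: hand it to a living, breathing human.
        dead_letters.append(raw)
        return False
    store.append(event)
    return True
```

The key property is that a parse failure is terminal for the message (dead-letter, ack it) while a "database not available" failure would instead leave it unacked for redelivery, matching the distinction drawn in the thread.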
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Fri, Dec 20 2013, Herndon, John Luke wrote: I think there is probably a tolerance for duplicates, but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate. Duplicates can be a problem for metering: if you see the same event twice, it's possible you will think it happened twice. As for event storage, it won't be a problem if you use a good storage driver that has a unique constraint; you'll just drop the duplicate and log the fact that this should not have happened, or something like that. There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending messages that are invalid to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information. Other errors might be "database not available", in which case re-queueing the message is probably the right way to go. If the consumer process crashes, all of the unacked messages need to be requeued and handled by a different consumer. Any other error cases? Sounds good to me. -- Julien Danjou # Free Software hacker # independent consultant # http://julien.danjou.info
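Julien's unique-constraint point can be sketched against sqlite. The schema and the use of message_id as the natural key are invented for illustration; Ceilometer's real drivers differ, but the drop-and-report behavior is the same:

```python
import sqlite3

def record_event(conn, message_id, payload):
    """Insert one event; return False (dropping the row) if the
    unique constraint says this message_id was already stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO event (message_id, payload) VALUES (?, ?)",
                (message_id, payload))
    except sqlite3.IntegrityError:
        # Duplicate delivery: log-and-drop territory, per the thread.
        return False
    return True
```

A redelivered notification hits the primary-key constraint and is simply reported, so at-least-once delivery from the broker does not produce double entries in the event store.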
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 1:12 PM, Dan Dyer dan.dye...@gmail.com wrote: On 12/20/2013 11:18 AM, Herndon, John Luke wrote: On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote: On Fri, Dec 20 2013, Herndon, John Luke wrote: Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think? Yeah, it definitely needs the messaging API to change a bit to handle such a case. But in the end that will be a good thing, to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_. So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever we need it. I think this will work. 
I was considering putting in a timeout so the broker would not send off all of the messages immediately, implemented using blocking calls. If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off. -- Julien Danjou /* Free Software hacker * independent consultant http://julien.danjou.info */ - John Herndon HP Cloud john.hern...@hp.com A couple of things that I think need to be emphasized here: 1. the mechanism needs to be configurable, so if you are more worried about reliability than performance you can turn off bulk loading Definitely will be configurable, but I don't think batching is going to be any less reliable than individual inserts. Can you expand on what is concerning? 2. the cache size should also be configurable, so that you can limit your exposure to lost messages Agreed. 3. while you can have the message queue hold the messages until you acknowledge them, it seems like this adds a lot of complexity to the interaction; you will need to be able to propagate this information all the way back from the storage driver This is actually a pretty standard use case for AMQP; we have done it several times on in-house projects. The basic.ack call lets you acknowledge a whole batch of messages at once. Yes, we do have to figure out how to propagate the error cases back up to the broker, but I don't think it will be so complicated that it's not worth doing. 4. any integration that is dependent on a specific configuration on the rabbit server is brittle, since we have seen a lot of variation between services on this. 
I would prefer to control the behavior on the collection side. Hm, I don't understand…? So in general, I would prefer a mechanism that pulls the data in a default manner, caches on the collection side based on configuration that allows you to determine your own risk level, and then manages retries in the storage driver or at the cache controller level. If you're caching on the collector and the collector dies, then you've lost the whole batch of messages. Then you have to invent some way of persisting the messages to disk until they've been committed to the db, and removing them afterwards. We originally talked about implementing a batching layer in the storage driver, but dragondm pointed out that the message queue is already hanging on to the messages and ensuring delivery, so it's better not to reinvent that piece of the pipeline. This is a huge motivating factor for pursuing batching in oslo, in my opinion.
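John notes that basic.ack can acknowledge a whole batch at once; the sketch below instead acks per message so the failed ones can be singled out for requeue, which is the propagation problem Dan raises. The pika-style basic_ack/basic_nack method names are an assumption here, and the channel is simply any object providing them; this is not the oslo.messaging interface.

```python
def ack_batch(channel, messages, failed_tags):
    """Acknowledge a processed batch: messages whose delivery tag is
    in failed_tags are nacked back to the broker for redelivery,
    everything else is acked."""
    for msg in messages:
        if msg.delivery_tag in failed_tags:
            # Failed in the storage driver: requeue for another try.
            channel.basic_nack(msg.delivery_tag, requeue=True)
        else:
            channel.basic_ack(msg.delivery_tag)
```

Because the broker keeps the unacked messages, a collector crash mid-batch costs nothing: every unacked message is redelivered to another consumer, which is exactly the argument for not reinventing this in a collector-side cache.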
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Fri, Dec 20 2013, Herndon, John Luke wrote: I think this will work. I was considering putting in a timeout so the broker would not send off all of the messages immediately, implemented using blocking calls. If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off. I don't think you want to wait for other messages if you only picked one, even with a timeout. It's better to record that one right away; while you do that, messages will potentially queue up, so on your next call you'll pick up more than one anyway. Otherwise, yeah, that should work fine. -- Julien Danjou # Free Software hacker # independent consultant # http://julien.danjou.info
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/2013 04:43 PM, Julien Danjou wrote: On Fri, Dec 20 2013, Herndon, John Luke wrote: I think there is probably a tolerance for duplicates, but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate. Duplicates can be a problem for metering: if you see the same event twice, it's possible you will think it happened twice. As for event storage, it won't be a problem if you use a good storage driver that has a unique constraint; you'll just drop the duplicate and log the fact that this should not have happened, or something like that. The above brings up a point related to the implementation of the existing SQL driver code that will need to be re-thought with the introduction of batch notification processing. Currently, the SQL driver's record_events() method [1] is written in a way that forces a new INSERT transaction for every record supplied to the method. If the record_events() method is called with 10K events, then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against the server. Suffice it to say, this isn't efficient. :) Ostensibly, from looking at the code, the reason that this approach was taken was to allow for the collection of duplicate event IDs, and return those duplicate event IDs to the caller. 
Because of this code:

    for event_model in event_models:
        event = None
        try:
            with session.begin():
                event = self._record_event(session, event_model)
        except dbexc.DBDuplicateEntry:
            problem_events.append((api_models.Event.DUPLICATE, event_model))

The session object will be commit()'d after the session.begin() context manager exits, which will cause the aforementioned BEGIN; INSERT; COMMIT; transaction to be executed against the server for each event record. If we want to actually take advantage of the performance benefits of batching notification messages, the above code will need to be rewritten so that a single transaction is executed against the database for the entire batch of events. Best, -jay [1] https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L932
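Jay's suggested rewrite can be sketched with sqlite, where INSERT OR IGNORE stands in for the per-row DBDuplicateEntry handling (the schema is invented; the real driver would do this through a SQLAlchemy session): one transaction covers the whole batch instead of one per event.

```python
import sqlite3

def record_events_batch(conn, events):
    """Insert an entire batch of (message_id, payload) rows inside a
    single BEGIN ... COMMIT, silently skipping duplicate message_ids
    instead of opening a transaction per row."""
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO event (message_id, payload) VALUES (?, ?)",
            events)
```

One trade-off to note: folding duplicate handling into the statement means the caller no longer gets the list of duplicate event IDs back, so the existing record_events() contract of reporting problem_events would need rethinking alongside the batching.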
[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Hi Folks, The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking: 1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1]. 2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples. 3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas? Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears! Thanks, happy holidays! 
-john 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing 1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
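The error-handling API floated in point 3 of the proposal, where the handler returns the notifications to requeue, could look roughly like this. The names are entirely hypothetical and this is not the oslo.messaging interface, just the shape of the contract:

```python
def dispatch_batch(handler, notifications):
    """Hand the whole batch to the handler; the handler returns the
    subset that should be requeued, and everything else gets acked."""
    requeue = set(handler(notifications) or ())
    acked = [n for n in notifications if n not in requeue]
    return acked, sorted(requeue)
```

A handler that succeeds returns an empty list (or None) and the executor acks the whole batch in one go; a partial failure acks only the successful messages, which is precisely the "did message 25 fail?" bookkeeping the thread keeps circling back to.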