Re: [openstack-dev] [Ceilometer] Vertica Storage Driver Testing
On 1/2/14, 7:06 PM, Sean Dague s...@dague.net wrote:

On 01/02/2014 08:36 PM, Robert Collins wrote:
On 3 January 2014 14:34, Robert Collins robe...@robertcollins.net wrote:
On 3 January 2014 12:40, Herndon, John Luke john.hern...@hp.com wrote:
On 1/2/14, 4:27 PM, Clint Byrum cl...@fewbar.com wrote:

I don't think it would be that hard to get the review or gate jobs to use a real vertica instance, actually. Who do I talk to about that?

http://ci.openstack.org/third_party.html

Oh, if you meant setting up a gate variant to use vertica community edition - I'd run it past the ceilometer folk and then just submit patches to devstack, devstack-gate and infra/config to do it.

devstack - code for setting up a real vertica
devstack-gate - handles passing the right flags to devstack for the configuration scenarios we test against
infra/config - has the jenkins job builder definitions to define the jobs

I think general policy (thus far) has been that we're not going to put non Open Source software into upstream gate jobs. So you really should approach this via 3rd party testing instead. The DB2 folks are approaching it that way, for that reason.

Ok, that makes sense, but tbh, setting up 3rd party testing is going to be as much or more work than writing the driver. Given schedule constraints, it probably isn't feasible right now. I think for starters, I will write some unit tests that ensure that changes to the storage interface don't break the driver, and will work on a 3rd party testing strategy over time. Thanks!

-john

-Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 12/20/13, 11:57 PM, Jay Pipes jaypi...@gmail.com wrote:

On 12/20/2013 04:43 PM, Julien Danjou wrote:
On Fri, Dec 20 2013, Herndon, John Luke wrote:

I think there is probably a tolerance for duplicates but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate.

Duplicates can be a problem for metering: if you see the same event twice, it's possible you will think it happened twice. As for event storage, it won't be a problem if you use a good storage driver that can have a unique constraint; you'll just drop it and log the fact that this should not have happened, or something like that.

The above brings up a point related to the implementation of the existing SQL driver code that will need to be re-thought with the introduction of batch notification processing. Currently, the SQL driver's record_events() method [1] is written in a way that forces a new INSERT transaction for every record supplied to the method. If the record_events() method is called with 10K events, then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against the server. Suffice to say, this isn't efficient. :) Ostensibly, from looking at the code, the reason that this approach was taken was to allow for the collection of duplicate event IDs, and return those duplicate event IDs to the caller.
Because of this code:

    for event_model in event_models:
        event = None
        try:
            with session.begin():
                event = self._record_event(session, event_model)
        except dbexc.DBDuplicateEntry:
            problem_events.append((api_models.Event.DUPLICATE, event_model))

The session object will be commit()'d after the session.begin() context manager exits, which will cause the aforementioned BEGIN; INSERT; COMMIT; transaction to be executed against the server for each event record. If we want to actually take advantage of the performance benefits of batching notification messages, the above code will need to be rewritten so that a single transaction is executed against the database for the entire batch of events.

Yeah, this makes sense. Working on this driver is definitely on the to-do list (we also need to cache the event and trait types so several queries to the db are not incurred for each event). In the above code, we still have to deal with the dbduplicate error, but it gets much harder. The options I can think of are: 1) comb through the batch of events, remove the duplicate and try again, or 2) allow the duplicate to be inserted and deal with it later.

-john

Best,
-jay

[1] https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L932
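The single-transaction rewrite Jay describes, combined with option 1) above (comb out the duplicates and retry), can be sketched roughly as follows. This uses the stdlib sqlite3 module purely to illustrate the transaction shape; the real driver would go through the SQLAlchemy session, and the table layout here is invented:

```python
import sqlite3

def record_events_batch(conn, events):
    """Insert a batch of (message_id, generated) rows in ONE transaction,
    combing out duplicates and retrying on conflict (option 1 above).
    Returns the duplicate events, mirroring what record_events() reports.
    Illustrative sketch only, not the ceilometer driver."""
    problem_events = []
    # First weed out duplicates within the batch itself.
    seen, deduped = set(), []
    for event in events:
        if event[0] in seen:
            problem_events.append(event)
        else:
            seen.add(event[0])
            deduped.append(event)
    events = deduped
    while events:
        try:
            with conn:  # one BEGIN ... COMMIT for the whole batch
                conn.executemany(
                    "INSERT INTO event (message_id, generated)"
                    " VALUES (?, ?)", events)
            break
        except sqlite3.IntegrityError:
            # Some message_ids were already stored: pull them out of the
            # batch, remember them as duplicates, and retry the rest.
            placeholders = ",".join("?" * len(events))
            rows = conn.execute(
                "SELECT message_id FROM event WHERE message_id IN (%s)"
                % placeholders, [e[0] for e in events])
            existing = {row[0] for row in rows}
            problem_events.extend(e for e in events if e[0] in existing)
            events = [e for e in events if e[0] not in existing]
    return problem_events
```

The point of the sketch is that the server sees one transaction per batch in the common case, and one extra round trip per conflict instead of one transaction per event.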
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:

On 12/20/2013 09:26 PM, Herndon, John Luke wrote:
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:
On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

[...]

(2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or is it failures suspected to be of a temporal nature?

There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending messages that are invalid to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing and you really want it dead-lettered in some way. I guess the first question is whether that is part of the notification system's function, or if it is done by the application itself (e.g. by storing it or republishing it). If it is the latter you may not need any explicit negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and not oslo, since ceilometer is where the event parsing knowledge lives. From an oslo point of view, the message would be 'acked'. Other errors might be "database not available", in which case re-queuing the message is probably the right way to go.
That does mean however that the backlog of messages starts to grow on the broker, so some scheme for dealing with this if the database outage goes on for a bit is probably important. It also means that the messages will keep being retried without any 'backoff' while waiting for the database to be restored, which could increase the load.

This is a problem we already have :(
https://github.com/openstack/ceilometer/blob/master/ceilometer/notification.py#L156-L158

Since notifications cannot be lost, overflow needs to be detected and the messages need to be saved. I'm thinking the database being down is a rare occurrence that will be worthy of waking someone up in the middle of the night. One possible solution: flip the collector into an emergency mode and save notifications to disk until the issue is resolved. Once the db is up and running, the collector inserts all of these saved messages (as one big batch!). Thoughts?

I'm not sure I understand what you are saying about retrying without a backoff. Can you explain?

-john
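The emergency-mode idea floated above could be sketched like this. SpoolingCollector and its method names are invented for illustration, not ceilometer APIs, and a real implementation would also need to worry about fsync, spool rotation, and duplicates on replay:

```python
import json
import os

class SpoolingCollector:
    """Sketch of 'emergency mode': if the database is down, append
    notifications to a local spool file instead of endlessly requeueing
    them; replay the spool as one big batch once the db is back.
    Hypothetical names, not real ceilometer code."""

    def __init__(self, storage, spool_path):
        self.storage = storage      # object exposing record_events(batch)
        self.spool_path = spool_path

    def handle_batch(self, notifications):
        try:
            self.storage.record_events(notifications)
        except IOError:             # stand-in for "database unavailable"
            with open(self.spool_path, "a") as spool:
                for notification in notifications:
                    spool.write(json.dumps(notification) + "\n")

    def replay_spool(self):
        """Call when the database is healthy again; returns replay count."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path) as spool:
            batch = [json.loads(line) for line in spool]
        self.storage.record_events(batch)   # one big batch insert
        os.remove(self.spool_path)
        return len(batch)
```

The trade-off versus leaving everything on the broker is that the collector now owns durability for the spooled window, which is exactly the concern raised later in the thread about caching on the collection side.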
[openstack-dev] [Ceilometer] Vertica Storage Driver Testing
Hi, I'm working on adding a vertica (www.vertica.com) storage driver to ceilometer. I would love to get this driver into upstream. However, I've run into a bit of a snag with the tests. It looks like all of the existing storage drivers have "in-memory" versions that are used for unit tests. Vertica does not have an in-memory implementation, and is not trivial to set up. Given this constraint, I don't think it will be possible to run unit tests "out-of-the-box" against a real vertica database. Vertica is mostly sql compliant, so I could use a sqlite or h2 backend to test the query parts of the driver. Data loading can't be done with sqlite, and will probably need to be tested with mocks. Is this an acceptable approach for unit tests, or do the tests absolutely need to run against the database under test?

Thanks!
-john
Re: [openstack-dev] [Ceilometer] Vertica Storage Driver Testing
On 1/2/14, 4:27 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Herndon, John Luke's message of 2014-01-02 15:16:26 -0800:

Hi, I'm working on adding a vertica (www.vertica.com) storage driver to ceilometer. I would love to get this driver into upstream. However, I've run into a bit of a snag with the tests. It looks like all of the existing storage drivers have "in-memory" versions that are used for unit tests. Vertica does not have an in-memory implementation, and is not trivial to set up. Given this constraint, I don't think it will be possible to run unit tests "out-of-the-box" against a real vertica database.

Well, arguably those other implementations aren't really running against a real database either, so I don't see a problem with this.

Vertica is mostly sql compliant, so I could use a sqlite or h2 backend to test the query parts of the driver. Data loading can't be done with sqlite, and will probably need to be tested with mocks. Is this an acceptable approach for unit tests, or do the tests absolutely need to run against the database under test?

A fake Vertica or mocking it out seems like a good idea. I'm not deeply involved with Ceilometer, but in general I think it is preferable to test only the _code_ in unit tests. However, it may be a good idea to adopt an approach similar to Nova's and require that a 3rd party run Vertica integration tests in the gate.

I don't think it would be that hard to get the review or gate jobs to use a real vertica instance, actually. Who do I talk to about that?
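A sketch of the mock-based approach discussed above: the data-loading path is exercised without a live Vertica instance by handing the driver a mock connection. The VerticaDriver class and its SQL here are hypothetical stand-ins, not the actual proposed driver:

```python
from unittest import mock

class VerticaDriver:
    """Hypothetical sketch of the proposed driver's loading path --
    just enough surface to show the testing technique."""

    def __init__(self, conn):
        self.conn = conn

    def record_events(self, events):
        # Bulk-load the rows; with a real DB-API connection this would
        # hit Vertica. The SQL here is illustrative only.
        cursor = self.conn.cursor()
        cursor.executemany(
            "INSERT INTO events (message_id, generated) VALUES (?, ?)",
            events)
        self.conn.commit()

def test_record_events_bulk_loads():
    # No database at all: a mock.Mock() records every call made on it.
    conn = mock.Mock()
    VerticaDriver(conn).record_events([("e1", "t1"), ("e2", "t2")])
    # The driver should issue exactly one bulk load and one commit.
    conn.cursor.return_value.executemany.assert_called_once()
    conn.commit.assert_called_once()
```

This only verifies the driver's behavior (what it asks the connection to do), which matches Clint's point that unit tests should test the _code_; whether Vertica accepts the SQL is left to integration testing.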
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Hi Nadya,

Yep, that's right, the notifications stick around on the server until they are acknowledged, so there is extra overhead involved. I only have experience with rabbitmq, so I can't speak for other transports, but we have used this strategy internally for other purposes, and have reached 10k messages/second on a single consumer using batch message consumption (i.e., consume N messages, process them, then ack all N at once). We've found that being able to acknowledge the entire batch of messages at a time leads to a huge performance increase. This is another motivating factor for moving towards batches. But to your point, making this configurable is the right way to go just in case other transports don't react as well.

Thanks,
-john

From: Nadya Privalova nprival...@mirantis.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Fri, 20 Dec 2013 15:25:55 +0400
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

Hi John,

As for me your ideas look very interesting. As I understood, notification messages will be kept in MQ for some time (while the batch-basket is being filled), right? I'm concerned about the additional load that will be on MQ (Rabbit).

Thanks,
Nadya

On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it.
Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
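The batch-plus-timeout behavior proposed in 1) can be shown in miniature. This is not the oslo.messaging executor API; `q` stands in for the incoming notification stream and `dispatch` for the dispatcher:

```python
import time
from queue import Empty, Queue

def consume_batches(q, batch_size, timeout, dispatch):
    """Collect up to batch_size notifications from q, but dispatch
    whatever has accumulated once `timeout` seconds elapse.

    A minimal sketch of the proposed batching executor behavior;
    all names here are illustrative.
    """
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout: ship a partial batch rather than wait forever
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break
    if batch:
        dispatch(batch)  # the dispatcher could return messages to requeue
    return batch
```

Whichever comes first wins: a full batch is dispatched immediately, and a slow trickle of notifications still gets dispatched every `timeout` seconds, so latency is bounded.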
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote:

On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com wrote:

Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it.

Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start off is to create a new executor that builds up a batch of notifications, and sends the batch to the dispatcher. We'd also add a timeout, so if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

IIRC, the executor is meant to differentiate between threading, eventlet, other async implementations, or other methods for dealing with the I/O. It might be better to implement the batching at the dispatcher level instead. That way no matter what I/O processing is in place, the batching will occur.

I thought about doing it in the dispatcher. One problem I see is handling message acks. It looks like the current executors are built around single messages and re-queueing single messages if problems occur.
If we build up a batch in the dispatcher, either the executor has to wait for the whole batch to be committed (which wouldn't work in the case of the blocking executor, or would leave a lot of green threads hanging around in the case of the eventlet executor), or the executor has to be modified to allow acking to be handled out of band. So, I was thinking it would be cleaner to write a new executor that is responsible for acking/requeueing the entire batch. Maybe I'm missing something?

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable)[2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Which handler do you mean?

Ah, sorry - handler is whichever method is registered to receive the batch from the dispatcher. In ceilometer's case, this would be process_notifications, I think.

Doug

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification

-
John Herndon
HP Cloud
john.hern...@hp.com
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote:

On Thu, Dec 19 2013, Herndon, John Luke wrote:

Hi John,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers[0]. Based on some results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending to the storage driver.

I think that is overall a good idea. And in my mind it could also have bigger consequences than you might think: when we start using notifications instead of RPC calls for sending the samples, we may be able to leverage this too.

Cool, glad to hear it!

Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that too, it would be more efficient as well.

Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?
--
Julien Danjou
/* Free Software hacker * independent consultant
   http://julien.danjou.info */
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:

On Fri, Dec 20 2013, Herndon, John Luke wrote:

Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

Yeah, it definitely needs to change the messaging API a bit to handle such a case. But in the end that will be a good thing to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_.

So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever that is.

I think this will work. I was considering putting in a timeout so the broker would not send off all of the messages immediately, and implementing this using blocking calls.
If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off.

-john
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:

On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote:

Anyway, my main concern here is that I am not very enthusiastic about using the executor to do that. I wonder if there is not a way to ask the broker for as many messages as it has, up to a limit? You would have 100 messages waiting in the notifications.info queue, and you would be able to tell oslo.messaging that you want to read up to 10 messages at a time. If the underlying protocol (e.g. AMQP) can support that too, it would be more efficient as well.

Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along.

AMQP (in all its versions) allows for a subscription with a configurable amount of 'prefetch', which means the broker can send lots of messages without waiting for the client to request them one at a time. That's not quite the same as the batching I think you are looking for, but it does allow the broker to do its own batching. My guess is the rabbit driver is already using basic.consume rather than basic.get anyway(?), so the broker is free to batch as it sees fit. (I haven't actually dug into the kombu code to verify that, however; perhaps someone else here can confirm?)

Yeah, that should help out the performance a bit, but we will still need to work out the batching logic. I think basic.consume is likely the best way to go; it will be straightforward to implement the timeout mechanism I'm looking for in this case. Thanks for the tip :).

However you still need the client to have some way of batching up the messages and then processing them together.

Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually.
So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

I have some related questions that I haven't yet satisfactorily answered. The extra context here may be useful in doing so.

(1) What are the expectations around message delivery guarantees for insertion into a store? I.e. if there is a failure, is it ok to get duplicate entries for notifications? (I'm assuming losing notifications is not acceptable.)

I think there is probably a tolerance for duplicates but you're right, missing a notification is unacceptable. Can anyone weigh in on how big of a deal duplicates are for meters? Duplicates aren't really unique to the batching approach, though. If a consumer dies after it's inserted a message into the data store but before the message is acked, the message will be requeued and handled by another consumer, resulting in a duplicate.

(2) What would you want the broker to do with the failed messages? What sort of things might fail? Is it related to the message content itself? Or is it failures suspected to be of a temporal nature?

There will be situations where the message can't be parsed, and those messages can't just be thrown away. My current thought is that ceilometer could provide some sort of mechanism for sending messages that are invalid to an external data store (like a file, or a different topic on the amqp server) where a living, breathing human can look at them and try to parse out any meaningful information. Other errors might be "database not available", in which case re-queuing the message is probably the right way to go. If the consumer process crashes, all of the unacked messages need to be requeued and handled by a different consumer. Any other error cases?
(3) How important is ordering? If a failure causes some notifications to be inserted out of order, is that a problem at all?

From an event point of view, I don't think this is a problem since the events have a generated timestamp.
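The invalid-message handling sketched in the answer to (2) might be shaped like this; process_batch, record, and dead_letter are illustrative names, not ceilometer or oslo.messaging APIs:

```python
import json
import logging

LOG = logging.getLogger(__name__)

def process_batch(raw_messages, record, dead_letter):
    """Sketch of the dead-letter idea above: notifications that cannot
    be parsed are handed to `dead_letter` (a file writer, a publisher
    to a separate AMQP topic, ...) for a human to inspect later, instead
    of being requeued forever or thrown away. From the messaging layer's
    point of view, both outcomes count as 'acked'."""
    parsed = []
    for raw in raw_messages:
        try:
            parsed.append(json.loads(raw))
        except ValueError:
            LOG.warning("unparseable notification, dead-lettering it")
            dead_letter(raw)
    record(parsed)  # insert the messages that did parse, as one batch
    return len(parsed)
```

Temporal failures (database down) would be handled differently, by requeueing or spooling, since the same message is expected to succeed later.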
Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
On Dec 20, 2013, at 1:12 PM, Dan Dyer dan.dye...@gmail.com wrote:

On 12/20/2013 11:18 AM, Herndon, John Luke wrote:
On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:
On Fri, Dec 20 2013, Herndon, John Luke wrote:

Yeah, I like this idea. As far as I can tell, AMQP doesn't support grabbing more than a single message at a time, but we could definitely have the broker store up the batch before sending it along. Other protocols may support bulk consumption. My one concern with this approach is error handling. Currently the executors treat each notification individually. So let's say the broker hands over 100 messages at a time. When the client is done processing the messages, the broker needs to know if message 25 had an error or not. We would somehow need to communicate back to the broker which messages failed. I think this may take some refactoring of executors/dispatchers. What do you think?

Yeah, it definitely needs to change the messaging API a bit to handle such a case. But in the end that will be a good thing to support such a case, whether it is natively supported by the broker or not. For brokers where it's not possible, it may be simple enough to have a get_one_notification_nb() method that would either return a notification or None if there's none to read, and would consequently have to be _non-blocking_.

So if the transport is smart we write:

    # Return up to max_number_of_notifications_to_read
    notifications = transport.get_notifications(
        conf.max_number_of_notifications_to_read)
    storage.record(notifications)

Otherwise we do:

    notifications = []
    for i in range(conf.max_number_of_notifications_to_read):
        notification = transport.get_one_notification_nb()
        if notification:
            notifications.append(notification)
        else:
            break
    storage.record(notifications)

So it's just about having the right primitive in oslo.messaging; we can then build on top of that wherever that is.

I think this will work.
I was considering putting in a timeout so the broker would not send off all of the messages immediately, and implementing this using blocking calls. If the consumer consumes faster than the publishers are publishing, this just becomes single-notification batches. So it may be beneficial to wait for more messages to arrive before sending off the batch. If the batch is full before the timeout is reached, then the batch would be sent off.

A couple of things that I think need to be emphasized here:

1. The mechanism needs to be configurable, so if you are more worried about reliability than performance you would be able to turn off bulk loading.

Definitely will be configurable, but I don't think batching is going to be any less reliable than individual inserts. Can you expand on what is concerning?

2. The caching size should also be configurable, so that we can limit your exposure to lost messages.

Agreed.

3. While you can have the message queue hold the messages until you acknowledge them, it seems like this adds a lot of complexity to the interaction. You will need to be able to propagate this information all the way back from the storage driver.

This is actually a pretty standard use case for AMQP; we have done it several times on in-house projects. The basic.ack call lets you acknowledge a whole batch of messages at once. Yes, we do have to figure out how to propagate the error cases back up to the broker, but I don't think it will be so complicated that it's not worth doing.

4. Any integration that is dependent on a specific configuration on the rabbit server is brittle, since we have seen a lot of variation between services on this.
I would prefer to control the behavior on the collection side.

Hm, I don't understand…?

So in general, I would prefer a mechanism that pulls the data in a default manner, caches on the collection side based on configuration that allows you to determine your own risk level, and then manages retries in the storage driver or at the cache controller level.

If you're caching on the collector and the collector dies, then you've lost the whole batch of messages. Then you have to invent some way of persisting the messages to disk until they have been committed to the db, and removing them afterwards. We originally talked about implementing a batching layer in the storage driver, but dragondm pointed out that the message queue is already hanging on to the messages and ensuring delivery, so it's better not to reinvent that piece of the pipeline. This is a huge motivating factor for pursuing batching in oslo, in my opinion.
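The basic.ack behavior mentioned above (acknowledging a whole batch of messages at once) works through cumulative delivery tags: acking tag N with the multiple flag set settles every unacked message up to and including N. Here is a toy model of that semantic, not a real broker client; the class and method names mimic the AMQP operations but are otherwise made up:

```python
class FakeChannel:
    """Toy model of AMQP cumulative acknowledgement semantics."""

    def __init__(self):
        self.unacked = set()
        self.next_tag = 0

    def deliver(self):
        """Hand out a message, remembering its delivery tag."""
        self.next_tag += 1
        self.unacked.add(self.next_tag)
        return self.next_tag

    def basic_ack(self, delivery_tag, multiple=False):
        if multiple:
            # One call settles every outstanding tag up to delivery_tag.
            self.unacked -= {t for t in self.unacked if t <= delivery_tag}
        else:
            self.unacked.discard(delivery_tag)

ch = FakeChannel()
tags = [ch.deliver() for _ in range(100)]
ch.basic_ack(tags[-1], multiple=True)  # acknowledge the whole batch at once
```

A failed message in the middle of a batch would have to be rejected individually (basic.reject/basic.nack) before the cumulative ack, which is where the error propagation from the storage driver comes in.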
[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches
Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance testing event collection in the ceilometer storage drivers [0]. Based on the results of this testing, we would like to support batch consumption of notifications, as it will greatly improve insertion performance. Batch consumption in this case means waiting for a certain number of notifications to arrive before sending them to the storage driver. I'd like to get feedback from the community about this feature, and how we are planning to implement it. Here is what I'm currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature that other projects will find useful. After reviewing the changes that sileht has been working on in oslo.messaging, I think the right way to start is to create a new executor that builds up a batch of notifications and sends the batch to the dispatcher. We'd also add a timeout, so that if a certain amount of time passes and the batch isn't filled up, the notifications will be dispatched anyway. I've started a blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch executor instead of the eventlet executor (this should probably be configurable) [2]. We can then send the entire batch of notifications to the storage driver to be processed as events, while maintaining the current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if any of the notifications should be requeued. I think the right way to solve this is to return a list of notifications to requeue from the handler. Any better ideas?

Is this the right approach to take? I'm not an oslo.messaging expert, so if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
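A rough sketch of the proposal above, with made-up names standing in for the oslo.messaging internals: gather a batch bounded by both a size limit and a timeout, hand it to the dispatcher, and requeue whatever the handler reports as failed. This is illustrative only, not the actual executor API:

```python
import queue
import time

def collect_batch(incoming, batch_size, timeout):
    """Gather up to batch_size items; give up when the timeout expires."""
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(incoming.get(timeout=remaining))
        except queue.Empty:
            break  # timed out: dispatch whatever has arrived so far
    return batch

def process_one_batch(incoming, dispatch, requeue,
                      batch_size=100, timeout=5.0):
    """One executor iteration: gather, dispatch, requeue failures.

    dispatch() returns the subset of the batch to retry, mirroring the
    'return a list of notifications to requeue from the handler' idea.
    """
    batch = collect_batch(incoming, batch_size, timeout)
    if not batch:
        return 0
    for notification in dispatch(batch) or []:
        requeue(notification)
    return len(batch)
```

If publishers are slow, a call returns a partial (possibly empty) batch after the timeout; if they are fast, it returns as soon as the batch fills, so the consumer never stalls waiting for a full batch.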
[openstack-dev] [Ceilometer] Nomination of Sandy Walsh to core team
Hi There! I'm not 100% sure what the process is around electing an individual to the core team (i.e., can a non-core person nominate someone?). However, I believe the ceilometer core team could use a member who is more active in the development of the event pipeline. A core developer in this area will not only speed up review times for event patches, but will also help keep new contributions focused on the overall eventing vision. To that end, I would like to nominate Sandy Walsh from Rackspace to ceilometer-core. Sandy is one of the original authors of StackTach, and spearheaded the original stacktach-ceilometer integration. He has been instrumental in many of my code reviews, and has contributed much of the existing event storage and querying code.

Thanks,
John Herndon
Software Engineer
HP Cloud
[openstack-dev] [Ceilometer] Alembic or SA Migrate (again)
Hi Folks! Sorry to dig up a really old topic [1][2], but I'd like to know the status of ceilometer db migrations. Rehash: I'd like to submit two branches to modify the Event and Trait tables. If I were to do that now, I would need to write SQLAlchemy scripts to do the database migration [3]. Since the unit tests use db migrations to build up the db schema, there's currently no way to get the unit tests to run if your new code uses an alembic migration and needs to alter columns. A couple of questions:

1) What is the progress of creating the schema from the models for unit tests?
2) What is the time frame for requiring alembic migrations?
3) Should I push these branches up now, or wait and use an alembic migration?
4) Is there anything I can do to help with 1 or 2?

Thanks,
-john

1: http://lists.openstack.org/pipermail/openstack-dev/2013-August/014214.html
2: http://lists.openstack.org/pipermail/openstack-dev/2013-September/014593.html
3: https://bitbucket.org/zzzeek/alembic/issue/21/column-renames-not-supported-on-sqlite
[openstack-dev] Alembic or SA Migrate (again)
Hi Folks! Sorry to dig up a really old topic, but I'd like to know the status of ceilometer db migrations. I'd like to submit two branches to modify the Event and Trait tables. If I were to do that now, I would need to write SQLAlchemy scripts to do the database migration (background: https://bitbucket.org/zzzeek/alembic/issue/21/column-renames-not-supported-on-sqlite). Since the unit tests use db migrations to build up the db schema, there's currently no way to get the unit tests to run if your new code uses an alembic migration and needs to alter columns - which mine does :(

A couple of questions:

1) What is the progress of creating the schema from the models for unit tests?
2) What is the time frame for requiring alembic migrations?
3) Should I push these branches up now, or wait and use an alembic migration?
4) Is there anything I can do to help with 1 or 2?

Thanks!
-john

Related threads:
http://lists.openstack.org/pipermail/openstack-dev/2013-August/014214.html
http://lists.openstack.org/pipermail/openstack-dev/2013-September/014593.html
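For context, the linked alembic issue boils down to SQLite's limited ALTER TABLE: it cannot rename or alter a column in place, so a migration has to rebuild the table. The standard workaround looks roughly like the sketch below, shown here with the stdlib sqlite3 module and made-up table/column names (a real alembic migration would express this through its own operations rather than raw SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trait (id INTEGER PRIMARY KEY, t_string TEXT)")
conn.execute("INSERT INTO trait (t_string) VALUES ('compute.instance.create')")

# SQLite can't ALTER COLUMN, so rename t_string -> value by rebuilding:
# create the new-shape table, copy the rows, drop the old one, rename.
conn.executescript("""
    CREATE TABLE trait_new (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO trait_new (id, value) SELECT id, t_string FROM trait;
    DROP TABLE trait;
    ALTER TABLE trait_new RENAME TO trait;
""")

rows = conn.execute("SELECT value FROM trait").fetchall()
```

The copy step is why these migrations need special handling on SQLite while MySQL and PostgreSQL get a plain ALTER statement.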
Re: [openstack-dev] [Ceilometer] Need help with Alembic...
Jay - It looks like there is an error in the migration script that causes it to abort:

    AttributeError: 'ForeignKeyConstraint' object has no attribute 'drop'

My guess is the migration runs on the first test, creates the event_type table fine, but exits with the above error, so the migration is not complete. Thus every subsequent test tries to migrate the db, and notices that event_type already exists.

-john

On 8/26/13 1:15 PM, Jay Pipes jaypi...@gmail.com wrote:

I just noticed that every single test case for SQL-driver storage is executing every single migration upgrade before every single test case run:

https://github.com/openstack/ceilometer/blob/master/ceilometer/tests/db.py#L46
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L153

instead of simply creating a new database schema from the models in the current source code base using a call to sqlalchemy.MetaData.create_all(). This results in re-running migrations over and over again, instead of having dedicated migration tests that would test each migration individually, as is the case in projects like Glance... Is this intentional?

Best,
-jay

On 08/26/2013 02:59 PM, Sandy Walsh wrote:

I'm getting the same problem with a different migration (mine is complaining that a column already exists): http://paste.openstack.org/show/44512/

I've compared it to the other migrations and it seems fine.

-S

On 08/26/2013 02:34 PM, Jay Pipes wrote:

Hey all, I'm trying to figure out what is going wrong with my code for this patch: https://review.openstack.org/41316

I had previously added a sqlalchemy-migrate migration script to add an event_type table, and had that working, but then was asked to instead use Alembic for migrations. So, I removed the sqlalchemy-migrate migration file and added an Alembic migration [1].
Unfortunately, I am getting the following error when running tests:

    OperationalError: (OperationalError) table event_type already exists
    u'\nCREATE TABLE event_type (\n\tid INTEGER NOT NULL, \n\tdesc VARCHAR(255), \n\tPRIMARY KEY (id), \n\tUNIQUE (desc)\n)\n\n' ()

The migration adds the event_type table. I've seen this error occur before when using SQLite, due to SQLite's ALTER TABLE statement not allowing the rename of a column. In the sqlalchemy-migrate migration, I had specialized SQLite migration upgrade [2] and downgrade [3] scripts, but I'm not sure how I am supposed to handle this in Alembic. Could someone help me out?

Thanks,
-jay

[1] https://review.openstack.org/#/c/41316/16/ceilometer/storage/sqlalchemy/alembic/versions/49036dfd_add_event_types.py
[2] https://review.openstack.org/#/c/41316/14/ceilometer/storage/sqlalchemy/migrate_repo/versions/013_sqlite_upgrade.sql
[3] https://review.openstack.org/#/c/41316/14/ceilometer/storage/sqlalchemy/migrate_repo/versions/013_sqlite_downgrade.sql
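The failure mode diagnosed in this thread (a migration that creates a table and then aborts partway, so every rerun dies with "table already exists") is easy to reproduce with plain sqlite3. The sketch below is illustrative only: the table name mirrors the one in the traceback, and the raised AttributeError just stands in for whatever made the real migration abort:

```python
import sqlite3

def broken_migration(conn):
    """Creates event_type, then aborts before the migration completes."""
    conn.execute('CREATE TABLE event_type (id INTEGER PRIMARY KEY, '
                 '"desc" VARCHAR(255) UNIQUE)')
    raise AttributeError(
        "'ForeignKeyConstraint' object has no attribute 'drop'")

conn = sqlite3.connect(":memory:")
try:
    broken_migration(conn)   # first run: table created, then abort
except AttributeError:
    pass

error = None
try:
    broken_migration(conn)   # rerun: now fails earlier, on CREATE TABLE
except sqlite3.OperationalError as exc:
    error = str(exc)
```

Because the half-applied schema persists, every later attempt hits the CREATE TABLE first, which is why each subsequent test in the suite reported "event_type already exists" rather than the original AttributeError.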
[openstack-dev] [Ceilometer] Adding TraitType Storage Model
Hi - After discussion with Jay Pipes about this bug (https://bugs.launchpad.net/ceilometer/+bug/1211015), I'd like to split out a new TraitType table in the storage layer, and remove the UniqueName table. A TraitType is the name of the trait plus its data type (i.e., string, int, float). I'd like to get some input on this change - maybe it's overkill? Here is my rationale for adding this table:

1) The current query to return all trait names is slow, as stated in the bug report.

2) All instances of event X will have the same traits, and all instances of a trait have the same trait type name and data type. I think it is cleaner to model this relationship in the db with an explicit TraitType table.

3) The api needs a model for trait types in order to fulfill the /v2/event_types/Foo/trait_type query. This call will return the set of trait names and data types, but no trait data.

Related patches:
sqlalchemy layer: https://review.openstack.org/#/c/42407/ (not sure the migration is correct)
storage layer: https://review.openstack.org/#/c/41596/

Thanks!
-john
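To make the proposal concrete, here is an illustrative schema for the split, sketched in stdlib sqlite3; the table and column names are assumptions for illustration, not the actual ceilometer models or migration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trait_type (
        id        INTEGER PRIMARY KEY,
        name      VARCHAR(255),
        data_type INTEGER,            -- e.g. 1=string, 2=int, 3=float
        UNIQUE (name, data_type)      -- replaces the UniqueName table
    );
    CREATE TABLE trait (
        id            INTEGER PRIMARY KEY,
        trait_type_id INTEGER REFERENCES trait_type (id),
        t_string      TEXT            -- trait value payload
    );
""")

conn.execute(
    "INSERT INTO trait_type (name, data_type) VALUES ('hostname', 1)")

# The /v2/event_types/Foo/trait_type-style query only needs names and
# data types, so it becomes a scan of the small trait_type table rather
# than a slow DISTINCT over every stored trait row.
trait_types = conn.execute(
    "SELECT name, data_type FROM trait_type").fetchall()
```

The point of the design is visible in the last query: trait names and types come from the compact trait_type table with no touch of the trait data itself.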
[openstack-dev] [Ceilometer] Nova_tests failing in jenkins
Hi - The nova_tests are failing for a couple of different Ceilometer reviews with: 'module' object has no attribute 'add_driver'. This review (https://review.openstack.org/#/c/41316/) had nothing to do with the nova_tests, yet they are failing. Any clue what's going on? Apologies if there is an obvious answer - I've never encountered this before.

Thanks,
-john
Re: [openstack-dev] [Ceilometer] Event API Access Controls
Hi Julien,

On 8/5/13 2:04 AM, Julien Danjou jul...@danjou.info wrote:

On Sat, Aug 03 2013, Herndon, John Luke (HPCS - Ft. Collins) wrote:

Hi John,

Hello, I'm currently implementing the event api blueprint [0], and am wondering what access controls we should impose on the event api. The purpose of the blueprint is to provide a StackTach equivalent in the ceilometer api. I believe that StackTach is used as an internal tool to which end users have no access. Given that the event api is targeted at administrators, I am currently thinking that it should be limited to admin users only. However, I wanted to ask for input on this topic. Any arguments for opening it up so users can look at events for their resources? Any arguments for not doing so?

You should definitely use the policy system we have in Ceilometer to check that the user is authenticated and has admin privileges. We already have such a mechanism in ceilometer.api.acl. I don't see any point in exposing raw operator system data to the users. That could even be dangerous security-wise.

This plan sounds good to me. We can enable/disable the event api for users, but is there a way to restrict a user to viewing only his/her events using the policy system? Or do we not need to do that?

-john

-- Julien Danjou // Free Software hacker / freelance consultant // http://julien.danjou.info
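A minimal sketch of the kind of check being discussed, illustrative only: ceilometer's real enforcement lives in ceilometer.api.acl and the oslo policy machinery, and the function and header handling here are assumptions. It gates the events endpoint on an admin role, and shows where a per-project restriction would hook in if non-admin access were ever enabled:

```python
class PolicyError(Exception):
    """Raised when the caller fails the access check."""

def check_event_access(headers, require_admin=True):
    """Return a project filter for the event query, or raise PolicyError.

    headers: dict of Keystone-style auth headers (X-Roles, X-Project-Id).
    """
    roles = [r.strip().lower()
             for r in headers.get("X-Roles", "").split(",") if r.strip()]
    if "admin" in roles:
        return None  # admins see events for all projects
    if require_admin:
        raise PolicyError("events API is admin-only")
    # Non-admin access, if ever enabled: scope to the caller's project.
    return headers.get("X-Project-Id")
```

Restricting a user to only his/her own events is then a data-scoping question (the returned project filter) rather than something the yes/no policy check answers by itself, which is roughly the distinction the question above is circling.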
[openstack-dev] [Ceilometer] Event API Access Controls
Hello, I'm currently implementing the event api blueprint [0], and am wondering what access controls we should impose on the event api. The purpose of the blueprint is to provide a StackTach equivalent in the ceilometer api. I believe that StackTach is used as an internal tool to which end users have no access. Given that the event api is targeted at administrators, I am currently thinking that it should be limited to admin users only. However, I wanted to ask for input on this topic. Any arguments for opening it up so users can look at events for their resources? Any arguments for not doing so?

PS - I'm new to the ceilometer project, so let me introduce myself. My name is John Herndon, and I work for HP. I've been freed up from a different project and will be working on ceilometer. Thanks, looking forward to working with everyone!

-john

0: https://blueprints.launchpad.net/ceilometer/+spec/specify-event-api