Re: [openstack-dev] [Ceilometer] Vertica Storage Driver Testing

2014-01-03 Thread Herndon, John Luke


On 1/2/14, 7:06 PM, Sean Dague s...@dague.net wrote:

On 01/02/2014 08:36 PM, Robert Collins wrote:
 On 3 January 2014 14:34, Robert Collins robe...@robertcollins.net
wrote:
 On 3 January 2014 12:40, Herndon, John Luke john.hern...@hp.com
wrote:


 On 1/2/14, 4:27 PM, Clint Byrum cl...@fewbar.com wrote:



 I don't think it would be that hard to get the review or gate jobs to use
 a real vertica instance, actually. Who do I talk to about that?

 http://ci.openstack.org/third_party.html

 Oh, if you meant setting up a gate variant to use vertica community
 edition - I'd run it past the ceilometer folk and then just submit
 patches to devstack, devstack-gate and infra/config to do it.

 devstack - code for setting up a real vertica
 devstack-gate - handles passing the right flags to devstack for the
 configuration scenarios we test against
 infra/config - has the jenkins job builder definitions to define the
jobs

I think general policy (thus far) has been that we're not going to put
non Open Source software into upstream gate jobs.

So you really should approach this via 3rd party testing instead. The
DB2 folks are approaching it that way, for that reason.

Ok, that makes sense, but tbh, setting up 3rd party testing is going to be
as much or more work than writing the driver. Given schedule constraints,
it probably isn't feasible right now. I think for starters, I will write
some unit tests that ensure that changes to the storage interface don't
break the driver, and will work on a 3rd party testing strategy over time.

Thanks!
-john


   -Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-02 Thread Herndon, John Luke


On 12/20/13, 11:57 PM, Jay Pipes jaypi...@gmail.com wrote:

On 12/20/2013 04:43 PM, Julien Danjou wrote:
 On Fri, Dec 20 2013, Herndon, John Luke wrote:

 I think there is probably a tolerance for duplicates but you're right,
 missing a notification is unacceptable. Can anyone weigh in on how big of a
 deal duplicates are for meters? Duplicates aren't really unique to the
 batching approach, though. If a consumer dies after it's inserted a message
 into the data store but before the message is acked, the message will be
 requeued and handled by another consumer resulting in a duplicate.

 Duplicates can be a problem for metering, as if you see twice the same
 event it's possible you will think it happened twice.

 As for event storage, it won't be a problem if you use a good storage
 driver that can have unique constraint; you'll just drop it and log the
 fact that this should not have happened, or something like that.

The above brings up a point related to the implementation of the
existing SQL driver code that will need to be re-thought with the
introduction of batch notification processing.

Currently, the SQL driver's record_events() method [1] is written in a
way that forces a new INSERT transaction for every record supplied to
the method. If the record_events() method is called with 10K events,
then 10K BEGIN; INSERT ...; COMMIT; transactions are executed against
the server.

Suffice to say, this isn't efficient. :)

Ostensibly, from looking at the code, the reason that this approach was
taken was to allow for the collection of duplicate event IDs, and return
those duplicate event IDs to the caller.

Because of this code:

 for event_model in event_models:
     event = None
     try:
         with session.begin():
             event = self._record_event(session, event_model)
     except dbexc.DBDuplicateEntry:
         problem_events.append((api_models.Event.DUPLICATE,
                                event_model))
The session object will be commit()'d after the session.begin() context
manager exits, which will cause the aforementioned BEGIN; INSERT;
COMMIT; transaction to be executed against the server for each event
record.

If we want to actually take advantage of the performance benefits of
batching notification messages, the above code will need to be rewritten
so that a single transaction is executed against the database for the
entire batch of events.
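Jay's point about a single transaction per batch can be illustrated with a minimal, self-contained sketch. This uses the stdlib sqlite3 module purely as a stand-in for the real SQLAlchemy backend; the table, columns, and function name are hypothetical, not ceilometer code:

```python
import sqlite3

def record_events_batch(conn, events):
    """Insert a whole batch of (message_id, generated) rows in ONE
    BEGIN/COMMIT instead of one transaction per event."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO event (message_id, generated) VALUES (?, ?)",
            events)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (message_id TEXT PRIMARY KEY, generated REAL)")
record_events_batch(conn, [("id-%d" % i, float(i)) for i in range(10000)])
count = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
```

The key difference from the current driver is that the 10K rows above go through one commit rather than 10K of them.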

Yeah, this makes sense. Working on this driver is definitely on the to-do
list (we also need to cache the event and trait types so several queries to
the db are not incurred for each event). In the above code, we still have
to deal with the DBDuplicateEntry error, but it gets much harder. The options I
can think of are: 1) comb through the batch of events, remove the
duplicates and try again, or 2) allow the duplicate to be inserted and deal
with it later.
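Option 1 can be sketched roughly as follows, again with sqlite3 standing in for the real backend and sqlite3.IntegrityError standing in for dbexc.DBDuplicateEntry; all names here are illustrative:

```python
import sqlite3

def insert_batch_dedup(conn, events):
    """Try the whole batch in one transaction; on a duplicate-key failure,
    fall back to row-by-row inserts and collect the duplicates instead of
    aborting the batch."""
    duplicates = []
    try:
        with conn:  # one transaction for the whole batch (fast path)
            conn.executemany(
                "INSERT INTO event (message_id) VALUES (?)", events)
    except sqlite3.IntegrityError:
        # Batch contained at least one duplicate and was rolled back:
        # retry one row at a time, recording which rows are duplicates.
        for row in events:
            try:
                with conn:
                    conn.execute(
                        "INSERT INTO event (message_id) VALUES (?)", row)
            except sqlite3.IntegrityError:
                duplicates.append(row)
    return duplicates

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (message_id TEXT PRIMARY KEY)")
dups = insert_batch_dedup(conn, [("a",), ("b",), ("a",)])
```

The trade-off is that a batch containing any duplicate pays the per-row cost, but duplicate-free batches (the common case) stay fast.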

-john


Best,
-jay

[1] 
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/imp
l_sqlalchemy.py#L932




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2014-01-02 Thread Herndon, John Luke


On 1/2/14, 11:36 AM, Gordon Sim g...@redhat.com wrote:

On 12/20/2013 09:26 PM, Herndon, John Luke wrote:

 On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:

 On 12/20/2013 05:27 PM, Herndon, John Luke wrote:

 Other protocols may support bulk consumption. My one concern with
 this approach is error handling. Currently the executors treat
 each notification individually. So let's say the broker hands
 100 messages at a time. When the client is done processing the
 messages, the broker needs to know if message 25 had an error or
 not. We would somehow need to communicate back to the broker
 which messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
[...]
 (2) What would you want the broker to do with the failed messages?
 What sort of things might fail? Is it related to the message
 content itself? Or is it failures suspected to be of a temporal
 nature?
 
 There will be situations where the message can't be parsed, and those
 messages can't just be thrown away. My current thought is that
 ceilometer could provide some sort of mechanism for sending messages
 that are invalid to an external data store (like a file, or a
 different topic on the amqp server) where a living, breathing human
 can look at them and try to parse out any meaningful information.

Right, in those cases simply requeueing probably is not the right thing
and you really want it dead-lettered in some way. I guess the first
question is whether that is part of the notification systems function,
or if it is done by the application itself (e.g. by storing it or
republishing it). If it is the latter you may not need any explicit
negative acknowledgement.

Exactly, I'm thinking this is something we'd build into ceilometer and not
oslo, since ceilometer is where the event parsing knowledge lives. From an
oslo point of view, the message would be 'acked'.
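The dead-letter idea might look roughly like this; everything here (the handler name, using a list as the dead-letter store) is hypothetical, intended only to show that unparseable messages are diverted somewhere inspectable and still acked from the messaging layer's point of view:

```python
import json

def handle_notification(raw, storage, dead_letters):
    """Parse a raw notification; dead-letter it if it can't be parsed."""
    try:
        event = json.loads(raw)
    except ValueError:
        # Unparseable: divert for human inspection instead of requeueing
        # forever. Returning normally means the message still gets acked.
        dead_letters.append(raw)
        return
    storage.append(event)

storage, dead = [], []
handle_notification('{"event_type": "compute.instance.create"}', storage, dead)
handle_notification('not json at all', storage, dead)
```

In a real deployment the dead-letter store would be a file or a separate AMQP topic rather than an in-memory list.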


 Other errors might be "database not available", in which case
 re-queuing the message is probably the right way to go.

That does mean however that the backlog of messages starts to grow on
the broker, so some scheme for dealing with this if the database outage
goes on for a bit is probably important. It also means that the messages
will keep being retried without any 'backoff' waiting for the database
to be restored which could increase the load.

This is a problem we already have :(
https://github.com/openstack/ceilometer/blob/master/ceilometer/notification
.py#L156-L158
Since notifications cannot be lost, overflow needs to be detected and the
messages need to be saved. I'm thinking the database being down is a rare
occurrence that will be worthy of waking someone up in the middle of the
night. One possible solution: flip the collector into an emergency mode
and save notifications to disk until the issue is resolved. Once the db is
up and running, the collector inserts all of these saved messages (as one
big batch!). Thoughts?
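The emergency-mode idea could be sketched like this; the Collector class, the JSON-lines spool format, and the method names are all hypothetical:

```python
import json
import os
import tempfile

class Collector:
    def __init__(self, spool_path):
        self.spool_path = spool_path
        self.db = []        # stands in for the real storage driver
        self.db_up = True

    def record(self, notification):
        if self.db_up:
            self.db.append(notification)
        else:
            # Emergency mode: append to an on-disk spool, one JSON per line.
            with open(self.spool_path, "a") as f:
                f.write(json.dumps(notification) + "\n")

    def recover(self):
        """DB is back: insert everything spooled to disk as one big batch."""
        self.db_up = True
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path) as f:
            batch = [json.loads(line) for line in f]
        self.db.extend(batch)  # one batch insert in the real driver
        os.remove(self.spool_path)

spool = os.path.join(tempfile.mkdtemp(), "spool.jsonl")
c = Collector(spool)
c.record({"id": 1})
c.db_up = False                      # simulate the database going down
c.record({"id": 2})
c.record({"id": 3})
c.recover()                          # replay the spool as a batch
```

A production version would also need fsync/rotation policy for the spool file, but the shape of the failover is the same.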

I'm not sure I understand what you are saying about retrying without a
backoff. Can you explain?

-john






[openstack-dev] [Ceilometer] Vertica Storage Driver Testing

2014-01-02 Thread Herndon, John Luke
Hi,

I'm working on adding a Vertica (www.vertica.com) storage driver to
ceilometer. I would love to get this driver into upstream. However, I've
run into a bit of a snag with the tests. It looks like all of the existing
storage drivers have "in-memory" versions that are used for unit tests.
Vertica does not have an in-memory implementation, and is not trivial to
set up. Given this constraint, I don't think it will be possible to run
unit tests "out-of-the-box" against a real Vertica database.

Vertica is mostly SQL compliant, so I could use a sqlite or h2 backend to
test the query parts of the driver. Data loading can't be done with
sqlite, and will probably need to be tested with mocks. Is this an
acceptable approach for unit tests, or do the tests absolutely need to run
against the database under test?

Thanks!
-john




Re: [openstack-dev] [Ceilometer] Vertica Storage Driver Testing

2014-01-02 Thread Herndon, John Luke


On 1/2/14, 4:27 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Herndon, John Luke's message of 2014-01-02 15:16:26 -0800:
 Hi,
 
 I'm working on adding a Vertica (www.vertica.com) storage driver to
 ceilometer. I would love to get this driver into upstream. However, I've
 run into a bit of a snag with the tests. It looks like all of the existing
 storage drivers have "in-memory" versions that are used for unit tests.
 Vertica does not have an in-memory implementation, and is not trivial to
 set up. Given this constraint, I don't think it will be possible to run
 unit tests "out-of-the-box" against a real Vertica database.

Well, arguably those other implementations aren't really running against
a real database either, so I don't see a problem with this.

 
 Vertica is mostly SQL compliant, so I could use a sqlite or h2 backend to
 test the query parts of the driver. Data loading can't be done with
 sqlite, and will probably need to be tested with mocks. Is this an
 acceptable approach for unit tests, or do the tests absolutely need to run
 against the database under test?


A fake Vertica or mocking it out seems like a good idea. I'm not deeply
involved with Ceilometer, but in general I think it is preferable to
test only the _code_ in unit tests. However, it may be a good idea to
adopt an approach similar to Nova's approach and require that a 3rd
party run Vertica integration tests in the gate.

I don’t think it would be that hard to get the review or gate jobs to use
a real vertica instance, actually. Who do I talk to about that?




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke
Hi Nadya,

Yep, that’s right, the notifications stick around on the server until they
are acknowledged so there is extra overhead involved. I only have experience
with rabbitmq, so I can’t speak for other transports, but we have used this
strategy internally for other purposes, and have reached 10k
messages/second on a single consumer using batch message consumption (i.e.,
consume N messages, process them, then ack all N at once). We’ve found that
being able to acknowledge the entire batch of messages at a time leads to a
huge performance increase. This is another motivating factor for moving
towards batches. But to your point, making this configurable is the right
way to go just in case other transports don’t react as well.
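For context, the "ack all N at once" strategy relies on AMQP's cumulative acknowledgement, which can be modeled with a toy class (this is not pika/kombu code; `basic_ack`'s `multiple` flag here mirrors the real AMQP semantics of acknowledging everything up to a delivery tag in one call):

```python
class ToyChannel:
    """Toy model of an AMQP channel's delivery-tag bookkeeping."""

    def __init__(self):
        self.next_tag = 0
        self.unacked = set()

    def deliver(self):
        """Broker delivers one message, identified by a rising delivery tag."""
        self.next_tag += 1
        self.unacked.add(self.next_tag)
        return self.next_tag

    def basic_ack(self, delivery_tag, multiple=False):
        if multiple:
            # Cumulative ack: everything up to and including delivery_tag.
            self.unacked -= {t for t in self.unacked if t <= delivery_tag}
        else:
            self.unacked.discard(delivery_tag)

ch = ToyChannel()
tags = [ch.deliver() for _ in range(100)]   # consume a batch of 100
ch.basic_ack(tags[-1], multiple=True)       # one ack for the whole batch
```

One cumulative ack per batch, rather than one ack per message, is what produces the throughput gain described above.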

Thanks,
-john


From:  Nadya Privalova nprival...@mirantis.com
Reply-To:  OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date:  Fri, 20 Dec 2013 15:25:55 +0400
To:  OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject:  Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in
Batches

Hi John,

Your ideas look very interesting to me. As I understand it, notification
messages will be kept in MQ for some time (while the batch is being
filled), right? I'm concerned about the additional load that will be on MQ
(Rabbit).

Thanks,
Nadya


On Fri, Dec 20, 2013 at 3:31 AM, Herndon, John Luke john.hern...@hp.com
wrote:
 Hi Folks,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.
 
 I'd like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:
 
 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].
 
 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.
 
 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?
 
 Is this the right approach to take? I'm not an oslo.messaging expert, so
 if there is a proper way to implement this change, I'm all ears!
 
 Thanks, happy holidays!
 -john
 
 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
 


Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 8:10 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote:

 
 
 
 On Thu, Dec 19, 2013 at 6:31 PM, Herndon, John Luke john.hern...@hp.com 
 wrote:
 Hi Folks,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver.
 
 I'd like to get feedback from the community about this feature, and how we
 are planning to implement it. Here is what I’m currently thinking:
 
 1) This seems to fit well into oslo.messaging - batching may be a feature
 that other projects will find useful. After reviewing the changes that
 sileht has been working on in oslo.messaging, I think the right way to
 start off is to create a new executor that builds up a batch of
 notifications, and sends the batch to the dispatcher. We’d also add a
 timeout, so if a certain amount of time passes and the batch isn’t filled
 up, the notifications will be dispatched anyway. I’ve started a
 blueprint for this change and am filling in the details as I go along [1].
 
 IIRC, the executor is meant to differentiate between threading, eventlet, 
 other async implementations, or other methods for dealing with the I/O. It 
 might be better to implement the batching at the dispatcher level instead. 
 That way no matter what I/O processing is in place, the batching will occur.
 

I thought about doing it in the dispatcher. One problem I see is handling 
message acks. It looks like the current executors are built around single 
messages and re-queueing single messages if problems occur. If we build up a 
batch in the dispatcher, either the executor has to wait for the whole batch to 
be committed (which wouldn’t work in the case of the blocking executor, or 
would leave a lot of green threads hanging around in the case of the eventlet 
executor), or the executor has to be modified to allow acking to be handled out 
of band. So, I was thinking it would be cleaner to write a new executor that is 
responsible for acking/requeueing the entire batch. Maybe I’m missing something?

 
 2) In ceilometer, initialize the notification listener with the batch
 executor instead of the eventlet executor (this should probably be
 configurable)[2]. We can then send the entire batch of notifications to
 the storage driver to be processed as events, while maintaining the
 current method for converting notifications into samples.
 
 3) Error handling becomes more difficult. The executor needs to know if
 any of the notifications should be requeued. I think the right way to
 solve this is to return a list of notifications to requeue from the
 handler. Any better ideas?
 
 Which handler do you mean?

Ah, sorry - handler is whichever method is registered to receive the batch from 
the dispatcher. In ceilometer’s case, this would be process_notifications I 
think.

 Doug
 
  
 
 Is this the right approach to take? I'm not an oslo.messaging expert, so
 if there is a proper way to implement this change, I'm all ears!
 
 Thanks, happy holidays!
 -john
 
 0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
 1:
 https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
 2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification
 

-
John Herndon
HP Cloud
john.hern...@hp.com







Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info wrote:

 On Thu, Dec 19 2013, Herndon, John Luke wrote:
 
 Hi John,
 
 The Rackspace-HP team has been putting a lot of effort into performance
 testing event collection in the ceilometer storage drivers[0]. Based on
 some results of this testing, we would like to support batch consumption
 of notifications, as it will greatly improve insertion performance. Batch
 consumption in this case means waiting for a certain number of
 notifications to arrive before sending to the storage
 driver. 
 
 I think that is overall a good idea. And in my mind it could also have
 bigger consequences than you would think. When we start using
 notifications instead of RPC calls for sending the samples, we may be
 able to leverage that too.
Cool, glad to hear it!

 Anyway, my main concern here is that I am not very enthusiastic about
 using the executor to do that. I wonder if there is not a way to ask the
 broker to get as many messages as it has, up to a limit?
 
 You would have 100 messages waiting in the notifications.info queue, and
 you would be able to tell to oslo.messaging that you want to read up to
 10 messages at a time. If the underlying protocol (e.g. AMQP) can
 support that too, it would be more efficient too.

Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing 
more than a single message at a time, but we could definitely have the broker 
store up the batch before sending it along. Other protocols may support bulk 
consumption. My one concern with this approach is error handling. Currently the 
executors treat each notification individually. So let’s say the broker hands 
100 messages at a time. When the client is done processing the messages, the broker 
needs to know if message 25 had an error or not. We would somehow need to 
communicate back to the broker which messages failed. I think this may take 
some refactoring of executors/dispatchers. What do you think?

 
 -- 
 Julien Danjou
 /* Free Software hacker * independent consultant
   http://julien.danjou.info */



Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:

 On Fri, Dec 20 2013, Herndon, John Luke wrote:
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing
 more than a single message at a time, but we could definitely have the
 broker store up the batch before sending it along. Other protocols may
 support bulk consumption. My one concern with this approach is error
 handling. Currently the executors treat each notification individually. So
 let’s say the broker hands 100 messages at a time. When client is done
 processing the messages, the broker needs to know if message 25 had an error
 or not. We would somehow need to communicate back to the broker which
 messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
 
 Yeah, it definitely needs to change the messaging API a bit to handle
 such a case. But in the end that will be a good thing to support such a
 case, it being natively supported by the broker or not.
 
 For brokers where it's not possible, it may be simple enough to have a
 get_one_notification_nb() method that would either return a
 notification or None if there's none to read, and would that
 consequently have to be _non-blocking_.
 
 So if the transport is smart we write:
 
  # Return up to max_number_of_notifications_to_read
  notifications = transport.get_notifications(
      conf.max_number_of_notifications_to_read)
  storage.record(notifications)
 
 Otherwise we do:
 
  notifications = []
  for i in range(conf.max_number_of_notifications_to_read):
      notification = transport.get_one_notification_nb()
      if notification:
          notifications.append(notification)
      else:
          break
  storage.record(notifications)
 
 So it's just about having the right primitive in oslo.messaging, we can
 then build on top of that wherever that is.
 

I think this will work. I was considering putting in a timeout so the broker 
would not send off all of the messages immediately, and implement using 
blocking calls. If the consumer consumes faster than the publishers are 
publishing, this just becomes single-notification batches. So it may be 
beneficial to wait for more messages to arrive before sending off the batch. If 
the batch is full before the timeout is reached, then the batch would be sent 
off.
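The size-or-timeout batching described above can be sketched with a stdlib queue standing in for the incoming notification stream; the names and parameters are illustrative, not the oslo.messaging API:

```python
import queue
import time

def collect_batch(q, max_size, timeout):
    """Return up to max_size items, waiting at most `timeout` seconds total.

    A full batch is returned as soon as max_size items arrive; otherwise
    whatever has accumulated when the deadline passes is flushed.
    """
    batch = []
    deadline = time.monotonic() + timeout
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: flush a partial batch
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing arrived before the deadline
    return batch

q = queue.Queue()
for i in range(5):
    q.put(i)
full = collect_batch(q, max_size=3, timeout=0.1)      # fills before timeout
partial = collect_batch(q, max_size=10, timeout=0.1)  # timeout flushes partial
```

If producers outpace the consumer, batches fill immediately; if not, the timeout guarantees notifications are still dispatched promptly rather than sitting in a half-full batch.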

 -- 
 Julien Danjou
 /* Free Software hacker * independent consultant
   http://julien.danjou.info */



Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 12:13 PM, Gordon Sim g...@redhat.com wrote:

 On 12/20/2013 05:27 PM, Herndon, John Luke wrote:
 
 On Dec 20, 2013, at 8:48 AM, Julien Danjou jul...@danjou.info
 wrote:
 Anyway, my main concern here is that I am not very enthusiastic
 about using the executor to do that. I wonder if there is not a way
 to ask the broker to get as many messages as it has, up to a
 limit?
 
 You would have 100 messages waiting in the notifications.info
 queue, and you would be able to tell to oslo.messaging that you
 want to read up to 10 messages at a time. If the underlying
 protocol (e.g. AMQP) can support that too, it would be more
 efficient too.
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support
 grabbing more than a single message at a time, but we could
 definitely have the broker store up the batch before sending it
 along.
 
 AMQP (in all its versions) allows for a subscription with a configurable 
 amount of 'prefetch', which means the broker can send lots of messages 
 without waiting for the client to request them one at a time.
 
 That's not quite the same as the batching I think you are looking for, but it 
 does allow the broker to do its own batching. My guess is the rabbit driver 
 is already using basic.consume rather than basic.get anyway(?), so the broker 
 is free to batch as it sees fit.  (I haven't actually dug into the kombu code 
 to verify that however, perhaps someone else here can confirm?)
 
Yeah, that should help out the performance a bit, but we will still need to 
 work out the batching logic. I think basic.consume is likely the best way to 
 go, and it will be straightforward to implement the timeout mechanism I’m 
looking for in this case. Thanks for the tip :).

 However you still need the client to have some way of batching up the 
 messages and then processing them together.
 
 Other protocols may support bulk consumption. My one concern
 with this approach is error handling. Currently the executors treat
 each notification individually. So let’s say the broker hands 100
 messages at a time. When the client is done processing the messages, the
 broker needs to know if message 25 had an error or not. We would
 somehow need to communicate back to the broker which messages failed.
 I think this may take some refactoring of executors/dispatchers. What
 do you think?
 
 I've have some related questions, that I haven't yet satisfactorily answered 
 yet. The extra context here may be useful in doing so.
 
 (1) What are the expectations around message delivery guarantees for 
 insertion into a store? I.e. if there is a failure, is it ok to get duplicate 
 entries for notifications? (I'm assuming losing notifications is not 
 acceptable).
I think there is probably a tolerance for duplicates but you’re right, missing 
a notification is unacceptable. Can anyone weigh in on how big of a deal 
duplicates are for meters? Duplicates aren’t really unique to the batching 
approach, though. If a consumer dies after it’s inserted a message into the 
data store but before the message is acked, the message will be requeued and 
handled by another consumer resulting in a duplicate. 

 (2) What would you want the broker to do with the failed messages? What sort 
 of things might fail? Is it related to the message content itself? Or is it 
 failures suspected to be of a temporal nature?
There will be situations where the message can’t be parsed, and those messages 
can’t just be thrown away. My current thought is that ceilometer could provide 
some sort of mechanism for sending messages that are invalid to an external 
data store (like a file, or a different topic on the amqp server) where a 
living, breathing human can look at them and try to parse out any meaningful 
information. Other errors might be “database not available”, in which case 
re-queuing the message is probably the right way to go. If the consumer process 
crashes, all of the unacked messages need to be requeued and handled by a 
different consumer. Any other error cases?

 (3) How important is ordering ? If a failure causes some notifications to be 
 inserted out of order is that a problem at all?
From an event point of view, I don’t think this is a problem since the events 
have a generated timestamp.




Re: [openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-20 Thread Herndon, John Luke

On Dec 20, 2013, at 1:12 PM, Dan Dyer dan.dye...@gmail.com wrote:

 On 12/20/2013 11:18 AM, Herndon, John Luke wrote:
 On Dec 20, 2013, at 10:47 AM, Julien Danjou jul...@danjou.info wrote:
 
 On Fri, Dec 20 2013, Herndon, John Luke wrote:
 
 Yeah, I like this idea. As far as I can tell, AMQP doesn’t support grabbing
 more than a single message at a time, but we could definitely have the
 broker store up the batch before sending it along. Other protocols may
 support bulk consumption. My one concern with this approach is error
 handling. Currently the executors treat each notification individually. So
 let’s say the broker hands 100 messages at a time. When client is done
 processing the messages, the broker needs to know if message 25 had an 
 error
 or not. We would somehow need to communicate back to the broker which
 messages failed. I think this may take some refactoring of
 executors/dispatchers. What do you think?
 Yeah, it definitely needs to change the messaging API a bit to handle
 such a case. But in the end that will be a good thing to support such a
 case, it being natively supported by the broker or not.
 
 For brokers where it's not possible, it may be simple enough to have a
 get_one_notification_nb() method that would either return a
 notification or None if there's none to read, and would that
 consequently have to be _non-blocking_.
 
 So if the transport is smart we write:
 
  # Return up to max_number_of_notifications_to_read
  notifications = transport.get_notifications(
      conf.max_number_of_notifications_to_read)
  storage.record(notifications)
 
 Otherwise we do:
 
  for i in range(conf.max_number_of_notifications_to_read):
  notification = transport.get_one_notification_nb():
  if notification:
  notifications.append(notification)
  else:
  break
   storage.record(notifications)
 
 So it's just about having the right primitive in oslo.messaging, we can
 then build on top of that wherever that is.
 
 I think this will work. I was considering adding a timeout so that the
 broker would not send off all of the messages immediately, implementing
 this with blocking calls. If the consumer consumes faster than the
 publishers are publishing, this degenerates into single-notification
 batches, so it may be beneficial to wait for more messages to arrive
 before sending off the batch. If the batch fills up before the timeout is
 reached, the batch is sent off immediately.
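The size-or-timeout behavior described here can be sketched with a plain queue; `collect_batch`, `batch_size`, and `batch_timeout` are illustrative names, not actual oslo.messaging APIs or options:

```python
import queue
import time

def collect_batch(q, batch_size, batch_timeout):
    """Block until batch_size items arrive or batch_timeout elapses."""
    batch = []
    deadline = time.monotonic() + batch_timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: ship whatever we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # timed out while waiting for the next message
    return batch

# Fast consumer, slow publishers: the batch is shipped partially full.
q = queue.Queue()
for i in range(3):
    q.put(i)
print(collect_batch(q, batch_size=5, batch_timeout=0.1))  # [0, 1, 2]
```

A full batch returns immediately; an idle queue returns an empty list once the timeout expires, so the dispatcher never stalls indefinitely.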
 
 -- 
 Julien Danjou
 /* Free Software hacker * independent consultant
   http://julien.danjou.info */
 -
 John Herndon
 HP Cloud
 john.hern...@hp.com
 
 
 
 
 
 A couple of things that I think need to be emphasized here:
 1. the mechanism needs to be configurable, so if you are more worried about 
 reliability than performance you would be able to turn off bulk loading
It will definitely be configurable, but I don’t think batching is going to be any 
less reliable than individual inserts. Can you expand on what the concern is?
 2. the caching size should also be configurable, so that we can limit your 
 exposure to lost messages
Agreed.
 3. while you can have the message queue hold the messages until you 
 acknowledge them, it seems like this adds a lot of complexity to the 
 interaction. you will need to be able to propagate this information all the 
 way back from the storage driver.
This is actually a pretty standard use case for AMQP, we have done it several 
times on in-house projects. The basic.ack call lets you acknowledge a whole 
batch of messages at once. Yes, we do have to figure out how to propagate the 
error cases back up to the broker, but I don’t think it will be so complicated 
that it’s not worth doing.
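As a sketch of the acknowledgement bookkeeping involved (a hypothetical helper, not oslo.messaging code): AMQP's basic.ack carries a `multiple` flag that acknowledges every delivery tag up to and including the one given, so after a batch is processed the consumer can bulk-ack the contiguous prefix of successes and requeue the failures individually:

```python
def split_batch_ack(delivery_tags, failed_tags):
    """Return (ack_up_to, requeue) for a processed batch.

    ack_up_to: highest tag safe to basic.ack with multiple=True
               (None if the very first message failed).
    requeue:   tags to reject/requeue individually.
    """
    ack_up_to = None
    requeue = []
    for tag in delivery_tags:  # tags arrive in increasing order
        if tag in failed_tags:
            requeue.append(tag)
        elif not requeue:
            ack_up_to = tag    # still in the contiguous success prefix
    # Successes that come after the first failure must be acked one by
    # one (multiple=False); that bookkeeping is omitted here for brevity.
    return ack_up_to, requeue

print(split_batch_ack([1, 2, 3, 4, 5], {3}))  # (2, [3])
```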
 4. any integration that is dependent on a specific configuration of the 
 rabbit server is brittle, since we have seen a lot of variation between 
 services on this. I would prefer to control the behavior on the collection 
 side
Hm, I don’t understand…?
 So in general, I would prefer a mechanism that pulls the data in a default 
 manner, caches on the collection side based on configuration that allows you 
 to determine your own risk level and then manager retries in the storage 
 driver or at the cache controller level.
If you’re caching on the collector and the collector dies, then you’ve lost the 
whole batch of messages. You would then have to invent some way of persisting the 
messages to disk until they’ve been committed to the db, and removing them 
afterwards. We originally talked about implementing a batching layer in the 
storage driver, but dragondm pointed out that the message queue is already 
hanging on to the messages and ensuring delivery, so it’s better not to 
reinvent that piece of the pipeline. In my opinion, this is a huge motivating 
factor for pursuing batching in oslo.

[openstack-dev] [Ceilometer][Oslo] Consuming Notifications in Batches

2013-12-19 Thread Herndon, John Luke
Hi Folks,

The Rackspace-HP team has been putting a lot of effort into performance
testing event collection in the ceilometer storage drivers[0]. Based on
some results of this testing, we would like to support batch consumption
of notifications, as it will greatly improve insertion performance. Batch
consumption in this case means waiting for a certain number of
notifications to arrive before sending them to the storage
driver. 

I'd like to get feedback from the community about this feature, and how we
are planning to implement it. Here is what I’m currently thinking:

1) This seems to fit well into oslo.messaging - batching may be a feature
that other projects will find useful. After reviewing the changes that
sileht has been working on in oslo.messaging, I think the right way to
start off is to create a new executor that builds up a batch of
notifications, and sends the batch to the dispatcher. We’d also add a
timeout, so if a certain amount of time passes and the batch isn’t filled
up, the notifications will be dispatched anyway. I’ve started a
blueprint for this change and am filling in the details as I go along [1].

2) In ceilometer, initialize the notification listener with the batch
executor instead of the eventlet executor (this should probably be
configurable)[2]. We can then send the entire batch of notifications to
the storage driver to be processed as events, while maintaining the
current method for converting notifications into samples.

3) Error handling becomes more difficult. The executor needs to know if
any of the notifications should be requeued. I think the right way to
solve this is to return a list of notifications to requeue from the
handler. Any better ideas?
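To make point 3 concrete, the contract might look like the following; `process_batch` and `flaky_handler` are hypothetical names, meant only to illustrate a handler that returns the list of notifications to requeue:

```python
def process_batch(handler, notifications):
    """Dispatch a batch; hand back whatever the handler could not store."""
    to_requeue = handler(notifications)  # handler returns the failed items
    acked = [n for n in notifications if n not in to_requeue]
    return acked, to_requeue

def flaky_handler(batch):
    # Pretend anything flagged 'bad' failed to insert into storage.
    return [n for n in batch if n.get('bad')]

batch = [{'id': 1}, {'id': 2, 'bad': True}, {'id': 3}]
acked, requeued = process_batch(flaky_handler, batch)
print(acked)     # [{'id': 1}, {'id': 3}]
print(requeued)  # [{'id': 2, 'bad': True}]
```

The executor would then acknowledge `acked` and push `requeued` back to the broker.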

Is this the right approach to take? I'm not an oslo.messaging expert, so
if there is a proper way to implement this change, I'm all ears!

Thanks, happy holidays!
-john

0: https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
1: https://blueprints.launchpad.net/oslo.messaging/+spec/bulk-consume-messages
2: https://blueprints.launchpad.net/ceilometer/+spec/use-bulk-notification




[openstack-dev] [Ceilometer] Nomination of Sandy Walsh to core team

2013-12-09 Thread Herndon, John Luke
Hi There!

I'm not 100% sure what the process is around electing an individual to the
core team (i.e., can a non-core person nominate someone?). However, I
believe the ceilometer core team could use a member who is more active in
the development of the event pipeline. A core developer in this area will
not only speed up review times for event patches, but will also help keep
new contributions focused on the overall eventing vision.

To that end, I would like to nominate Sandy Walsh from Rackspace to
ceilometer-core. Sandy is one of the original authors of StackTach, and
spearheaded the original stacktach-ceilometer integration. He has been
instrumental in many of my code reviews, and has contributed much of the
existing event storage and querying code.

Thanks,
John Herndon
Software Engineer
HP Cloud




[openstack-dev] [Ceilometer] Alembic or SA Migrate (again)

2013-11-13 Thread Herndon, John Luke
Hi Folks!

Sorry to dig up a really old topic[1][2], but I'd like to know the status of
ceilometer db migrations.

Rehash: I'd like to submit two branches to modify the Event and Trait
tables. If I were to do that now, I would need to write SQLAlchemy scripts
to do the database migration [3]. Since the unit tests use db migrations to
build up the db schema, there's currently no way to get the unit tests to
run if your new code uses an alembic migration and needs to alter columns…

A couple of questions:
1) What is the progress of creating the schema from the models for unit
tests?
2) What is the time frame for requiring alembic migrations?
3) Should I push these branches up now, or wait and use an alembic
migration?
4) Is there anything I can do to help with 1 or 2?
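For context on question 1, building the unit-test schema straight from the models boils down to a `MetaData.create_all()` call instead of replaying migrations; a minimal, self-contained illustration (throwaway table, not the real ceilometer schema):

```python
import sqlalchemy as sa

# Illustrative model-level table definition.
metadata = sa.MetaData()
event = sa.Table(
    'event', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('generated', sa.DateTime),
)

# Build the schema straight from the models -- no migrations replayed.
engine = sa.create_engine('sqlite://')
metadata.create_all(engine)
print(sa.inspect(engine).get_table_names())  # ['event']
```

Migrations would then get their own dedicated tests, rather than being re-run before every test case.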


Thanks,
-john

1: http://lists.openstack.org/pipermail/openstack-dev/2013-August/014214.html
2: http://lists.openstack.org/pipermail/openstack-dev/2013-September/014593.html
3: https://bitbucket.org/zzzeek/alembic/issue/21/column-renames-not-supported-on-sqlite




[openstack-dev] Alembic or SA Migrate (again)

2013-11-12 Thread Herndon, John Luke
Hi Folks!

Sorry to dig up a really old topic, but I'd like to know the status of
ceilometer db migrations.

I'd like to submit two branches to modify the Event and Trait tables. If I
were to do that now, I would need to write SQLAlchemy scripts to do the
database migration (background:
https://bitbucket.org/zzzeek/alembic/issue/21/column-renames-not-supported-on-sqlite).
Since the unit tests use db migrations to build up the db schema, there's
currently no way to get the unit tests to run if your new code uses an
alembic migration and needs to alter columns… which mine does :(

A couple of questions:
1) What is the progress of creating the schema from the models for unit
tests?
2) What is the time frame for requiring alembic migrations?
3) Should I push these branches up now, or wait and use an alembic
migration?
4) Is there anything I can do to help with 1 or 2?


Thanks!
-john

Related threads: 
http://lists.openstack.org/pipermail/openstack-dev/2013-August/014214.html
http://lists.openstack.org/pipermail/openstack-dev/2013-September/014593.html




Re: [openstack-dev] [Ceilometer] Need help with Alembic...

2013-08-26 Thread Herndon, John Luke (HPCS - Ft. Collins)
Jay - 

It looks like there is an error in the migration script that causes it to abort:

AttributeError: 'ForeignKeyConstraint' object has no attribute 'drop'

My guess is that the migration runs on the first test, creates the
event_type table fine, but exits with the above error, so the migration is
never recorded as complete. Thus every subsequent test tries to migrate the
db, and notices that event_type already exists.

-john

On 8/26/13 1:15 PM, Jay Pipes jaypi...@gmail.com wrote:

I just noticed that every single test case for SQL-driver storage is
executing every single migration upgrade before every single test case
run:

https://github.com/openstack/ceilometer/blob/master/ceilometer/tests/db.py#L46

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L153

instead of simply creating a new database schema from the models in the
current source code base using a call to sqlalchemy.MetaData.create_all().

This results in re-running migrations over and over again, instead of
having dedicated migration tests that would test each migration
individually, as is the case in projects like Glance...

Is this intentional?

Best,
-jay

On 08/26/2013 02:59 PM, Sandy Walsh wrote:
 I'm getting the same problem with a different migration (mine is
 complaining that a column already exists)

 http://paste.openstack.org/show/44512/

 I've compared it to the other migrations and it seems fine.

 -S

 On 08/26/2013 02:34 PM, Jay Pipes wrote:
 Hey all,

 I'm trying to figure out what is going wrong with my code for this
patch:

 https://review.openstack.org/41316

 I had previously added a sqlalchemy-migrate migration script to add an
 event_type table, and had that working, but then was asked to instead
 use Alembic for migrations. So, I removed the sqlalchemy-migrate
 migration file and added an Alembic migration [1].

 Unfortunately, I am getting the following error when running tests:

 OperationalError: (OperationalError) table event_type already exists
 u'\nCREATE TABLE event_type (\n\tid INTEGER NOT NULL, \n\tdesc
 VARCHAR(255), \n\tPRIMARY KEY (id), \n\tUNIQUE (desc)\n)\n\n' ()

 The migration adds the event_type table. I've seen this error occur
 before when using SQLite due to SQLite's ALTER TABLE statement not
 allowing the rename of a column. In the sqlalchemy-migrate migration, I
 had a specialized SQLite migration upgrade [2] and downgrade [3]
script,
 but I'm not sure how I am supposed to handle this in Alembic. Could
 someone help me out?

 Thanks,
 -jay

 [1]
 
https://review.openstack.org/#/c/41316/16/ceilometer/storage/sqlalchemy/alembic/versions/49036dfd_add_event_types.py

 [2]
 
https://review.openstack.org/#/c/41316/14/ceilometer/storage/sqlalchemy/migrate_repo/versions/013_sqlite_upgrade.sql

 [3]
 
https://review.openstack.org/#/c/41316/14/ceilometer/storage/sqlalchemy/migrate_repo/versions/013_sqlite_downgrade.sql








[openstack-dev] [Ceilometer] Adding TraitType Storage Model

2013-08-17 Thread Herndon, John Luke (HPCS - Ft. Collins)
Hi - 

After discussion with Jay Pipes about this bug
(https://bugs.launchpad.net/ceilometer/+bug/1211015), I'd like to split
out a new TraitType table in the storage layer, and remove the UniqueName
table. A TraitType is the name of the trait plus its data type (i.e.,
string, int, float…). I'd like to get some input on this change - maybe
it's overkill? Here is my rationale for adding this table:

1) The current query to return all trait names is slow, as stated in the
bug report.
2) All instances of event X will have the same traits, and all instances
of a trait have the same trait type name and data type. I think it is
cleaner to model this relationship in the db with an explicit TraitType
table.
3) The api needs a model for trait types in order to fulfill the
/v2/event_types/Foo/trait_type query. This call will return the set of
trait names and data types, but no trait data.
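A sketch of the proposed split as SQLAlchemy models (illustrative column names, not the final ceilometer schema): each Trait row points at a shared TraitType instead of carrying its own UniqueName:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class TraitType(Base):
    __tablename__ = 'trait_type'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(255))
    data_type = sa.Column(sa.Integer)  # e.g. 1=string, 2=int, 3=float
    __table_args__ = (sa.UniqueConstraint('name', 'data_type'),)

class Trait(Base):
    __tablename__ = 'trait'
    id = sa.Column(sa.Integer, primary_key=True)
    trait_type_id = sa.Column(sa.Integer, sa.ForeignKey('trait_type.id'))
    trait_type = relationship(TraitType)
    t_string = sa.Column(sa.String(255), nullable=True)  # one value column shown

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)
```

Listing all trait names for an event type then becomes a small join against trait_type, rather than a scan over every trait row.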

Related patches:
sqlalchemy layer: https://review.openstack.org/#/c/42407/ (not sure the
migration is correct)

storage layer: https://review.openstack.org/#/c/41596/


Thanks!
-john




[openstack-dev] [Ceilometer] Nova_tests failing in jenkins

2013-08-12 Thread Herndon, John Luke (HPCS - Ft. Collins)
Hi - 

The nova_tests are failing for a couple of different Ceilometer reviews
with the error: 'module' object has no attribute 'add_driver'.

This review (https://review.openstack.org/#/c/41316/) had nothing to do
with the nova_tests, yet they are failing. Any clue what's going on?

Apologies if there is an obvious answer - I've never encountered this
before.

Thanks,
-john




Re: [openstack-dev] [Ceilometer] Event API Access Controls

2013-08-05 Thread Herndon, John Luke (HPCS - Ft. Collins)
Hi Julien,

On 8/5/13 2:04 AM, Julien Danjou jul...@danjou.info wrote:

On Sat, Aug 03 2013, Herndon, John Luke (HPCS - Ft. Collins) wrote:

Hi John,

 Hello, I'm currently implementing the event api blueprint[0], and am
 wondering what access controls we should impose on the event api. The
 purpose of the blueprint is to provide a StackTach equivalent in the
 ceilometer api. I believe that StackTach is used as an internal tool to
 which end users have no access. Given that the event api is targeted at
 administrators, I am currently thinking that it should be limited to admin
 users only. However, I wanted to ask for input on this topic. Any arguments
 for opening it up so users can look at events for their resources? Any
 arguments for not doing so?

You should definitely use the policy system we have in Ceilometer to
check that the user is authenticated and has admin privileges. We
already have such a mechanism in ceilometer.api.acl.

I don't see any point in exposing raw operator system data to the users.
That could even be dangerous, security-wise.

This plan sounds good to me. We can enable/disable the event api for
users, but is there a way to restrict a user to viewing only his/her own
events using the policy system? Or do we not need to do that?

-john


-- 
Julien Danjou
// Free Software hacker / freelance consultant
// http://julien.danjou.info





[openstack-dev] [Ceilometer] Event API Access Controls

2013-08-02 Thread Herndon, John Luke (HPCS - Ft. Collins)
Hello, I'm currently implementing the event api blueprint[0], and am
wondering what access controls we should impose on the event api. The
purpose of the blueprint is to provide a StackTach equivalent in the
ceilometer api. I believe that StackTach is used as an internal tool to
which end users have no access. Given that the event api is targeted at
administrators, I am currently thinking that it should be limited to admin
users only. However, I wanted to ask for input on this topic. Any arguments
for opening it up so users can look at events for their resources? Any
arguments for not doing so?

PS - I'm new to the ceilometer project, so let me introduce myself. My name
is John Herndon, and I work for HP. I've been freed up from a different
project and will be working on ceilometer. Thanks, looking forward to
working with everyone!

-john

0: https://blueprints.launchpad.net/ceilometer/+spec/specify-event-api


