Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-22 Thread Sean Dague
On 03/21/2014 05:11 PM, Joe Gordon wrote:
 
 
 
 On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague s...@dague.net wrote:
 
 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally
  unusable in a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).
 
 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.
 
 
 To give an example from nova:
 
 doing a nova list with no servers:
 
 stack@devstack:~/devstack$ nova --timing list 
 
 | GET
 http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail
 | 0.0817470550537 |
 
 So the nova command takes 0.0817470550537 seconds.
 
 Inside the nova logs (when putting a timer around all nova.db.api calls
 [1] ), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:
 
 2014-03-21 20:58:46.760 DEBUG nova.db.api
 [req-91879f86-7665-4943-8953-41c92c42c030 demo demo]
 'instance_get_all_by_filters' 0.06 seconds timed
 /mnt/stack/nova/nova/db/api.py:1940
 
 But the SQL slow log reports that the same query takes only 0.001006 seconds
 with a lock_time of 0.000269, for a total of 0.00127 seconds.
 
 # Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0
  Rows_examined: 0
 
 
 So in this case only 2% of the time
 that nova.db.api.instance_get_all_by_filters takes is spent inside of
 mysql. Or to put it differently, nova.db.api.instance_get_all_by_filters
 is 47 times slower than the raw DB call underneath.
 
 Yes, I agree that turning raw SQL data into Python objects should
 take time, but I just don't think it should take 98% of the time.
 
 [1] 
 https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1
 
 That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably
 appropriate.
 
 
 While I like this idea, they lose free PostgreSQL support by dropping
 SQLA. But that is a solvable problem.

Joe, you're just trolling now, right? :)

I mean you picked the most pathological case possible. An empty table
with no data ever returned. So no actual work was done anywhere, and
this just measures side effects, which are in no way commensurate with
the actual read / write profiles of a real system.

I 100% agree that SQLA provides overhead. However, removing SQLA is the
last in a series of optimizations that you do on a system. Because
taking it out doesn't solve having bad data usage (getting more data
than you need), bad schema, or bad queries. I would expect substantial
gains could be made tackling those first.

If after that, fast path drivers sounded like a good idea, go for it.

But realize that a fast path driver is more work to write and maintain.
And as the energy hasn't gone into optimizing things yet, I think a
proposal to put even more work on the team to write a new set of
harder-to-maintain drivers is just a non-starter.

All I'm asking is that we need profiling. Ceilometer is supposed to be
high-performance / low-overhead metrics collection. We have some
indication that it's not meeting that goal based on our gate runs.
Which means we can reproduce it. Which is great, because reproducing
means things are fixable, and we can easily know if we did fix it.

Optimizing is hard, but I think it's the right time to do it. Not just
with elasticity, but with old-fashioned analysis.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Sean Dague
On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
 On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
 alexei.kornie...@gmail.com wrote:
 
 Hello,
 
 We've done some profiling and the results are quite interesting:
 during 1.5 hours ceilometer inserted 59755 events (59755 calls to
 record_metering_data).
 These calls resulted in a total of 2591573 SQL queries.
 
 And the most interesting part is that 291569 queries were ROLLBACK
 queries.
 We do around 5 rollbacks to record a single event!
 
 I guess it means that the MySQL backend is currently totally unusable in
 a production environment.
 
 
 It should be noted that SQLAlchemy is horrible for performance; in
 nova we usually see sqlalchemy overheads of well over 10x (the time of a
 nova.db.api call vs the time MySQL measures when the slow log is recording
 everything).

That's not really a fair assessment. Python object inflation takes time.
I do get that there is SQLA overhead here, but even if you trimmed it
out you would not get down to the MySQL query time.

That being said, having Ceilometer's write path be highly tuned and not
use SQLA (and written for every back end natively) is probably appropriate.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Boris Pavlovic
Sean,


I absolutely agree with you.
It's not the same to execute a query and get plain text as it is to execute
a query and get a hierarchy of Python objects.

Plus, I disagree when I hear that SQLAlchemy is slow. It's slow when you are
using it wrong.

For example, in the Nova scheduler [1] we were fetching 3 full tables with a
JOIN, which produces much more data from the DB (in bytes and rows) than
making 3 separate selects and then joining the results by hand.

We should stop using the following phrases:
1) python is slow
2) mysql is slow
3) sqlalchemy is slow
4) hardware is slow [2]

And start using these phrases:
1) Algorithms that we are using are bad
2) Architecture solutions that we are using are bad

And start thinking about how to improve them.


[1] https://review.openstack.org/#/c/43151/
[2] http://en.wikipedia.org/wiki/Buran_(spacecraft)

Best regards,
Boris Pavlovic



On Fri, Mar 21, 2014 at 3:04 PM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable in
  a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).

 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.

 That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably appropriate.

 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Doug Hellmann
On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable in
  a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).

 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.

 That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably appropriate.


I have been working to get Mike Bayer (author of SQLAlchemy) to the summit
in Atlanta. He is interested in working with us to improve SQLAlchemy, so
if we have specific performance or feature issues like this, it would be
good to make a list. If we have enough, maybe we can set aside a session
in the Oslo track, otherwise we can at least have some hallway
conversations.

Doug




 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Joe Gordon
On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable in
  a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).

 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.


To give an example from nova:

doing a nova list with no servers:

stack@devstack:~/devstack$ nova --timing list

| GET
http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail |
0.0817470550537 |

So the nova command takes 0.0817470550537 seconds.

Inside the nova logs (when putting a timer around all nova.db.api calls [1]
), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:

2014-03-21 20:58:46.760 DEBUG nova.db.api
[req-91879f86-7665-4943-8953-41c92c42c030 demo demo]
'instance_get_all_by_filters' 0.06 seconds timed
/mnt/stack/nova/nova/db/api.py:1940

But the SQL slow log reports that the same query takes only 0.001006 seconds
with a lock_time of 0.000269, for a total of 0.00127 seconds.

# Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0
 Rows_examined: 0


So in this case only 2% of the time
that nova.db.api.instance_get_all_by_filters takes is spent inside of
mysql. Or to put it differently, nova.db.api.instance_get_all_by_filters is
47 times slower than the raw DB call underneath.

Yes, I agree that turning raw SQL data into Python objects should take
time, but I just don't think it should take 98% of the time.

[1]
https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1
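
A minimal sketch of such a timing wrapper, for illustration (it mirrors the
idea of the linked commit, not its exact code):

import functools
import time


def timed(func):
    # Log the wall-clock duration of each call, like the DEBUG line above.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            print("%r %.2f seconds" % (func.__name__, time.time() - start))
    return wrapper


@timed
def instance_get_all_by_filters(filters):
    time.sleep(0.06)  # stand-in for the real DB API call
    return []


instance_get_all_by_filters({"deleted": False})
# prints: 'instance_get_all_by_filters' 0.06 seconds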

That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably appropriate.


While I like this idea, they lose free PostgreSQL support by dropping
SQLA. But that is a solvable problem.



 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Joe Gordon
On Fri, Mar 21, 2014 at 8:58 AM, Doug Hellmann
doug.hellm...@dreamhost.com wrote:




 On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable
  in a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).

 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.

 That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably
 appropriate.


 I have been working to get Mike Bayer (author of SQLAlchemy) to the summit
 in Atlanta. He is interested in working with us to improve SQLAlchemy, so
 if we have specific performance or feature issues like this, it would be
 good to make a list. If we have enough, maybe we can set aside a session in
 the Oslo track, otherwise we can at least have some hallway conversations.



That would be really amazing. Is he on IRC, so we can get the ball rolling?



 Doug




 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net







___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Alexei Kornienko

Hello,

Please see some comments inline.

Best Regards,
Alexei Kornienko

On 03/21/2014 11:11 PM, Joe Gordon wrote:




On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague s...@dague.net wrote:


On 03/20/2014 06:18 PM, Joe Gordon wrote:



 On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
 alexei.kornie...@gmail.com wrote:

 Hello,

 We've done some profiling and the results are quite interesting:
 during 1.5 hours ceilometer inserted 59755 events (59755 calls to
 record_metering_data).
 These calls resulted in a total of 2591573 SQL queries.

 And the most interesting part is that 291569 queries were ROLLBACK
 queries.
 We do around 5 rollbacks to record a single event!

 I guess it means that the MySQL backend is currently totally
 unusable in a production environment.


 It should be noted that SQLAlchemy is horrible for performance; in
 nova we usually see sqlalchemy overheads of well over 10x (the time of a
 nova.db.api call vs the time MySQL measures when the slow log is
 recording everything).

That's not really a fair assessment. Python object inflation takes
time. I do get that there is SQLA overhead here, but even if you trimmed
it out you would not get down to the MySQL query time.


To give an example from nova:

doing a nova list with no servers:

stack@devstack:~/devstack$ nova --timing list

| GET 
http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail 
| 0.0817470550537 |


So the nova command takes 0.0817470550537 seconds.

Inside the nova logs (when putting a timer around all nova.db.api 
calls [1] ), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:


2014-03-21 20:58:46.760 DEBUG nova.db.api 
[req-91879f86-7665-4943-8953-41c92c42c030 demo demo] 
'instance_get_all_by_filters' 0.06 seconds timed 
/mnt/stack/nova/nova/db/api.py:1940


But the SQL slow log reports that the same query takes only 0.001006
seconds, with a lock_time of 0.000269, for a total of 0.00127 seconds.


# Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0 
 Rows_examined: 0



So in this case only 2% of the time
that nova.db.api.instance_get_all_by_filters takes is spent inside of
mysql. Or to put it differently,
nova.db.api.instance_get_all_by_filters is 47 times slower than the
raw DB call underneath.


Yes, I agree that turning raw SQL data into Python objects should
take time, but I just don't think it should take 98% of the time.
If you open the actual code of nova.db.api.instance_get_all_by_filters -
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1817

you will find that the Python code is actually doing lots of things:
1) setting up join conditions
2) creating query filters
3) doing some heavy matching, with loops in exact_filter, regex_filter,
and tag_filter
This code won't go away with Python objects since it's related to
business logic.
I think it's quite hypocritical to say that the problem is turning
raw SQL data into Python objects.
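
For anyone who wants to reproduce this kind of per-call SQL accounting,
here is a minimal sketch of one way to do it (an illustration, not
necessarily the profiling setup used above): SQLAlchemy can hook every
statement an engine executes.

import collections

from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")  # in-memory DB so the sketch is standalone
counts = collections.Counter()

@event.listens_for(engine, "before_cursor_execute")
def count_statement(conn, cursor, statement, parameters, context, executemany):
    # Bucket by the leading SQL keyword: SELECT, INSERT, UPDATE, ...
    counts[statement.split(None, 1)[0].upper()] += 1

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE meter (id INTEGER)"))
    conn.execute(text("INSERT INTO meter VALUES (1)"))
    conn.execute(text("SELECT id FROM meter"))

print(counts)  # Counter({'CREATE': 1, 'INSERT': 1, 'SELECT': 1})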




[1] 
https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1


That being said, having Ceilometer's write path be highly tuned
and not
use SQLA (and written for every back end natively) is probably
appropriate.


While I like this idea, they lose free PostgreSQL support by dropping
SQLA. But that is a solvable problem.



-Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net








___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Doug Hellmann
On Fri, Mar 21, 2014 at 5:13 PM, Joe Gordon joe.gord...@gmail.com wrote:




 On Fri, Mar 21, 2014 at 8:58 AM, Doug Hellmann 
 doug.hellm...@dreamhost.com wrote:




 On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 06:18 PM, Joe Gordon wrote:
 
 
 
  On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
  alexei.kornie...@gmail.com wrote:
 
  Hello,
 
  We've done some profiling and the results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data).
  These calls resulted in a total of 2591573 SQL queries.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable
  in a production environment.
 
 
  It should be noted that SQLAlchemy is horrible for performance; in
  nova we usually see sqlalchemy overheads of well over 10x (the time of a
  nova.db.api call vs the time MySQL measures when the slow log is recording
  everything).

 That's not really a fair assessment. Python object inflation takes time.
 I do get that there is SQLA overhead here, but even if you trimmed it
 out you would not get down to the MySQL query time.

 That being said, having Ceilometer's write path be highly tuned and not
 use SQLA (and written for every back end natively) is probably
 appropriate.


 I have been working to get Mike Bayer (author of SQLAlchemy) to the
 summit in Atlanta. He is interested in working with us to improve
 SQLAlchemy, so if we have specific performance or feature issues like this,
 it would be good to make a list. If we have enough, maybe we can set aside
 a session in the Oslo track, otherwise we can at least have some hallway
 conversations.



 That would be really amazing. Is he on IRC, so we can get the ball rolling?


I'll ask him to join #openstack-dev if he is.

Doug






 Doug




 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net










___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because it's wrong?)
2. The idea of spawning several collectors is suspicious (btw there is a
patch that runs several collectors: https://review.openstack.org/#/c/79962/.)
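
As a minimal sketch of what spawning several collectors means (purely
illustrative, not the referenced patch): several worker processes draining
one shared queue of metering messages.

import multiprocessing


def collector(queue):
    # Each worker drains samples from the shared queue.
    while True:
        sample = queue.get()
        if sample is None:  # shutdown sentinel
            break
        # record_metering_data(sample) would go here


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=collector, args=(queue,))
               for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(100):
        queue.put({"counter_name": "cpu_util", "counter_volume": i})
    for _ in workers:
        queue.put(None)
    for w in workers:
        w.join()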

Let's try to get back to the original problem. All these solutions were
suggested to solve the problem of high load on Ceilometer. AFAIK, Tempest's
goal is to test projects' interactions, not performance. The
perfect Tempest behaviour would be to start ceilometer only for Ceilometer
tests. On one hand that would avoid loading the db during other tests; on
the other hand projects' interactions would still be tested, because during
the Ceilometer tests we create volumes, images and instances. But I'm afraid
that this scenario is not technically possible.
There is one more idea: make Ceilometer able to monitor not all messages
but a filtered set of messages. But that is a new feature and cannot
be added right now.

Tempest guys, if you have any thoughts about the first suggestion (start
ceilometer only for Ceilometer tests), please share.

Thanks,
Nadya




On Thu, Mar 20, 2014 at 3:23 AM, Sean Dague s...@dague.net wrote:

 On 03/19/2014 06:09 PM, Doug Hellmann wrote:
  The ceilometer collector is meant to scale horizontally. Have you tried
  configuring the test environment to run more than one copy, to process
  the notifications more quickly?

 The ceilometer collector is already one of the top running processes on
 the box -

 http://logs.openstack.org/82/81282/2/check/check-tempest-dsvm-full/693dc3b/logs/dstat.txt.gz


 Often consuming over 1/2 a core (25% == 1 core in that run, as can be
 seen when qemu boots and pegs one).

 So while we could spin up more collectors, I think it's unreasonable
 that the majority of our cpu has to be handed over to the metric
 collector to make it function responsively. I thought the design point
 was that this was low impact.

 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 05:49 AM, Nadya Privalova wrote:
 Hi all,
 First of all, thanks for your suggestions!
 
 To summarize the discussions here:
 1. We are not going to install Mongo (because it's wrong?)

We are not going to install Mongo from anything other than the base
distribution, because we don't do that for things that aren't Python. Our
assumption is that dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

 2. The idea of spawning several collectors is suspicious (btw there is a
 patch that runs several collectors:
 https://review.openstack.org/#/c/79962/ .)

Correct, given that the collector is already one of the most expensive
processes in a devstack run.

 Let's try to get back to the original problem. All these solutions were
 suggested to solve the problem of high load on Ceilometer. AFAIK,
 Tempest's goal is to test projects' interactions, not performance. The
 perfect Tempest behaviour would be to start ceilometer only
 for Ceilometer tests. On one hand that would avoid loading the db during
 other tests; on the other hand projects' interactions would still be tested,
 because during the Ceilometer tests we create volumes, images and instances.
 But I'm afraid that this scenario is not technically possible.
 There is one more idea: make Ceilometer able to monitor not all messages
 but a filtered set of messages. But that is a new feature and
 cannot be added right now.
 
 Tempest guys, if you have any thoughts about the first suggestion (start
 ceilometer only for Ceilometer tests), please share.

The point of the gate is that it's integrated and testing the
interaction between projects. Ceilometer can be tested on its own in
ceilometer unit tests, or by creating ceilometer functional tests that
only run on the ceilometer jobs.

While I agree that Tempest's job is not to test performance, we do have
to do some basic sanity checking here that the software is running in
a performance profile that we believe is baseline usable.

Based on the latest dstat results, I think that's a dubious assessment.
The answer on the collector side has to be something other than
horizontal scaling. Because we're talking about the collector being the
3rd highest utilized process on the box right now (we should write a
dstat plugin to give us cumulative data, just haven't gotten there yet).

So right now, I think performance analysis for ceilometer on sqla is
important, really important. Not just horizontal scaling, but actual
performance profiling.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Sean, thanks for the analysis.
JFYI, I did some initial profiling, it's described here
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg19030.html.


On Thu, Mar 20, 2014 at 2:15 PM, Sean Dague s...@dague.net wrote:

 On 03/20/2014 05:49 AM, Nadya Privalova wrote:
  Hi all,
  First of all, thanks for your suggestions!
 
  To summarize the discussions here:
  1. We are not going to install Mongo (because it's wrong?)

 We are not going to install Mongo from anything other than the base
 distribution, because we don't do that for things that aren't Python. Our
 assumption is that dependent services come from the base OS.

 That being said, being an integrated project means you have to be able
 to function, sanely, on an sqla backend, as that will always be part of
 your gate.

  2. The idea of spawning several collectors is suspicious (btw there is a
  patch that runs several collectors:
  https://review.openstack.org/#/c/79962/ .)

 Correct, given that the collector is already one of the most expensive
 processes in a devstack run.

  Let's try to get back to the original problem. All these solutions were
  suggested to solve the problem of high load on Ceilometer. AFAIK,
  Tempest's goal is to test projects' interactions, not performance. The
  perfect Tempest behaviour would be to start ceilometer only
  for Ceilometer tests. On one hand that would avoid loading the db during
  other tests; on the other hand projects' interactions would still be tested,
  because during the Ceilometer tests we create volumes, images and instances.
  But I'm afraid that this scenario is not technically possible.
  There is one more idea: make Ceilometer able to monitor not all messages
  but a filtered set of messages. But that is a new feature and
  cannot be added right now.
 
  Tempest guys, if you have any thoughts about the first suggestion (start
  ceilometer only for Ceilometer tests), please share.

 The point of the gate is that it's integrated and testing the
 interaction between projects. Ceilometer can be tested on its own in
 ceilometer unit tests, or by creating ceilometer functional tests that
 only run on the ceilometer jobs.

 While I agree that Tempest's job is not to test performance, we do have
 to do some basic sanity checking here that the software is running in
 a performance profile that we believe is baseline usable.

 Based on the latest dstat results, I think that's a dubious assessment.
 The answer on the collector side has to be something other than
 horizontal scaling. Because we're talking about the collector being the
 3rd highest utilized process on the box right now (we should write a
 dstat plugin to give us cumulative data, just haven't gotten there yet).

 So right now, I think performance analysis for ceilometer on sqla is
 important, really important. Not just horizontal scaling, but actual
 performance profiling.

 -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Tim Bell

+1 for performance analysis to understand what needs to be optimised. Metering
should be lightweight.

For those of us running in production, we don't have an option to turn 
ceilometer off some of the time. That we are not able to run through the gate 
tests hints that there are optimisations that are needed.

For example, turning on ceilometer caused a 16x increase in our Nova API call 
rate, see 
http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
 for details.

Tim

 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 20 March 2014 11:16
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
 tempest testing in gate
 
...
 
 While I agree that Tempest's job is not to test performance, we do have to
 do some basic sanity checking here that the software is running in
 a performance profile that we believe is baseline usable.

 Based on the latest dstat results, I think that's a dubious assessment.
 The answer on the collector side has to be something other than horizontal 
 scaling. Because we're talking about the collector being the 3rd highest
 utilized process on the box right now (we should write a dstat plugin to give 
 us cumulative data, just haven't gotten there yet).

 So right now, I think performance analysis for ceilometer on sqla is 
 important, really important. Not just horizontal scaling, but actual
 performance profiling.
 
   -Sean
 
 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Tim, yep. If you use one db for Ceilometer and Nova then nova's performance
may be affected. I've seen this issue.
Will start profiling ASAP.


On Thu, Mar 20, 2014 at 3:59 PM, Tim Bell tim.b...@cern.ch wrote:


 +1 for performance analysis to understand what needs to be optimised.
 Metering should be light-weight.

 For those of us running in production, we don't have an option to turn
 ceilometer off some of the time. That we are not able to run through the
 gate tests hints that there are optimisations that are needed.

 For example, turning on ceilometer caused a 16x increase in our Nova API
 call rate, see
 http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
 for details.

 Tim

  -Original Message-
  From: Sean Dague [mailto:s...@dague.net]
  Sent: 20 March 2014 11:16
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer
 tempest testing in gate
 
 ...
 
  While I agree that Tempest's job is not to test performance, we do have
 to do some basic sanity checking here that the software is running in
  a performance profile that we believe is baseline usable.
 
  Based on the latest dstat results, I think that's a dubious assessment.
  The answer on the collector side has to be something other than
 horizontal scaling. Because we're talking about the collector being the 3rd
 highest
  utilized process on the box right now (we should write a dstat plugin to
 give us cumulative data, just haven't gotten there yet).
 
  So right now, I think performance analysis for ceilometer on sqla is
 important, really important. Not just horizontal scaling, but actual
  performance profiling.
 
-Sean
 
  --
  Sean Dague
  Samsung Research America
  s...@dague.net / sean.da...@samsung.com
  http://dague.net


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Christian Berendt
On 03/20/2014 01:27 PM, Nadya Privalova wrote:
 Tim, yep. If you use one db for Ceilometer and Nova then nova's
 performance may be affected.

If I understood it correctly, the problem is not the higher load produced
directly by Ceilometer on the database. The problem is that the
Ceilometer compute agent makes a lot of Nova API calls, and this results
in a higher load on the nova-api services. Tim mentioned a factor of 16.

Christian.

-- 
Christian Berendt
Cloud Computing Solution Architect
Mail: bere...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Tim Bell
We're using a dedicated MongoDB instance for ceilometer and a distinct DB for 
each of the Nova cells.

Tim

From: Nadya Privalova [mailto:nprival...@mirantis.com]
Sent: 20 March 2014 13:27
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
tempest testing in gate

Tim, yep. If you use one db for Ceilometer and Nova then nova's performance may 
be affected. I've seen this issue.
Will start profiling ASAP.

On Thu, Mar 20, 2014 at 3:59 PM, Tim Bell tim.b...@cern.ch wrote:

+1 for performance analysis to understand what needs to be optimised. Metering
should be lightweight.

For those of us running in production, we don't have an option to turn 
ceilometer off some of the time. That we are not able to run through the gate 
tests hints that there are optimisations that are needed.

For example, turning on ceilometer caused a 16x increase in our Nova API call 
rate, see 
http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
 for details.

Tim

 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 20 March 2014 11:16
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
 tempest testing in gate

...

 While I agree that Tempest's job is not to test performance, we do have to
 do some basic sanity checking here that the software is running in
 a performance profile that we believe is baseline usable.

 Based on the latest dstat results, I think that's a dubious assessment.
 The answer on the collector side has to be something other than horizontal 
 scaling. Because we're talking about the collector being the 3rd highest
 utilized process on the box right now (we should write a dstat plugin to give 
 us cumulative data, just haven't gotten there yet).

 So right now, I think performance analysis for ceilometer on sqla is 
 important, really important. Not just horizontal scaling, but actual
 performance profiling.

   -Sean

 --
 Sean Dague
 Samsung Research America
 s...@dague.net / sean.da...@samsung.com
 http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread David Kranz

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because it's wrong?)

We are not going to install Mongo from anything other than the base
distribution, because we don't do that for things that aren't Python. Our
assumption is that dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.
This is a claim I think needs a bit more scrutiny if by "sanely" you
mean "performant". It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is anyone
doing that? Is having a scalable sql back end a goal of ceilometer?


More generally, if there is functionality that is of great importance to 
any cloud deployment (and we would not integrate it if we didn't think 
it was) that cannot be deployed at scale using sqla, are we really going 
to say it should not be a part of OpenStack because we refuse, for 
whatever reason, to run it in our gate using a driver that would 
actually be used? And if we do demand an sqla backend, how much time 
should we spend trying to optimize it if no one will really use it? 
Though the slow heat job is a little different because the slowness 
comes directly from running real use cases, perhaps we should just set 
up a slow ceilometer job if the sql version is too slow for its budget 
in the main job.


It seems like there is a similar thread, at least in part, about this 
around marconi.


 -David








___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 11:35 AM, David Kranz wrote:
 On 03/20/2014 06:15 AM, Sean Dague wrote:
 On 03/20/2014 05:49 AM, Nadya Privalova wrote:
 Hi all,
 First of all, thanks for your suggestions!

 To summarize the discussions here:
  1. We are not going to install Mongo (because it's wrong?)
 We are not going to install Mongo from anything other than the base
 distribution, because we don't do that for things that aren't Python. Our
 assumption is that dependent services come from the base OS.

 That being said, being an integrated project means you have to be able
 to function, sanely, on an sqla backend, as that will always be part of
 your gate.
 This is a claim I think needs a bit more scrutiny if by "sanely" you
 mean "performant". It seems we have an integrated project that no one
 would deploy using the sql db driver we have in the gate. Is anyone
 doing that? Is having a scalable sql back end a goal of ceilometer?
 
 More generally, if there is functionality that is of great importance to
 any cloud deployment (and we would not integrate it if we didn't think
 it was) that cannot be deployed at scale using sqla, are we really going
 to say it should not be a part of OpenStack because we refuse, for
 whatever reason, to run it in our gate using a driver that would
 actually be used? And if we do demand an sqla backend, how much time
 should we spend trying to optimize it if no one will really use it?
 Though the slow heat job is a little different because the slowness
 comes directly from running real use cases, perhaps we should just set
 up a slow ceilometer job if the sql version is too slow for its budget
 in the main job.
 
 It seems like there is a similar thread, at least in part, about this
 around marconi.

We required a non-mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.

Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.

Once we get a base OS in the gate that lets us install mongo directly from
base packages, we can also do that. Or someone can 3rd-party it today.
Then we'll even have comparative results to understand the differences.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread David Kranz

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because it's wrong?)

We are not going to install Mongo from anything other than the base
distribution, because we don't do that for things that aren't Python. Our
assumption is that dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by "sanely" you
mean "performant". It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is anyone
doing that? Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a slow ceilometer job if the sql version is too slow for its budget
in the main job.

It seems like there is a similar thread, at least in part, about this
around marconi.

We required a non-mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.
Peace. I specifically noted that difference in my original comment. And 
for that reason the heat slow job may not be temporary.


Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.
There are plenty of cases where we have had code that passes gate tests
with acceptable performance but falls over in real deployment. I'm just
saying that having a driver that works OK in the gate but does not work
for real deployments is of no more value than not having it at all.
Maybe less value.
How do you propose to solve the problem of getting more ceilometer tests
into the gate in the short run? As a practical measure I don't see why
it is so bad to have a separate job until the complex issue of whether
it is possible to have a real-world performant sqla backend is resolved.
Or did I miss something and it has already been determined that sqla
could be used for large-scale deployments if we just fixed our code?


Once we get a base OS in the gate that lets us install mongo directly from
base packages, we can also do that. Or someone can 3rd-party it today.
Then we'll even have comparative results to understand the differences.

Yes. Do you know which base OSes are candidates for that?

 -David



-Sean





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 01:01 PM, David Kranz wrote:
 On 03/20/2014 12:31 PM, Sean Dague wrote:
 On 03/20/2014 11:35 AM, David Kranz wrote:
 On 03/20/2014 06:15 AM, Sean Dague wrote:
 On 03/20/2014 05:49 AM, Nadya Privalova wrote:
 Hi all,
 First of all, thanks for your suggestions!

 To summarize the discussions here:
 1. We are not going to install Mongo (because it's wrong?)
 We are not going to install Mongo from anything other than the base
 distribution, because we don't do that for things that aren't Python. Our
 assumption is that dependent services come from the base OS.

 That being said, being an integrated project means you have to be able
 to function, sanely, on an sqla backend, as that will always be part of
 your gate.
 This is a claim I think needs a bit more scrutiny if by "sanely" you
 mean "performant". It seems we have an integrated project that no one
 would deploy using the sql db driver we have in the gate. Is anyone
 doing that? Is having a scalable sql back end a goal of ceilometer?

 More generally, if there is functionality that is of great importance to
 any cloud deployment (and we would not integrate it if we didn't think
 it was) that cannot be deployed at scale using sqla, are we really going
 to say it should not be a part of OpenStack because we refuse, for
 whatever reason, to run it in our gate using a driver that would
 actually be used? And if we do demand an sqla backend, how much time
 should we spend trying to optimize it if no one will really use it?
 Though the slow heat job is a little different because the slowness
 comes directly from running real use cases, perhaps we should just set
 up a slow ceilometer job if the sql version is too slow for its budget
 in the main job.

 It seems like there is a similar thread, at least in part, about this
 around marconi.
 We required a non-mongo backend to graduate ceilometer. So I don't think
 it's too much to ask that it actually works.

 If the answer is that it will never work and it was a checkbox with no
 intent to make it work, then it should be deprecated and removed from
 the tree in Juno, with a big WARNING that you shouldn't ever use that
 backend. Like Nova now does with all the virt drivers that aren't tested
 upstream.

 Shipping in tree code that you don't want people to use is bad for
 users. Either commit to making it work, or deprecate it and remove it.

 I don't see this as the same issue as the slow heat job. Heat,
 architecturally, is going to be slow. It spins up real OSes and does
 real things to them. There is no way that's ever going to be fast, and
 the dedicated job was a recognition that to support this level of
 services in OpenStack we need to give them more breathing room.
 Peace. I specifically noted that difference in my original comment. And
 for that reason the heat slow job may not be temporary.

 Architecturally Ceilometer should not be this expensive. We've got some
 data showing it to be aberrant from where we believe it should be. We
 should fix that.
 There are plenty of cases where we have had code that passes gate tests
 with acceptable performance but falls over in real deployment. I'm just
 saying that having a driver that works OK in the gate but does not work
 for real deployments is of no more value than not having it at all.
 Maybe less value.
 How do you propose to solve the problem of getting more ceilometer tests
 into the gate in the short run? As a practical measure I don't see why
 it is so bad to have a separate job until the complex issue of whether
 it is possible to have a real-world performant sqla backend is resolved.
 Or did I miss something and it has already been determined that sqla
 could be used for large-scale deployments if we just fixed our code?

I think right now the ball is back in the ceilometer court to do some
performance profiling, and let's see what comes of that. I don't think
we're getting more tests before the release in any real way.

 Once we get a base OS in the gate that lets us install mongo directly from
 base packages, we can also do that. Or someone can 3rd-party it today.
 Then we'll even have comparative results to understand the differences.
 Yes. Do you know which base OSes are candidates for that?

Ubuntu 14.04 will have a sufficient level of Mongo, so some time in the
Juno cycle we should have it in the gate.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
 Hello,
 
 We've done some profiling and the results are quite interesting:
 during 1.5 hours ceilometer inserted 59755 events (59755 calls to
 record_metering_data).
 These calls resulted in a total of 2591573 SQL queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.

 And the most interesting part is that 291569 queries were ROLLBACK
 queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than in any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).
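
One mechanism worth ruling out when reading those logs: SQLAlchemy's
connection pool itself issues a ROLLBACK every time a connection is checked
back in (reset_on_return defaults to 'rollback'), independent of any retry
logic, and on any backend. A minimal sketch (sqlite is used only so the
snippet runs standalone; the behaviour is the same against MySQL):

from sqlalchemy import create_engine, text

# pool_reset_on_return="rollback" is the default: every connection check-in
# emits a ROLLBACK, so even read-only traffic shows rollbacks in the DB log.
engine = create_engine("sqlite://", pool_reset_on_return="rollback", echo=True)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
# Check-in happens here; the pool resets the connection with a ROLLBACK,
# which recent SQLAlchemy versions show in the echo log.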

Best,
-jay

 We do around 5 rollbacks to record a single event!
 
 I guess it means that the MySQL backend is currently totally unusable in
 a production environment.
 
 Please find a full profiling graph attached.
 
 Regards,
 
 On 03/20/2014 10:31 PM, Sean Dague wrote:
 
  On 03/20/2014 01:01 PM, David Kranz wrote:
   On 03/20/2014 12:31 PM, Sean Dague wrote:
On 03/20/2014 11:35 AM, David Kranz wrote:
 On 03/20/2014 06:15 AM, Sean Dague wrote:
  On 03/20/2014 05:49 AM, Nadya Privalova wrote:
   Hi all,
   First of all, thanks for your suggestions!
   
   To summarize the discussions here:
    1. We are not going to install Mongo (because it's wrong?)
   We are not going to install Mongo from anything other than the base
   distribution, because we don't do that for things that aren't Python.
   Our assumption is that dependent services come from the base OS.
  
  That being said, being an integrated project means you have to be 
  able
  to function, sanely, on an sqla backend, as that will always be 
  part of
  your gate.
 This is a claim I think needs a bit more scrutiny if by sanely you
 mean performant. It seems we have an integrated project that no one
 would deploy using the sql db driver we have in the gate. Is anyone
 doing that? Is having a scalable sql back end a goal of ceilometer?
 
 More generally, if there is functionality that is of great importance 
 to
 any cloud deployment (and we would not integrate it if we didn't think
 it was) that cannot be deployed at scale using sqla, are we really 
 going
 to say it should not be a part of OpenStack because we refuse, for
 whatever reason, to run it in our gate using a driver that would
 actually be used? And if we do demand an sqla backend, how much time
 should we spend trying to optimize it if no one will really use it?
 Though the slow heat job is a little different because the slowness
 comes directly from running real use cases, perhaps we should just set
 up a slow ceilometer job if the sql version is too slow for its 
 budget
 in the main job.
 
 It seems like there is a similar thread, at least in part, about this
 around marconi.
We required a non-mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.
   Peace. I specifically noted that difference in my original comment. And
   for that reason the heat slow job may not be temporary.
Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.
   There are plenty of cases where we have had code that passes gate tests
   with acceptable performance but falls over in real deployment.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Joe Gordon
On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko 
alexei.kornie...@gmail.com wrote:

  Hello,

 We've done some profiling and results are quite interesting:
 during 1.5 hours ceilometer inserted 59755 events (59755 calls to
 record_metering_data);
 these calls resulted in a total of 2591573 SQL queries.

 And the most interesting part is that 291569 queries were ROLLBACK queries.
 We do around 5 rollbacks to record a single event!

 I guess it means that the MySQL backend is currently totally unusable in
 a production environment.


It should be noted that SQLAlchemy is horrible for performance, in nova
we usually see sqlalchemy overheads of well over 10x (time nova.db.api call
vs the time MySQL measures when slow log is recording everything).



 Please find a full profiling graph attached.

 Regards,


 On 03/20/2014 10:31 PM, Sean Dague wrote:

 On 03/20/2014 01:01 PM, David Kranz wrote:

  On 03/20/2014 12:31 PM, Sean Dague wrote:

  On 03/20/2014 11:35 AM, David Kranz wrote:

  On 03/20/2014 06:15 AM, Sean Dague wrote:

  On 03/20/2014 05:49 AM, Nadya Privalova wrote:

  Hi all,
 First of all, thanks for your suggestions!

 To summarize the discussions here:
 1. We are not going to install Mongo (because it's wrong?)

  We are not going to install Mongo from outside the base distribution, because
 we don't do that for things that aren't python. Our assumption is
 that dependent services come from the base OS.

 That being said, being an integrated project means you have to be able
 to function, sanely, on an sqla backend, as that will always be part of
 your gate.

  This is a claim I think needs a bit more scrutiny if by sanely you
 mean performant. It seems we have an integrated project that no one
 would deploy using the sql db driver we have in the gate. Is anyone
 doing that? Is having a scalable sql back end a goal of ceilometer?

 More generally, if there is functionality that is of great importance to
 any cloud deployment (and we would not integrate it if we didn't think
 it was) that cannot be deployed at scale using sqla, are we really going
 to say it should not be a part of OpenStack because we refuse, for
 whatever reason, to run it in our gate using a driver that would
 actually be used? And if we do demand an sqla backend, how much time
 should we spend trying to optimize it if no one will really use it?
 Though the slow heat job is a little different because the slowness
 comes directly from running real use cases, perhaps we should just set
 up a slow ceilometer job if the sql version is too slow for its budget
 in the main job.

 It seems like there is a similar thread, at least in part, about this
 around marconi.

  We required a non-mongo backend to graduate ceilometer. So I don't think
 it's too much to ask that it actually works.

 If the answer is that it will never work and it was a checkbox with no
 intent to make it work, then it should be deprecated and removed from
 the tree in Juno, with a big WARNING that you shouldn't ever use that
 backend. Like Nova now does with all the virt drivers that aren't tested
 upstream.

 Shipping in tree code that you don't want people to use is bad for
 users. Either commit to making it work, or deprecate it and remove it.

 I don't see this as the same issue as the slow heat job. Heat,
 architecturally, is going to be slow. It spins up real OSes and does
 real things to them. There is no way that's ever going to be fast, and
 the dedicated job was a recognition that to support this level of
 services in OpenStack we need to give them more breathing room.

  Peace. I specifically noted that difference in my original comment. And
 for that reason the heat slow job may not be temporary.

  Architecturally Ceilometer should not be this expensive. We've got some
 data showing it to be aberrant from where we believe it should be. We
 should fix that.

  There are plenty of cases where we have had code that passes gate tests
 with acceptable performance but falls over in real deployment. I'm just
 saying that having a driver that works ok in the gate but does not work
 for real deployments is of no more value than not having it at all.
 Maybe less value.
 How do you propose to solve the problem of getting more ceilometer tests
 into the gate in the short-run? As a practical measure I don't see why
 it is so bad to have a separate job until the complex issue of whether
 it is possible to have a real-world performant sqla backend is resolved.
 Or did I miss something and it has already been determined that sqla
 could be used for large-scale deployments if we just fixed our code?

  I think right now the ball is back in the ceilometer court to do some
 performance profiling, and let's see what comes of that. I don't think
 we're getting more tests before the release in any real way.


  Once we get a base OS in the gate that lets us direct install mongo from
 base packages, we can also do that. Or someone can 3rd party it today.
 Then we'll even have comparative results to understand the differences.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Alexei Kornienko


On 03/21/2014 12:15 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:

Hello,

We've done some profiling and results are quite interesting:
during 1.5 hours ceilometer inserted 59755 events (59755 calls to
record_metering_data);
these calls resulted in a total of 2591573 SQL queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.


And the most interesting part is that 291569 queries were ROLLBACK
queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).

Rollbacks are caused not by retry logic but by create_or_update logic:
We first try to do an INSERT in a sub-transaction; when it fails, we roll back
this transaction and do an UPDATE instead.
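
To illustrate, a minimal sketch of that pattern using SQLAlchemy savepoints
(the helper name and columns are illustrative, not our actual code):

    from sqlalchemy.exc import IntegrityError

    def create_or_update(session, model_class, _id, **values):
        try:
            # Try the INSERT inside a nested transaction (SAVEPOINT); a
            # duplicate-key failure rolls back only this statement...
            with session.begin_nested():
                session.add(model_class(id=_id, **values))
                session.flush()
        except IntegrityError:
            # ...and on conflict fall back to an UPDATE of the existing
            # row. That savepoint ROLLBACK is what shows up in the query log.
            session.query(model_class).filter_by(id=_id).update(values)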

This is caused by a poorly designed schema that requires such hacks.
Because of this, I suspect that we'll have similar results for PostgreSQL.

Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
there is any difference.




Best,
-jay


We do around 5 rollbacks to record a single event!

I guess it means that the MySQL backend is currently totally unusable in
a production environment.

Please find a full profiling graph attached.

Regards,

On 03/20/2014 10:31 PM, Sean Dague wrote:


On 03/20/2014 01:01 PM, David Kranz wrote:

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because it's wrong?)

We are not going to install Mongo from outside the base distribution, because
we don't do that for things that aren't python. Our assumption is
that dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by sanely you
mean performant. It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is anyone
doing that? Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a slow ceilometer job if the sql version is too slow for its budget
in the main job.

It seems like there is a similar thread, at least in part, about this
around marconi.

We required a non-mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.

Peace. I specifically noted that difference in my original comment. And
for that reason the heat slow job may not be temporary.

Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
 On 03/21/2014 12:15 AM, Jay Pipes wrote:
  On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
  Hello,
 
  We've done some profiling and results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data);
  these calls resulted in a total of 2591573 SQL queries.
  Yes, this matches my own experience with Ceilo+MySQL. But do not assume
  that there are 2591573/59755 or around 43 queries per record meter
  event. That is misleading. In fact, the number of queries per record
  meter event increases over time, as the number of retries climbs due to
  contention between readers and writers.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  Yep, I noted that as well. But, this is not unique to Ceilometer by any
  means. Just take a look at any database serving Nova, Cinder, Glance, or
  anything that uses the common SQLAlchemy code. You will see a huge
  percentage of the total number of queries taken up by ROLLBACK statements.
  The problem in Ceilometer is just that the write:read ratio is much
  higher than any of the other projects.
 
  I had a suspicion that the rollbacks have to do with the way that the
  oslo.db retry logic works, but I never had a chance to investigate it
  further. Would be really interested to see similar stats against
  PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
  it is).
 Rollbacks are caused not by retry logic but by create_or_update logic:
 We first try to do an INSERT in a sub-transaction; when it fails, we roll back
 this transaction and do an UPDATE instead.

No, that isn't correct, AFAIK. We first do a SELECT against the table and
then if no result, try an insert:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292

The problem, IMO, is twofold. There do not need to be nested
transactional containers around these create_or_update lookups -- i.e.
the lookups can be done outside of the main transaction begin here:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335

Secondly, given the volume of inserts (that also generate selects), a
simple memcache lookup cache would be highly beneficial in cutting down
on writer/reader contention in MySQL.
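
As a rough sketch of such a cache (assuming python-memcached; the key
scheme and helper here are illustrative, not a proposed API):

    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def resource_seen(session, model_class, _id):
        # Consult memcache first so repeat writers skip the SELECT entirely.
        key = 'ceilo:%s:%s' % (model_class.__name__, _id)
        if mc.get(key):
            return True
        found = session.query(model_class).get(str(_id)) is not None
        if found:
            # Cache the hit; later inserts for the same resource or meter
            # never touch MySQL's reader path at all.
            mc.set(key, 1)
        return found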

These are things that can be done without changing the schema (which has
other issues that can be looked at of course).

Best,
-jay

 This is caused by a poorly designed schema that requires such hacks.
 Because of this, I suspect that we'll have similar results for PostgreSQL.
 
 Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
 there is any difference.
 
 
  Best,
  -jay
 
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable in
  a production environment.
 
  Please find a full profiling graph attached.
 
  Regards,
 
  On 03/20/2014 10:31 PM, Sean Dague wrote:
 
  On 03/20/2014 01:01 PM, David Kranz wrote:
  On 03/20/2014 12:31 PM, Sean Dague wrote:
  On 03/20/2014 11:35 AM, David Kranz wrote:
  On 03/20/2014 06:15 AM, Sean Dague wrote:
  On 03/20/2014 05:49 AM, Nadya Privalova wrote:
  Hi all,
  First of all, thanks for your suggestions!
 
  To summarize the discussions here:
  1. We are not going to install Mongo (because it's wrong?)
  We are not going to install Mongo from outside the base distribution,
  because we don't do that for things that aren't python. Our assumption is
  that dependent services come from the base OS.
 
  That being said, being an integrated project means you have to be able
  to function, sanely, on an sqla backend, as that will always be part 
  of
  your gate.
  This is a claim I think needs a bit more scrutiny if by sanely you
  mean performant. It seems we have an integrated project that no one
  would deploy using the sql db driver we have in the gate. Is anyone
  doing that? Is having a scalable sql back end a goal of ceilometer?
 
  More generally, if there is functionality that is of great importance 
  to
  any cloud deployment (and we would not integrate it if we didn't think
  it was) that cannot be deployed at scale using sqla, are we really 
  going
  to say it should not be a part of OpenStack because we refuse, for
  whatever reason, to run it in our gate using a driver that would
  actually be used? And if we do demand an sqla backend, how much time
  should we spend trying to optimize it if no one will really use it?
  Though the slow heat job is a little different because the slowness
  comes directly from running real use cases, perhaps we should just set
  up a slow ceilometer job if the sql version is too slow for its 
  budget
  in the main job.
 
  It seems like there is a similar thread, at least in part, about this
  around marconi.
  We required a non-mongo backend to graduate ceilometer. So I don't think
  it's too much to ask that it actually works.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Alexei Kornienko

On 03/21/2014 12:53 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:

On 03/21/2014 12:15 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:

Hello,

We've done some profiling and results are quite interesting:
during 1.5 hours ceilometer inserted 59755 events (59755 calls to
record_metering_data);
these calls resulted in a total of 2591573 SQL queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.


And the most interesting part is that 291569 queries were ROLLBACK
queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).

Rollbacks are caused not by retry logic but by create_or_update logic:
We first try to do an INSERT in a sub-transaction; when it fails, we roll back
this transaction and do an UPDATE instead.

No, that isn't correct, AFAIK. We first do a SELECT against the table and
then if no result, try an insert:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292

The problem, IMO, is twofold. There do not need to be nested
transactional containers around these create_or_update lookups -- i.e.
the lookups can be done outside of the main transaction begin here:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335

I'm afraid you are wrong here:

nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
if not nested and session.query(model_class).get(str(_id)):  # always False

Short-circuiting is used, and no SELECT is ever performed on MySQL.


Secondly, given the volume of inserts (that also generate selects), a
simple memcache lookup cache would be highly beneficial in cutting down
on writer/reader contention in MySQL.
You are right, but I'm afraid that adding memcache will make deployment
more complicated.


These are things that can be done without changing the schema (which has
other issues that can be looked at of course).

Best,
-jay


This is caused by a poorly designed schema that requires such hacks.
Because of this, I suspect that we'll have similar results for PostgreSQL.

Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if
there is any difference.


Best,
-jay


We do around 5 rollbacks to record a single event!

I guess it means that the MySQL backend is currently totally unusable in
a production environment.

Please find a full profiling graph attached.

Regards,

On 03/20/2014 10:31 PM, Sean Dague wrote:


On 03/20/2014 01:01 PM, David Kranz wrote:

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because it's wrong?)

We are not going to install Mongo from outside the base distribution, because
we don't do that for things that aren't python. Our assumption is
that dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by sanely you
mean performant. It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is anyone
doing that? Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a slow ceilometer job if the sql version is too slow for its budget
in the main job.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Gordon Chung
Alexei, awesome work.

 Rollbacks are caused not by retry logic but by create_or_update logic:
 We first try to do an INSERT in a sub-transaction; when it fails, we roll back
 this transaction and do an UPDATE instead.

if you take a look at my patch addressing deadlocks
(https://review.openstack.org/#/c/80461/), i actually added a check to get
rid of the blind insert logic we had, so that should lower the number of
rollbacks (except for race conditions, which is what the function was
designed for). i did some minor performance testing as well and will add a
few notes to the patch where performance can be improved but requires a
larger schema change. Jay, please take a look there as well if you have
time.
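
roughly, the shape of that check is something like this (a sketch only,
not the literal patch -- names are illustrative):

    from sqlalchemy.exc import IntegrityError

    def create_or_update(session, model_class, _id, **values):
        # look the row up first instead of blindly inserting...
        obj = session.query(model_class).get(str(_id))
        if obj is None:
            try:
                # ...keeping a savepoint only for the genuine race where
                # two writers pass the check at the same time.
                with session.begin_nested():
                    obj = model_class(id=_id, **values)
                    session.add(obj)
                    session.flush()
            except IntegrityError:
                obj = session.query(model_class).get(str(_id))
        return obj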

 Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
 there is any difference.

i look forward to these results; from my quick testing with Mongo, we get
about 10x the write speed vs mysql.

  We required a non-mongo backend to graduate ceilometer. So I don't think
  it's too much to ask that it actually works.

i don't think sql is the recommended backend in real deployments but that
said, given the modest load of tempest tests, i would expect our sql
backend to be able to handle it.

cheers,
gordon chung
openstack, ibm software standards


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 01:02 +0200, Alexei Kornienko wrote:
 On 03/21/2014 12:53 AM, Jay Pipes wrote:
  On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
  On 03/21/2014 12:15 AM, Jay Pipes wrote:
  On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
  Hello,
 
  We've done some profiling and results are quite interesting:
  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
  record_metering_data);
  these calls resulted in a total of 2591573 SQL queries.
  Yes, this matches my own experience with Ceilo+MySQL. But do not assume
  that there are 2591573/59755 or around 43 queries per record meter
  event. That is misleading. In fact, the number of queries per record
  meter event increases over time, as the number of retries climbs due to
  contention between readers and writers.
 
  And the most interesting part is that 291569 queries were ROLLBACK
  queries.
  Yep, I noted that as well. But, this is not unique to Ceilometer by any
  means. Just take a look at any database serving Nova, Cinder, Glance, or
  anything that uses the common SQLAlchemy code. You will see a huge
  percentage of the total number of queries taken up by ROLLBACK statements.
  The problem in Ceilometer is just that the write:read ratio is much
  higher than any of the other projects.
 
  I had a suspicion that the rollbacks have to do with the way that the
  oslo.db retry logic works, but I never had a chance to investigate it
  further. Would be really interested to see similar stats against
  PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
  it is).
  Rollbacks are caused not by retry logic but by create_or_update logic:
  We first try to do an INSERT in a sub-transaction; when it fails, we roll back
  this transaction and do an UPDATE instead.
  No, that isn't correct, AFAIK. We first do a SELECT against the table and
  then if no result, try an insert:
 
  https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292
 
  The problem, IMO, is twofold. There do not need to be nested
  transactional containers around these create_or_update lookups -- i.e.
  the lookups can be done outside of the main transaction begin here:
 
  https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335
 I'm afraid you are wrong here:
 
 nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
 if not nested and session.query(model_class).get(str(_id)):  # always False

 Short-circuiting is used, and no SELECT is ever performed on MySQL.

Doh, true enough! /me wonders why this is written like so (only for
sqlite...?)

  Secondly, given the volume of inserts (that also generate selects), a
  simple memcache lookup cache would be highly beneficial in cutting down
  on writer/reader contention in MySQL.
 You are right, but I'm afraid that adding memcache will make deployment
 more complicated.
 
  These are things that can be done without changing the schema (which has
  other issues that can be looked at of course).
 
  Best,
  -jay
 
  This is caused by a poorly designed schema that requires such hacks.
  Because of this, I suspect that we'll have similar results for PostgreSQL.
 
  Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if
  there is any difference.
 
  Best,
  -jay
 
  We do around 5 rollbacks to record a single event!
 
  I guess it means that the MySQL backend is currently totally unusable in
  a production environment.
 
  Please find a full profiling graph attached.
 
  Regards,
 
  On 03/20/2014 10:31 PM, Sean Dague wrote:
 
  On 03/20/2014 01:01 PM, David Kranz wrote:
  On 03/20/2014 12:31 PM, Sean Dague wrote:
  On 03/20/2014 11:35 AM, David Kranz wrote:
  On 03/20/2014 06:15 AM, Sean Dague wrote:
  On 03/20/2014 05:49 AM, Nadya Privalova wrote:
  Hi all,
  First of all, thanks for your suggestions!
 
  To summarize the discussions here:
  1. We are not going to install Mongo (because it's wrong?)
  We are not going to install Mongo from outside the base distribution,
  because we don't do that for things that aren't python. Our assumption is
  that dependent services come from the base OS.
 
  That being said, being an integrated project means you have to be 
  able
  to function, sanely, on an sqla backend, as that will always be 
  part of
  your gate.
  This is a claim I think needs a bit more scrutiny if by sanely you
  mean performant. It seems we have an integrated project that no one
  would deploy using the sql db driver we have in the gate. Is anyone
  doing that? Is having a scalable sql back end a goal of ceilometer?
 
  More generally, if there is functionality that is of great 
  importance to
  any cloud deployment (and we would not integrate it if we didn't 
  think
  it was) that cannot be deployed at scale using sqla, are we really 
  going
  to say it should not be a part of OpenStack because we refuse, for
  whatever reason, to run it in our gate using a driver that would
  actually be used?

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Nadya Privalova
Ok, so we don't want to switch to UCA; let's consider the alternatives.
What options do we have to make it possible to run Ceilometer jobs with a
Mongo backend?
I see only https://review.openstack.org/#/c/81001/ or making Ceilometer
able to work with old Mongo. But the latter looks inappropriate, at
least in Icehouse.
What am I missing here? Maybe there is something else we can do?


On Tue, Mar 18, 2014 at 9:28 PM, Tim Bell tim.b...@cern.ch wrote:



 If UCA is required, what would be the upgrade path for a currently running
 OpenStack Havana site to Icehouse with this requirement ?



 Would it be an online upgrade (i.e. what order to upgrade the different
 components in order to keep things running at all times) ?



 Tim



 *From:* Chmouel Boudjnah [mailto:chmo...@enovance.com]
 *Sent:* 18 March 2014 17:58
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra]
 Ceilometer tempest testing in gate





 On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague s...@dague.net wrote:

  So I'm still -1 at this point on making UCA our default run environment
 until it's provably functional for a period of time. Because working
 around upstream distro breaks is no fun.



  I agree, if UCA is not very stable ATM, this is going to cause us more
  pain, but what would be the plan of action? A non-voting gate for
  ceilometer as a start? (if that's possible).

 Chmouel





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Joe Gordon
On Wed, Mar 19, 2014 at 1:52 AM, Nadya Privalova nprival...@mirantis.com wrote:

 Ok, so we don't want to switch to UCA; let's consider the alternatives.
 What options do we have to make it possible to run Ceilometer jobs with a
 Mongo backend?
 I see only https://review.openstack.org/#/c/81001/ or making Ceilometer
 able to work with old Mongo. But the latter looks inappropriate, at
 least in Icehouse.
 What am I missing here? Maybe there is something else we can do?


If ceilometer says it supports MySQL, then it should work; we shouldn't be
forced to switch to an alternate backend.




 On Tue, Mar 18, 2014 at 9:28 PM, Tim Bell tim.b...@cern.ch wrote:



 If UCA is required, what would be the upgrade path for a currently
 running OpenStack Havana site to Icehouse with this requirement ?



 Would it be an online upgrade (i.e. what order to upgrade the different
 components in order to keep things running at all times) ?



 Tim



 *From:* Chmouel Boudjnah [mailto:chmo...@enovance.com]
 *Sent:* 18 March 2014 17:58
 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra]
 Ceilometer tempest testing in gate





 On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague s...@dague.net wrote:

  So I'm still -1 at this point on making UCA our default run environment
 until it's provably functional for a period of time. Because working
 around upstream distro breaks is no fun.



  I agree, if UCA is not very stable ATM, this is going to cause us more
  pain, but what would be the plan of action? A non-voting gate for
  ceilometer as a start? (if that's possible).

 Chmouel








Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Doug Hellmann
The ceilometer collector is meant to scale horizontally. Have you tried
configuring the test environment to run more than one copy, to process the
notifications more quickly?

Doug


On Tue, Mar 18, 2014 at 8:09 AM, Nadya Privalova nprival...@mirantis.com wrote:

 Hi folks,

 I'd like to discuss Ceilometer's tempest situation with you.
 Now we have several patch sets on review that test core functionality of
 Ceilometer: notification and polling (topic
 https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
 But there is a problem: Ceilometer performance is very poor on mysql and
 postgresql because of the bug
 https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
 better even in a single thread, and I hope that its performance will be
 enough to successfully run the Ceilometer tempest tests.
 Let me explain in a few words why tempest tests are mostly performance
 tests for Ceilometer. The thing is that the Ceilometer service is running
 while all the other nova, cinder, and so on tests run. All the tests create
 instances and volumes, and each creation produces a lot of notifications. Each
 notification is an entry in the database. So Ceilometer cannot process such a
 large number of notifications quickly. Ceilometer tests have the 'telemetry'
 prefix, which means they will be run last. And that
 makes the situation even worse.
 So my proposal:
 1. create a non-voting job with Mongo-backend
 2. make sure that tests pass on Mongo
 3. merge tests to tempest but skip that on postgres and mysql till
 bug/1291054 is resolved
 4. make the new job 'voting'

 The only problem is the Mongo installation. I have a change,
 https://review.openstack.org/#/c/81001/, that will allow us to install
 Mongo from a deb. On the other hand, there is
 https://review.openstack.org/#/c/74889/, which enables UCA. I'm
 collaborating with the infra team to make the decision ASAP because AFAIU we
 need tempest tests in Icehouse (for more discussion, you are welcome to the
 thread [openstack-dev] Updating libvirt in gate jobs).

 If you have any thoughts on this please share.

 Thanks for attention,
 Nadya






Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Joe Gordon
On Wed, Mar 19, 2014 at 3:09 PM, Doug Hellmann
doug.hellm...@dreamhost.com wrote:

 The ceilometer collector is meant to scale horizontally. Have you tried
 configuring the test environment to run more than one copy, to process the
 notifications more quickly?


FYI:
http://logs.openstack.org/82/79182/1/check/check-tempest-dsvm-neutron/156f1d4/logs/screen-dstat.txt.gz




 Doug


 On Tue, Mar 18, 2014 at 8:09 AM, Nadya Privalova 
 nprival...@mirantis.com wrote:

 Hi folks,

 I'd like to discuss Ceilometer's tempest situation with you.
 Now we have several patch sets on review that test core functionality of
 Ceilometer: notification and polling (topic
 https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
 But there is a problem: Ceilometer performance is very poor on mysql and
 postgresql because of the bug
 https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
 better even in a single thread, and I hope that its performance will be
 enough to successfully run the Ceilometer tempest tests.
 Let me explain in a few words why tempest tests are mostly performance
 tests for Ceilometer. The thing is that the Ceilometer service is running
 while all the other nova, cinder, and so on tests run. All the tests create
 instances and volumes, and each creation produces a lot of notifications. Each
 notification is an entry in the database. So Ceilometer cannot process such a
 large number of notifications quickly. Ceilometer tests have the 'telemetry'
 prefix, which means they will be run last. And that
 makes the situation even worse.
 So my proposal:
 1. create a non-voting job with Mongo-backend
 2. make sure that tests pass on Mongo
 3. merge tests to tempest but skip that on postgres and mysql till
 bug/1291054 is resolved
 4. make the new job 'voting'

 The only problem is the Mongo installation. I have a change,
 https://review.openstack.org/#/c/81001/, that will allow us to install
 Mongo from a deb. On the other hand, there is
 https://review.openstack.org/#/c/74889/, which enables UCA. I'm
 collaborating with the infra team to make the decision ASAP because AFAIU we
 need tempest tests in Icehouse (for more discussion, you are welcome to the
 thread [openstack-dev] Updating libvirt in gate jobs).

 If you have any thoughts on this please share.

 Thanks for attention,
 Nadya









Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Sean Dague
On 03/19/2014 06:09 PM, Doug Hellmann wrote:
 The ceilometer collector is meant to scale horizontally. Have you tried
 configuring the test environment to run more than one copy, to process
 the notifications more quickly?

The ceilometer collector is already one of the top running processes on
the box -
http://logs.openstack.org/82/81282/2/check/check-tempest-dsvm-full/693dc3b/logs/dstat.txt.gz


Often consuming > 1/2 a core (25% == 1 core in that run, as can be
seen when qemu boots and pegs one).

So while we could spin up more collectors, I think it's unreasonable
that the majority of our cpu has to be handed over to the metric
collector to make it function responsively. I thought the design point
was that this was low impact.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 08:09 AM, Nadya Privalova wrote:
 Hi folks,
 
 I'd like to discuss Ceilometer's tempest situation with you.
 Now we have several patch sets on review that test core functionality of
 Ceilometer: notification and polling (topic
 https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
 But there is a problem: Ceilometer performance is very poor on mysql and
 postgresql because of the bug
 https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
 better even in a single thread, and I hope that its performance will be
 enough to successfully run the Ceilometer tempest tests.
 Let me explain in a few words why tempest tests are mostly performance
 tests for Ceilometer. The thing is that the Ceilometer service is running
 while all the other nova, cinder, and so on tests run. All the tests create
 instances and volumes, and each creation produces a lot of notifications.
 Each notification is an entry in the database. So Ceilometer cannot process
 such a large number of notifications quickly. Ceilometer tests have the
 'telemetry' prefix, which means they will be run last.
 And that makes the situation even worse.
 So my proposal:
 1. create a non-voting job with Mongo-backend
 2. make sure that tests pass on Mongo
 3. merge tests to tempest but skip that on postgres and mysql till
 bug/1291054 is resolved
 4. make the new job 'voting'
 
 The only problem is the Mongo installation. I have a change,
 https://review.openstack.org/#/c/81001/, that will allow us to install
 Mongo from a deb. On the other hand, there is
 https://review.openstack.org/#/c/74889/, which enables UCA. I'm
 collaborating with the infra team to make the decision ASAP because AFAIU we
 need tempest tests in Icehouse (for more discussion, you are welcome to the
 thread [openstack-dev] Updating libvirt in gate jobs).
 
 If you have any thoughts on this please share.

There is a fundamental problem here that the Ceilometer team requires a
version of Mongo that's not provided by the distro. We've taken a pretty
hard line on not requiring newer versions of non python stuff than the
distros we support actually have.

And the SQL backend is basically unusable from what I can tell.

So I'm -2 on injecting an arbitrary upstream Mongo in devstack.

What is preventing Ceilometer from bringing back support for the mongo
that you can get from 12.04? That seems like it should be the much
higher priority item. Then we could actually be gating Ceilometer
features on what the platforms can actually support. Then I'd be happy
to support a Mongo job running in tests.

Once that was done, we can start unpacking some of the other issues.

I'm not sure how changing to using 4 cores in the gate is going to
reduce the list command from 120s to 2s, so that doesn't really seem to
be the core issue (and is likely to just cause db deadlocks).

As long as Ceilometer says it supports SQL backends, it needs to do so
in a sane way. So that should still be gating.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Julien Danjou
On Tue, Mar 18 2014, Sean Dague wrote:

 There is a fundamental problem here that the Ceilometer team requires a
 version of Mongo that's not provided by the distro. We've taken a pretty
 hard line on not requiring newer versions of non python stuff than the
 distros we support actually have.

MongoDB 2.4 has been in UCA for a while now. We just can't use it because of
libvirt bug https://bugs.launchpad.net/nova/+bug/1228977.

-- 
Julien Danjou
/* Free Software hacker
   http://julien.danjou.info */




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 09:02 AM, Julien Danjou wrote:
 On Tue, Mar 18 2014, Sean Dague wrote:
 
 There is a fundamental problem here that the Ceilometer team requires a
 version of Mongo that's not provided by the distro. We've taken a pretty
 hard line on not requiring newer versions of non python stuff than the
 distros we support actually have.
 
  MongoDB 2.4 has been in UCA for a while now. We just can't use it because of
 libvirt bug https://bugs.launchpad.net/nova/+bug/1228977.

We've not required UCA for any other project to pass the gate. So what
is the issue with Mongo 2.0.4 that makes it unsupportable in ceilometer?

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Julien Danjou
On Tue, Mar 18 2014, Sean Dague wrote:

 We've not required UCA for any other project to pass the gate. So what
 is the issue with Mongo 2.0.4 that makes it unsupportable in ceilometer?

We require features not present in MongoDB < 2.2.
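
For example, the aggregation framework is one feature that appeared only in
MongoDB 2.2; a hedged pymongo sketch, with collection and field names that
are only indicative:

    import pymongo

    client = pymongo.MongoClient('mongodb://localhost:27017/')
    # Per-resource statistics computed server-side; this $match/$group
    # pipeline has no equivalent in MongoDB < 2.2.
    stats = client.ceilometer.meter.aggregate([
        {'$match': {'counter_name': 'cpu_util'}},
        {'$group': {'_id': '$resource_id',
                    'avg': {'$avg': '$counter_volume'},
                    'max': {'$max': '$counter_volume'}}},
    ])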

-- 
Julien Danjou
-- Free Software hacker
-- http://julien.danjou.info




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Chmouel Boudjnah
On Tue, Mar 18, 2014 at 2:09 PM, Sean Dague s...@dague.net wrote:

 We've not required UCA for any other project to pass the gate.



Is it that bad to have UCA in default devstack? As far as I know, UCA is the
official way to do OpenStack on ubuntu, right?

Chmouel.


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 12:09 PM, Chmouel Boudjnah wrote:
 
 On Tue, Mar 18, 2014 at 2:09 PM, Sean Dague s...@dague.net wrote:
 
 We've not required UCA for any other project to pass the gate.
 
 
 
 Is it that bad to have UCA in default devstack, as far as I know UCA is
 the official way to do OpenStack on ubuntu, right?

Currently we can't use it because libvirt in UCA remains too buggy to
run under the gate. If we had it turned on we'd see an astronomical
failure rate.

That is hopefully getting fixed, thanks to a lot of leg work by dims, as
it's required a lot of chasing.

However, I still believe UCA remains problematic, because our
experiences to date are basically that the entrance criteria for content
in UCA are clearly less stringent than the base distro's. And we are very likely to
be broken by changes put into it, as seen by the inability to run our
tests on top of it.

So I'm still -1 at this point on making UCA our default run environment
until it's provably functional for a period of time. Because working
around upstream distro breaks is no fun.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Chmouel Boudjnah
On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague s...@dague.net wrote:

 So I'm still -1 at this point on making UCA our default run environment
 until it's provably functional for a period of time. Because working
 around upstream distro breaks is no fun.



I agree, if UCA is not very stable ATM, this is going to cause us more
pain, but what would be the plan of action? A non-voting gate for
ceilometer as a start? (if that's possible).

Chmouel


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Tim Bell

If UCA is required, what would be the upgrade path for a currently running 
OpenStack Havana site to Icehouse with this requirement ?

Would it be an online upgrade (i.e. what order to upgrade the different 
components in order to keep things running at all times) ?

Tim

From: Chmouel Boudjnah [mailto:chmo...@enovance.com]
Sent: 18 March 2014 17:58
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
tempest testing in gate


On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague 
s...@dague.net wrote:
So I'm still -1 at this point on making UCA our default run environment
until it's provably functional for a period of time. Because working
around upstream distro breaks is no fun.

I agree, if UCA is not very stable ATM, this is going to cause us more pain,
but what would be the plan of action? A non-voting gate for ceilometer as a
start? (if that's possible).

Chmouel