Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-22 Thread Sean Dague
On 03/21/2014 05:11 PM, Joe Gordon wrote:
> 
> 
> 
> On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague wrote:
> 
> On 03/20/2014 06:18 PM, Joe Gordon wrote:
> >
> >
> >
> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> > <alexei.kornie...@gmail.com> wrote:
> >
> > Hello,
> >
> > We've done some profiling and results are quite interesting:
> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> > record_metering_data);
> > these calls resulted in a total of 2591573 SQL queries.
> >
> > And the most interesting part is that 291569 queries were ROLLBACK
> > queries.
> > We do around 5 rollbacks to record a single event!
> >
> > I guess it means that MySQL backend is currently totally
> unusable in
> > production environment.
> >
> >
> > It should be noticed that SQLAlchemy is horrible for performance, in
> > nova we usually see sqlalchemy overheads of well over 10x (time
> > nova.db.api call vs the time MySQL measures when slow log is recording
> > everything).
> 
> That's not really a fair assessment. Python object inflation takes time.
> I do get that there is SQLA overhead here, but even if you trimmed it
> out you would not get down to the mysql query time.
> 
> 
> To give an example from nova:
> 
> doing a nova list with no servers:
> 
> stack@devstack:~/devstack$ nova --timing list 
> 
> | GET
> http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail
> | 0.0817470550537 |
> 
> So nova command takes 0.0817470550537 seconds.
> 
> Inside the nova logs (when putting a timer around all nova.db.api calls
> [1] ), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:
> 
> 2014-03-21 20:58:46.760 DEBUG nova.db.api
> [req-91879f86-7665-4943-8953-41c92c42c030 demo demo]
> 'instance_get_all_by_filters' 0.06 seconds timed
> /mnt/stack/nova/nova/db/api.py:1940
> 
> But the sql slow log reports the same query takes only 0.001006 seconds
> with a lock_time of 0.000269 for a total of  0.00127 seconds.
> 
> # Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0
>  Rows_examined: 0
> 
> 
> So in this case only 2% of the time
> that  nova.db.api.instance_get_all_by_filters takes is spent inside of
> mysql. Or to put it differently  nova.db.api.instance_get_all_by_filters
> is 47 times slower than the raw DB call underneath.
> 
> Yes I agree that turning raw sql data into python objects should
> take time, but I just don't think it should take 98% of the time.
> 
> [1] 
> https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1
> 
> That being said, having Ceilometer's write path be highly tuned and not
> use SQLA (and written for every back end natively) is probably
> appropriate.
> 
> 
> While I like this idea, they lose free PostgreSQL support by dropping
> SQLA. But that is a solvable problem.

Joe, you're just trolling now, right? :)

I mean you picked the most pathological case possible: an empty table
with no data ever returned. So no actual work was done anywhere, and
this just measures side effects, which are in no way commensurate with
the actual read/write profiles of a real system.

I 100% agree that SQLA provides overhead. However, removing SQLA is the
last in a series of optimizations that you do on a system. Because
taking it out doesn't solve having bad data usage (getting more data
than you need), bad schema, or bad queries. I would expect substantial
gains could be made tackling those first.

If after that, fast path drivers sounded like a good idea, go for it.

But realize that a fast path driver is more work to write and maintain.
And as the energy hasn't gone into optimizing things yet, I think a
proposal to put even more work on the team to write a new set of
harder-to-maintain drivers is just a non-starter.

All I'm asking is that we need profiling. Ceilometer is supposed to be
high performance / low overhead metrics collection. We have some
indication that it's not meeting that desire based on our gate runs.
Which means we can reproduce it. Which is great, because reproducing
means things are fixable, and we can easily know if we did fix it.

Optimizing is hard, but I think it's the right time to do it. Not just
with elasticity, but with old-fashioned analysis.
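For reference, the per-event averages being thrown around in this thread follow directly from Alexei's raw numbers; a quick back-of-the-envelope check (pure arithmetic, nothing newly measured):

```python
# Figures quoted earlier in the thread (Alexei's 1.5 hour profiling run).
events = 59755            # calls to record_metering_data
total_queries = 2591573   # SQL queries issued in the same window
rollbacks = 291569        # ROLLBACK statements among those queries

queries_per_event = total_queries / events    # ~43.4 queries per event
rollbacks_per_event = rollbacks / events      # ~4.9 ("around 5 rollbacks")
rollback_share = rollbacks / total_queries    # ~11% of all queries

print(queries_per_event, rollbacks_per_event, rollback_share)
```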

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Doug Hellmann
On Fri, Mar 21, 2014 at 5:13 PM, Joe Gordon  wrote:

>
>
>
> On Fri, Mar 21, 2014 at 8:58 AM, Doug Hellmann <
> doug.hellm...@dreamhost.com> wrote:
>
>>
>>
>>
>> On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague  wrote:
>>
>>> On 03/20/2014 06:18 PM, Joe Gordon wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
>>> > <alexei.kornie...@gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > We've done some profiling and results are quite interesting:
>>> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
>>> > record_metering_data);
>>> > these calls resulted in a total of 2591573 SQL queries.
>>> >
>>> > And the most interesting part is that 291569 queries were ROLLBACK
>>> > queries.
>>> > We do around 5 rollbacks to record a single event!
>>> >
>>> > I guess it means that MySQL backend is currently totally unusable
>>> in
>>> > production environment.
>>> >
>>> >
>>> > It should be noticed that SQLAlchemy is horrible for performance, in
>>> > nova we usually see sqlalchemy overheads of well over 10x (time
>>> > nova.db.api call vs the time MySQL measures when slow log is recording
>>> > everything).
>>>
>>> That's not really a fair assessment. Python object inflation takes time.
>>> I do get that there is SQLA overhead here, but even if you trimmed it
>>> out you would not get down to the mysql query time.
>>>
>>> That being said, having Ceilometer's write path be highly tuned and not
>>> use SQLA (and written for every back end natively) is probably
>>> appropriate.
>>>
>>
>> I have been working to get Mike Bayer (author of SQLAlchemy) to the
>> summit in Atlanta. He is interested in working with us to improve
>> SQLAlchemy, so if we have specific performance or feature issues like this,
>> it would be good to make a list. If we have enough, maybe we can set aside
>> a session in the Oslo track, otherwise we can at least have some hallway
>> conversations.
>>
>
>
> That would be really amazing. Is he on IRC, so we can get the ball rolling?
>

I'll ask him to join #openstack-dev if he is.

Doug



>
>
>>
>> Doug
>>
>>
>>
>>>
>>> -Sean
>>>
>>> --
>>> Sean Dague
>>> Samsung Research America
>>> s...@dague.net / sean.da...@samsung.com
>>> http://dague.net
>>>
>>>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Alexei Kornienko

Hello,

Please see some comments inline.

Best Regards,
Alexei Kornienko

On 03/21/2014 11:11 PM, Joe Gordon wrote:




On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague wrote:


On 03/20/2014 06:18 PM, Joe Gordon wrote:
>
>
>
> On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> <alexei.kornie...@gmail.com> wrote:
>
> Hello,
>
> We've done some profiling and results are quite interesting:
> during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> record_metering_data);
> these calls resulted in a total of 2591573 SQL queries.
>
> And the most interesting part is that 291569 queries were
ROLLBACK
> queries.
> We do around 5 rollbacks to record a single event!
>
> I guess it means that MySQL backend is currently totally
unusable in
> production environment.
>
>
> It should be noticed that SQLAlchemy is horrible for performance, in
> nova we usually see sqlalchemy overheads of well over 10x (time
> nova.db.api call vs the time MySQL measures when slow log is
recording
> everything).

That's not really a fair assessment. Python object inflation takes
time.
I do get that there is SQLA overhead here, but even if you trimmed it
out you would not get down to the mysql query time.


To give an example from nova:

doing a nova list with no servers:

stack@devstack:~/devstack$ nova --timing list

| GET 
http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail 
| 0.0817470550537 |


So nova command takes 0.0817470550537 seconds.

Inside the nova logs (when putting a timer around all nova.db.api 
calls [1] ), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:


2014-03-21 20:58:46.760 DEBUG nova.db.api 
[req-91879f86-7665-4943-8953-41c92c42c030 demo demo] 
'instance_get_all_by_filters' 0.06 seconds timed 
/mnt/stack/nova/nova/db/api.py:1940


But the sql slow log reports the same query takes only 0.001006
seconds with a lock_time of 0.000269 for a total of  0.00127 seconds.


# Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0 
 Rows_examined: 0



So in this case only 2% of the time 
that  nova.db.api.instance_get_all_by_filters takes is spent inside of 
mysql. Or to put it differently 
 nova.db.api.instance_get_all_by_filters is 47 times slower than the
raw DB call underneath.


Yes I agree that turning raw sql data into python objects should
take time, but I just don't think it should take 98% of the time.
If you open the actual code of nova.db.api.instance_get_all_by_filters -
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1817

you will find that the python code is actually doing lots of things:
1) setting up join conditions
2) creating query filters
3) doing some heavy matching, with loops in exact_filter, regex_filter,
and tag_filter
This code won't go away if you drop python objects, since it belongs to
the business logic.
I think it's quite misleading to say that the problem is "turning
raw sql data into python objects"
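One way to settle where the time actually goes is a profiler rather than a stopwatch. A minimal sketch using the standard library's cProfile; the `fetch` callable below is a hypothetical stand-in for the DB API call being measured, not Nova's real code:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return (result, stats report text)."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    out = io.StringIO()
    # The cumulative view separates time inside the driver's execute()
    # from time spent in Python-side filter loops and object construction.
    pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(20)
    return result, out.getvalue()

# Hypothetical stand-in for nova.db.api.instance_get_all_by_filters:
def fetch():
    return [{"uuid": i} for i in range(1000)]

rows, report = profile_call(fetch)
```

Reading the report line by line attributes the 98% to concrete functions instead of to "SQLAlchemy" as a whole.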




[1] 
https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1


That being said, having Ceilometer's write path be highly tuned
and not
use SQLA (and written for every back end natively) is probably
appropriate.


While I like this idea, they lose free PostgreSQL support by dropping
SQLA. But that is a solvable problem.



-Sean

--
Sean Dague
Samsung Research America
s...@dague.net  / sean.da...@samsung.com

http://dague.net




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Joe Gordon
On Fri, Mar 21, 2014 at 8:58 AM, Doug Hellmann
wrote:

>
>
>
> On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague  wrote:
>
>> On 03/20/2014 06:18 PM, Joe Gordon wrote:
>> >
>> >
>> >
>> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
>> > <alexei.kornie...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > We've done some profiling and results are quite interesting:
>> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
>> > record_metering_data);
>> > these calls resulted in a total of 2591573 SQL queries.
>> >
>> > And the most interesting part is that 291569 queries were ROLLBACK
>> > queries.
>> > We do around 5 rollbacks to record a single event!
>> >
>> > I guess it means that MySQL backend is currently totally unusable in
>> > production environment.
>> >
>> >
>> > It should be noticed that SQLAlchemy is horrible for performance, in
>> > nova we usually see sqlalchemy overheads of well over 10x (time
>> > nova.db.api call vs the time MySQL measures when slow log is recording
>> > everything).
>>
>> That's not really a fair assessment. Python object inflation takes time.
>> I do get that there is SQLA overhead here, but even if you trimmed it
>> out you would not get down to the mysql query time.
>>
>> That being said, having Ceilometer's write path be highly tuned and not
>> use SQLA (and written for every back end natively) is probably
>> appropriate.
>>
>
> I have been working to get Mike Bayer (author of SQLAlchemy) to the summit
> in Atlanta. He is interested in working with us to improve SQLAlchemy, so
> if we have specific performance or feature issues like this, it would be
> good to make a list. If we have enough, maybe we can set aside a session in
> the Oslo track, otherwise we can at least have some hallway conversations.
>


That would be really amazing. Is he on IRC, so we can get the ball rolling?


>
> Doug
>
>
>
>>
>> -Sean
>>
>> --
>> Sean Dague
>> Samsung Research America
>> s...@dague.net / sean.da...@samsung.com
>> http://dague.net
>>
>>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Joe Gordon
On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague  wrote:

> On 03/20/2014 06:18 PM, Joe Gordon wrote:
> >
> >
> >
> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> > <alexei.kornie...@gmail.com> wrote:
> >
> > Hello,
> >
> > We've done some profiling and results are quite interesting:
> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> > record_metering_data);
> > these calls resulted in a total of 2591573 SQL queries.
> >
> > And the most interesting part is that 291569 queries were ROLLBACK
> > queries.
> > We do around 5 rollbacks to record a single event!
> >
> > I guess it means that MySQL backend is currently totally unusable in
> > production environment.
> >
> >
> > It should be noticed that SQLAlchemy is horrible for performance, in
> > nova we usually see sqlalchemy overheads of well over 10x (time
> > nova.db.api call vs the time MySQL measures when slow log is recording
> > everything).
>
> That's not really a fair assessment. Python object inflation takes time.
> I do get that there is SQLA overhead here, but even if you trimmed it
> out you would not get down to the mysql query time.
>
>
To give an example from nova:

doing a nova list with no servers:

stack@devstack:~/devstack$ nova --timing list

| GET
http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail |
0.0817470550537 |

So nova command takes 0.0817470550537 seconds.

Inside the nova logs (when putting a timer around all nova.db.api calls [1]
), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:

2014-03-21 20:58:46.760 DEBUG nova.db.api
[req-91879f86-7665-4943-8953-41c92c42c030 demo demo]
'instance_get_all_by_filters' 0.06 seconds timed
/mnt/stack/nova/nova/db/api.py:1940

But the sql slow log reports the same query takes only 0.001006 seconds
with a lock_time of 0.000269 for a total of  0.00127 seconds.

# Query_time: 0.001006  Lock_time: 0.000269 Rows_sent: 0
 Rows_examined: 0


So in this case only 2% of the time
that  nova.db.api.instance_get_all_by_filters takes is spent inside of
mysql. Or to put it differently  nova.db.api.instance_get_all_by_filters is
47 times slower than the raw DB call underneath.

Yes I agree that turning raw sql data into python objects should take
time, but I just don't think it should take 98% of the time.
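The 2% / 47x figures above check out arithmetically (Query_time plus Lock_time from the slow log, versus the timed db.api call):

```python
api_seconds = 0.06                   # timed nova.db.api call
mysql_seconds = 0.001006 + 0.000269  # Query_time + Lock_time = 0.001275

print(mysql_seconds / api_seconds)   # ~0.021 -> ~2% of the time in MySQL
print(api_seconds / mysql_seconds)   # ~47x slower than the raw query
```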

[1]
https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1
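The timer referenced in [1] amounts to wrapping each db.api function with a decorator along these lines (a sketch, not the actual commit; the log format merely mimics the DEBUG line shown above):

```python
import functools
import logging
import time

LOG = logging.getLogger("nova.db.api")

def timed(fn):
    """Log the wall-clock duration of each wrapped DB API call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            LOG.debug("'%s' %.2f seconds timed", fn.__name__,
                      time.time() - start)
    return wrapper

@timed
def instance_get_all_by_filters(filters=None):
    # Hypothetical stand-in; the real function queries the database.
    return []
```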

That being said, having Ceilometer's write path be highly tuned and not
> use SQLA (and written for every back end natively) is probably appropriate.
>

While I like this idea, they lose free PostgreSQL support by dropping
SQLA. But that is a solvable problem.


>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Doug Hellmann
On Fri, Mar 21, 2014 at 7:04 AM, Sean Dague  wrote:

> On 03/20/2014 06:18 PM, Joe Gordon wrote:
> >
> >
> >
> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> > <alexei.kornie...@gmail.com> wrote:
> >
> > Hello,
> >
> > We've done some profiling and results are quite interesting:
> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> > record_metering_data);
> > these calls resulted in a total of 2591573 SQL queries.
> >
> > And the most interesting part is that 291569 queries were ROLLBACK
> > queries.
> > We do around 5 rollbacks to record a single event!
> >
> > I guess it means that MySQL backend is currently totally unusable in
> > production environment.
> >
> >
> > It should be noticed that SQLAlchemy is horrible for performance, in
> > nova we usually see sqlalchemy overheads of well over 10x (time
> > nova.db.api call vs the time MySQL measures when slow log is recording
> > everything).
>
> That's not really a fair assessment. Python object inflation takes time.
> I do get that there is SQLA overhead here, but even if you trimmed it
> out you would not get down to the mysql query time.
>
> That being said, having Ceilometer's write path be highly tuned and not
> use SQLA (and written for every back end natively) is probably appropriate.
>

I have been working to get Mike Bayer (author of SQLAlchemy) to the summit
in Atlanta. He is interested in working with us to improve SQLAlchemy, so
if we have specific performance or feature issues like this, it would be
good to make a list. If we have enough, maybe we can set aside a session
in the Oslo track, otherwise we can at least have some hallway
conversations.

Doug



>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Boris Pavlovic
Sean,


Absolutely agree with you.
Executing a query and getting plain text back is not the same as executing
a query and getting back a hierarchy of python objects.

Plus I disagree when I hear that SQLAlchemy is slow. It's slow when you are
using it wrong.

Like in Nova Scheduler [1] we were fetching 3 full tables with a JOIN, which
produces far more data from the DB (in bytes and rows) than making 3
separate selects and then joining them by hand.

We should stop using the following phrases:
1) python is slow
2) mysql is slow
3) sqlalchemy is slow
4) hardware is slow [2]

And start using these instead:
1) Algorithms that we are using are bad
2) Architecture solutions that we are using are bad

And start thinking about how to improve them.


[1] https://review.openstack.org/#/c/43151/
[2] http://en.wikipedia.org/wiki/Buran_(spacecraft)
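The scheduler change in [1] boils down to replacing one wide three-table JOIN with three narrow selects merged in Python. Roughly, with hypothetical table and key names:

```python
def join_by_hand(computes, capabilities, stats):
    """Merge three per-host result sets on 'host' instead of SQL JOINing."""
    caps_by_host = {c["host"]: c for c in capabilities}
    stats_by_host = {s["host"]: s for s in stats}
    return [
        {**row,
         **caps_by_host.get(row["host"], {}),
         **stats_by_host.get(row["host"], {})}
        for row in computes
    ]

# Each "select" transfers far fewer bytes than the JOIN's multiplied rows:
merged = join_by_hand(
    [{"host": "c1", "vcpus": 8}],
    [{"host": "c1", "hypervisor": "kvm"}],
    [{"host": "c1", "running_vms": 3}],
)
```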

Best regards,
Boris Pavlovic



On Fri, Mar 21, 2014 at 3:04 PM, Sean Dague  wrote:

> On 03/20/2014 06:18 PM, Joe Gordon wrote:
> >
> >
> >
> > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> > <alexei.kornie...@gmail.com> wrote:
> >
> > Hello,
> >
> > We've done some profiling and results are quite interesting:
> > during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> > record_metering_data);
> > these calls resulted in a total of 2591573 SQL queries.
> >
> > And the most interesting part is that 291569 queries were ROLLBACK
> > queries.
> > We do around 5 rollbacks to record a single event!
> >
> > I guess it means that MySQL backend is currently totally unusable in
> > production environment.
> >
> >
> > It should be noticed that SQLAlchemy is horrible for performance, in
> > nova we usually see sqlalchemy overheads of well over 10x (time
> > nova.db.api call vs the time MySQL measures when slow log is recording
> > everything).
>
> That's not really a fair assessment. Python object inflation takes time.
> I do get that there is SQLA overhead here, but even if you trimmed it
> out you would not get down to the mysql query time.
>
> That being said, having Ceilometer's write path be highly tuned and not
> use SQLA (and written for every back end natively) is probably appropriate.
>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-21 Thread Sean Dague
On 03/20/2014 06:18 PM, Joe Gordon wrote:
> 
> 
> 
> On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
> <alexei.kornie...@gmail.com> wrote:
> 
> Hello,
> 
> We've done some profiling and results are quite interesting:
> during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> record_metering_data);
> these calls resulted in a total of 2591573 SQL queries.
> 
> And the most interesting part is that 291569 queries were ROLLBACK
> queries.
> We do around 5 rollbacks to record a single event!
> 
> I guess it means that MySQL backend is currently totally unusable in
> production environment.
> 
> 
> It should be noticed that SQLAlchemy is horrible for performance, in
> nova we usually see sqlalchemy overheads of well over 10x (time
> nova.db.api call vs the time MySQL measures when slow log is recording
> everything).

That's not really a fair assessment. Python object inflation takes time.
I do get that there is SQLA overhead here, but even if you trimmed it
out you would not get down to the mysql query time.

That being said, having Ceilometer's write path be highly tuned and not
use SQLA (and written for every back end natively) is probably appropriate.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 01:02 +0200, Alexei Kornienko wrote:
> On 03/21/2014 12:53 AM, Jay Pipes wrote:
> > On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
> >> On 03/21/2014 12:15 AM, Jay Pipes wrote:
> >>> On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
>  Hello,
> 
>  We've done some profiling and results are quite interesting:
>  during 1.5 hours ceilometer inserted 59755 events (59755 calls to
>  record_metering_data);
>  these calls resulted in a total of 2591573 SQL queries.
> >>> Yes, this matches my own experience with Ceilo+MySQL. But do not assume
> >>> that there are 2591573/59755 or around 43 queries per record meter
> >>> event. That is misleading. In fact, the number of queries per record
> >>> meter event increases over time, as the number of retries climbs due to
> >>> contention between readers and writers.
> >>>
>  And the most interesting part is that 291569 queries were ROLLBACK
>  queries.
> >>> Yep, I noted that as well. But, this is not unique to Ceilometer by any
> >>> means. Just take a look at any database serving Nova, Cinder, Glance, or
> >>> anything that uses the common SQLAlchemy code. You will see a huge
> >>> percentage of entire number of queries taken up by ROLLBACK statements.
> >>> The problem in Ceilometer is just that the write:read ratio is much
> >>> higher than any of the other projects.
> >>>
> >>> I had a suspicion that the rollbacks have to do with the way that the
> >>> oslo.db retry logic works, but I never had a chance to investigate it
> >>> further. Would be really interested to see similar stats against
> >>> PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
> >>> it is).
> >> Rollbacks are caused not by retry logic but by create_or_update logic:
> >> We first try to do an INSERT in a sub-transaction; when it fails, we roll
> >> back this transaction and do an UPDATE instead.
> > No, that isn't correct, AFAIK. We first do a SELECT into the table and
> > then if no result, try an insert:
> >
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292
> >
> > The problem, IMO, is twofold. There does not need to be nested
> > transactional containers around these create_or_update lookups -- i.e.
> > the lookups can be done outside of the main transaction begin here:
> >
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335
> I'm afraid you are wrong here:
> 
nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
if not nested and session.query(model_class).get(str(_id)):  # always False

Short-circuit evaluation is used, so no select is ever performed on MySQL.
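The pattern under discussion, and where the extra ROLLBACKs come from, can be reproduced in miniature: INSERT blindly inside a savepoint, and on a duplicate key roll the savepoint back and UPDATE. Plain sqlite3 is used here just to show the mechanics; the real code uses SQLAlchemy's nested transactions:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE resource (id TEXT PRIMARY KEY, meta TEXT)")

def create_or_update(res_id, meta):
    conn.execute("SAVEPOINT co_u")
    try:
        conn.execute("INSERT INTO resource (id, meta) VALUES (?, ?)",
                     (res_id, meta))
    except sqlite3.IntegrityError:
        # Every already-seen resource costs one extra ROLLBACK like this.
        conn.execute("ROLLBACK TO SAVEPOINT co_u")
        conn.execute("UPDATE resource SET meta = ? WHERE id = ?",
                     (meta, res_id))
    conn.execute("RELEASE SAVEPOINT co_u")

create_or_update("r1", "a")   # INSERT succeeds
create_or_update("r1", "b")   # duplicate key -> savepoint rollback + UPDATE
```

With mostly-repeated resource/meter ids, nearly every sample takes the exception path, which matches the roughly five rollbacks per recorded event seen in the profile.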

Doh, true enough! /me wonders why this is written like so (only for
sqlite...?)

> > Secondly, given the volume of inserts (that also generate selects), a
> > simple memcache lookup cache would be highly beneficial in cutting down
> > on writer/reader contention in MySQL.
> You are right but I'm afraid that adding memcache will make deployment 
> more complicated.
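Alexei's deployment concern aside, the same effect does not require memcached: even a small in-process memo in front of the existence check removes the repeated per-sample lookups. A sketch, where `_select_resource` is a hypothetical stand-in for the real SELECT:

```python
import functools

DB_HITS = {"count": 0}

def _select_resource(resource_id):
    # Hypothetical stand-in for a SELECT against the resource table.
    DB_HITS["count"] += 1
    return {"id": resource_id}

@functools.lru_cache(maxsize=10000)
def resource_exists(resource_id):
    """Memoized existence check: repeat lookups never reach the database."""
    return _select_resource(resource_id) is not None

resource_exists("r1")
resource_exists("r1")  # served from the cache; DB_HITS stays at 1
```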
> >
> > These are things that can be done without changing the schema (which has
> > other issues that can be looked at of course).
> >
> > Best,
> > -jay
> >
> >> This is caused by poorly designed schema that requires such hacks.
> >> Because of this I suspect that we'll have similar results for PostgreSQL.
> >>
> >> Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if
> >> there is any difference.
> >>
> >>> Best,
> >>> -jay
> >>>
>  We do around 5 rollbacks to record a single event!
> 
>  I guess it means that MySQL backend is currently totally unusable in
>  production environment.
> 
>  Please find a full profiling graph attached.
> 
>  Regards,
> 
>  On 03/20/2014 10:31 PM, Sean Dague wrote:
> 
> > On 03/20/2014 01:01 PM, David Kranz wrote:
> >> On 03/20/2014 12:31 PM, Sean Dague wrote:
> >>> On 03/20/2014 11:35 AM, David Kranz wrote:
>  On 03/20/2014 06:15 AM, Sean Dague wrote:
> > On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> >> Hi all,
> >> First of all, thanks for your suggestions!
> >>
> >> To summarize the discussions here:
> >> 1. We are not going to install Mongo (because "it's wrong" ?)
> > We are not going to install Mongo "not from base distribution", 
> > because
> > we don't do that for things that aren't python. Our assumption is
> > dependent services come from the base OS.
> >
> > That being said, being an integrated project means you have to be 
> > able
> > to function, sanely, on an sqla backend, as that will always be 
> > part of
> > your gate.
>  This is a claim I think needs a bit more scrutiny if by "sanely" you
>  mean "performant". It seems we have an integrated project that no one
>  would deploy using th

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Gordon Chung
Alexei, awesome work.

> Rollbacks are caused not by retry logic but by create_or_update logic:
> We first try to do an INSERT in a sub-transaction; when it fails, we roll
> back this transaction and do an UPDATE instead.

if you take a look at my patch addressing deadlocks
(https://review.openstack.org/#/c/80461/), i actually added a check to get
rid of the blind insert logic we had so that should lower the number of 
rollbacks (except for race conditions, which is what the function was 
designed for). i did some minor performance testing as well and will add a 
few notes to the patch where performance can be improved but requires a 
larger schema change.  Jay, please take a look there as well if you have 
time.

> Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
> there is any difference.

i look forward to these results, from my quick testing with Mongo, we get 
about 10x the write speed vs mysql.

> > We required a non mongo backend to graduate ceilometer. So I don't 
think
> > it's too much to ask that it actually works.

i don't think sql is the recommended backend in real deployments, but that
said, given the modest load of tempest tests, i would expect our sql 
backend be able to handle it.

cheers,
gordon chung
openstack, ibm software standards


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Alexei Kornienko

On 03/21/2014 12:53 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:

On 03/21/2014 12:15 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:

Hello,

We've done some profiling and results are quite interesting:
during 1.5 hours ceilometer inserted 59755 events (59755 calls to
record_metering_data);
these calls resulted in a total of 2591573 SQL queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.


And the most interesting part is that 291569 queries were ROLLBACK
queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of entire number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).

Rollbacks are caused not by retry logic but by create_or_update logic:
We first try to do an INSERT in a sub-transaction; when it fails, we roll
back this transaction and do an UPDATE instead.

No, that isn't correct, AFAIK. We first do a SELECT into the table and
then if no result, try an insert:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292

The problem, IMO, is twofold. There does not need to be nested
transactional containers around these create_or_update lookups -- i.e.
the lookups can be done outside of the main transaction begin here:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335

I'm afraid you are wrong here:

nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
if not nested and session.query(model_class).get(str(_id)):  # always False

The short circuit is taken, so no SELECT is ever performed on MySQL.


Secondly, given the volume of inserts (that also generate selects), a
simple memcache lookup cache would be highly beneficial in cutting down
on writer/reader contention in MySQL.
You are right, but I'm afraid that adding memcache will make deployment
more complicated.
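
For reference, the lookup cache being suggested is conceptually small: check a
cache before issuing the "does this row exist?" SELECT. A plain dict stands in
for memcached in this sketch (an assumption -- a real deployment would use a
memcached client, which is exactly the operational complexity under discussion).

```python
class LookupCache:
    """Cache existence lookups so only the first one hits the database."""

    def __init__(self, db_lookup):
        self._cache = {}
        self._db_lookup = db_lookup   # the expensive SELECT against MySQL
        self.db_hits = 0

    def get(self, key):
        if key not in self._cache:
            self.db_hits += 1         # only a cache miss reaches the database
            self._cache[key] = self._db_lookup(key)
        return self._cache[key]

# Simulate 1000 record_metering_data calls touching the same resource row:
rows = {"resource-1": {"id": "resource-1"}}
cache = LookupCache(rows.get)
for _ in range(1000):
    cache.get("resource-1")
print(cache.db_hits)  # -> 1 database lookup instead of 1000
```

Since resource/user/project rows are written once and read on every sample,
the hit rate for such a cache would be very high.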


These are things that can be done without changing the schema (which has
other issues that can be looked at of course).

Best,
-jay


This is caused by a poorly designed schema that requires such hacks.
Because of this, I suspect we'll have similar results for PostgreSQL.

Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if
there is any difference.


Best,
-jay


We do around 5 rollbacks to record a single event!

I guess it means that MySQL backend is currently totally unusable in
production environment.

Please find a full profiling graph attached.

Regards,

On 03/20/2014 10:31 PM, Sean Dague wrote:


On 03/20/2014 01:01 PM, David Kranz wrote:

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because "it's wrong"?)

We are not going to install Mongo "not from base distribution", because
we don't do that for things that aren't python. Our assumption is
dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by "sanely" you
mean "performant". It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is any one
doing that?  Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a "slow ceilometer" job if the sql version is too slow for its budget
in the main job.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
> On 03/21/2014 12:15 AM, Jay Pipes wrote:
> > On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
> >> Hello,
> >>
> >> We've done some profiling and results are quite interesting:
> >> during 1.5 hours Ceilometer inserted 59755 events (59755 calls to
> >> record_metering_data); these calls resulted in a total of 2591573 SQL
> >> queries.
> > Yes, this matches my own experience with Ceilo+MySQL. But do not assume
> > that there are 2591573/59755 or around 43 queries per record meter
> > event. That is misleading. In fact, the number of queries per record
> > meter event increases over time, as the number of retries climbs due to
> > contention between readers and writers.
> >
> >> And the most interesting part is that 291569 queries were ROLLBACK
> >> queries.
> > Yep, I noted that as well. But, this is not unique to Ceilometer by any
> > means. Just take a look at any database serving Nova, Cinder, Glance, or
> > anything that uses the common SQLAlchemy code. You will see a huge
> > percentage of the total number of queries taken up by ROLLBACK statements.
> > The problem in Ceilometer is just that the write:read ratio is much
> > higher than any of the other projects.
> >
> > I had a suspicion that the rollbacks have to do with the way that the
> > oslo.db retry logic works, but I never had a chance to investigate it
> > further. Would be really interested to see similar stats against
> > PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
> > it is).
> Rollbacks are caused not by retry logic but by the create_or_update logic:
> we first try to do an INSERT in a sub-transaction; when it fails, we roll
> back that transaction and do an UPDATE instead.

No, that isn't correct, AFAIK. We first do a SELECT into the table and
then if no result, try an insert:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292

The problem, IMO, is twofold. There does not need to be nested
transactional containers around these create_or_update lookups -- i.e.
the lookups can be done outside of the main transaction begin here:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335

Secondly, given the volume of inserts (that also generate selects), a
simple memcache lookup cache would be highly beneficial in cutting down
on writer/reader contention in MySQL.

These are things that can be done without changing the schema (which has
other issues that can be looked at of course).

Best,
-jay

> This is caused by a poorly designed schema that requires such hacks.
> Because of this, I suspect we'll have similar results for PostgreSQL.
> 
> Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
> there is any difference.
> 
> >
> > Best,
> > -jay
> >
> >> We do around 5 rollbacks to record a single event!
> >>
> >> I guess it means that MySQL backend is currently totally unusable in
> >> production environment.
> >>
> >> Please find a full profiling graph attached.
> >>
> >> Regards,
> >>
> >> On 03/20/2014 10:31 PM, Sean Dague wrote:
> >>
> >>> On 03/20/2014 01:01 PM, David Kranz wrote:
>  On 03/20/2014 12:31 PM, Sean Dague wrote:
> > On 03/20/2014 11:35 AM, David Kranz wrote:
> >> On 03/20/2014 06:15 AM, Sean Dague wrote:
> >>> On 03/20/2014 05:49 AM, Nadya Privalova wrote:
>  Hi all,
>  First of all, thanks for your suggestions!
> 
>  To summarize the discussions here:
>  1. We are not going to install Mongo (because "it's wrong"?)
> >>> We are not going to install Mongo "not from base distribution", 
> >>> because
> >>> we don't do that for things that aren't python. Our assumption is
> >>> dependent services come from the base OS.
> >>>
> >>> That being said, being an integrated project means you have to be able
> >>> to function, sanely, on an sqla backend, as that will always be part 
> >>> of
> >>> your gate.
> >> This is a claim I think needs a bit more scrutiny if by "sanely" you
> >> mean "performant". It seems we have an integrated project that no one
> >> would deploy using the sql db driver we have in the gate. Is any one
> >> doing that?  Is having a scalable sql back end a goal of ceilometer?
> >>
> >> More generally, if there is functionality that is of great importance 
> >> to
> >> any cloud deployment (and we would not integrate it if we didn't think
> >> it was) that cannot be deployed at scale using sqla, are we really 
> >> going
> >> to say it should not be a part of OpenStack because we refuse, for
> >> whatever reason, to run it in our gate using a driver that would
> >> actually be used? And if we do demand an sqla backend, how much time
> >> should we spend trying to optimize it if no one will really use it?
> >> Though the slow heat job is a little different because the slowness
> >>

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Alexei Kornienko


On 03/21/2014 12:15 AM, Jay Pipes wrote:

On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:

Hello,

We've done some profiling and results are quite interesting:
during 1.5 hours Ceilometer inserted 59755 events (59755 calls to
record_metering_data); these calls resulted in a total of 2591573 SQL
queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.


And the most interesting part is that 291569 queries were ROLLBACK
queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).

Rollbacks are caused not by retry logic but by the create_or_update logic:
we first try to do an INSERT in a sub-transaction; when it fails, we roll
back that transaction and do an UPDATE instead.

This is caused by a poorly designed schema that requires such hacks.
Because of this, I suspect we'll have similar results for PostgreSQL.

Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if 
there is any difference.




Best,
-jay


We do around 5 rollbacks to record a single event!

I guess it means that MySQL backend is currently totally unusable in
production environment.

Please find a full profiling graph attached.

Regards,

On 03/20/2014 10:31 PM, Sean Dague wrote:


On 03/20/2014 01:01 PM, David Kranz wrote:

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because "it's wrong"?)

We are not going to install Mongo "not from base distribution", because
we don't do that for things that aren't python. Our assumption is
dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by "sanely" you
mean "performant". It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is any one
doing that?  Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a "slow ceilometer" job if the sql version is too slow for its budget
in the main job.

It seems like there is a similar thread, at least in part, about this
around marconi.

We required a non mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.

Peace. I specifically noted that difference in my original comment. And
for that reason the heat slow job may not be temporary.

Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Joe Gordon
On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko <
alexei.kornie...@gmail.com> wrote:

>  Hello,
>
> We've done some profiling and results are quite interesting:
> during 1.5 hours Ceilometer inserted 59755 events (59755 calls to
> record_metering_data); these calls resulted in a total of 2591573 SQL
> queries.
>
> And the most interesting part is that 291569 queries were ROLLBACK queries.
> We do around 5 rollbacks to record a single event!
>
> I guess it means that MySQL backend is currently totally unusable in
> production environment.
>

It should be noticed that SQLAlchemy is horrible for performance, in nova
we usually see sqlalchemy overheads of well over 10x (time nova.db.api call
vs the time MySQL measures when slow log is recording everything).
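
The overhead figure above comes from comparing wall-clock time around the
nova.db.api call with what the MySQL slow log records for the same query. A
rough sketch of such a timing wrapper (hypothetical names -- not the actual
patch referenced in the thread) might look like:

```python
import functools
import time

def timed(func):
    """Log wall-clock time per DB-layer call, for comparison with the
    per-query time the MySQL slow log reports."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        result = func(*args, **kwargs)
        print("%r took %.3f seconds" % (func.__name__, time.monotonic() - start))
        return result
    return wrapper

@timed
def instance_get_all_by_filters(filters):
    # Stand-in for the real DB API call: the gap between this wall-clock
    # time and the slow-log time is the SQLAlchemy + Python object
    # inflation overhead being discussed.
    time.sleep(0.01)
    return []

instance_get_all_by_filters({"deleted": False})
```

Anything the wrapper measures beyond the slow-log time is spent outside MySQL
itself: query compilation, result fetching, and ORM object construction.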


>
> Please find a full profiling graph attached.
>
> Regards,
>
>
> On 03/20/2014 10:31 PM, Sean Dague wrote:
>
> On 03/20/2014 01:01 PM, David Kranz wrote:
>
>  On 03/20/2014 12:31 PM, Sean Dague wrote:
>
>  On 03/20/2014 11:35 AM, David Kranz wrote:
>
>  On 03/20/2014 06:15 AM, Sean Dague wrote:
>
>  On 03/20/2014 05:49 AM, Nadya Privalova wrote:
>
>  Hi all,
> First of all, thanks for your suggestions!
>
> To summarize the discussions here:
> 1. We are not going to install Mongo (because "it's wrong"?)
>
>  We are not going to install Mongo "not from base distribution", because
> we don't do that for things that aren't python. Our assumption is
> dependent services come from the base OS.
>
> That being said, being an integrated project means you have to be able
> to function, sanely, on an sqla backend, as that will always be part of
> your gate.
>
>  This is a claim I think needs a bit more scrutiny if by "sanely" you
> mean "performant". It seems we have an integrated project that no one
> would deploy using the sql db driver we have in the gate. Is any one
> doing that?  Is having a scalable sql back end a goal of ceilometer?
>
> More generally, if there is functionality that is of great importance to
> any cloud deployment (and we would not integrate it if we didn't think
> it was) that cannot be deployed at scale using sqla, are we really going
> to say it should not be a part of OpenStack because we refuse, for
> whatever reason, to run it in our gate using a driver that would
> actually be used? And if we do demand an sqla backend, how much time
> should we spend trying to optimize it if no one will really use it?
> Though the slow heat job is a little different because the slowness
> comes directly from running real use cases, perhaps we should just set
> up a "slow ceilometer" job if the sql version is too slow for its budget
> in the main job.
>
> It seems like there is a similar thread, at least in part, about this
> around marconi.
>
>  We required a non mongo backend to graduate ceilometer. So I don't think
> it's too much to ask that it actually works.
>
> If the answer is that it will never work and it was a checkbox with no
> intent to make it work, then it should be deprecated and removed from
> the tree in Juno, with a big WARNING that you shouldn't ever use that
> backend. Like Nova now does with all the virt drivers that aren't tested
> upstream.
>
> Shipping in tree code that you don't want people to use is bad for
> users. Either commit to making it work, or deprecate it and remove it.
>
> I don't see this as the same issue as the slow heat job. Heat,
> architecturally, is going to be slow. It spins up real OSes and does
> real things to them. There is no way that's ever going to be fast, and
> the dedicated job was a recognition that to support this level of
> services in OpenStack we need to give them more breathing room.
>
>  Peace. I specifically noted that difference in my original comment. And
> for that reason the heat slow job may not be temporary.
>
>  Architecturally Ceilometer should not be this expensive. We've got some
> data showing it to be aberrant from where we believe it should be. We
> should fix that.
>
>  There are plenty of cases where we have had code that passes gate tests
> with acceptable performance but falls over in real deployment. I'm just
> saying that having a driver that works ok in the gate but does not work
> for real deployments is of no more value than not having it at all.
> Maybe less value.
> How do you propose to solve the problem of getting more ceilometer tests
> into the gate in the short-run? As a practical measure I don't see why
> it is so bad to have a separate job until the complex issue of whether
> it is possible to have a real-world performant sqla backend is resolved.
> Or did I miss something and it has already been determined that sqla
> could be used for large-scale deployments if we just fixed our code?
>
>  I think right now the ball is back in the ceilometer court to do some
> performance profiling, and lets see what comes of that. I don't think
> we're getting more test before the release in any real way.
>
>
>  Once we get a base OS in the gate that lets us direct install mongo from
> base packages, we can also do that. Or someone can 3rd party it today.
> Then we'll even have comparative results to understand the differences.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Jay Pipes
On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
> Hello,
> 
> We've done some profiling and results are quite interesting:
> during 1.5 hours Ceilometer inserted 59755 events (59755 calls to
> record_metering_data); these calls resulted in a total of 2591573 SQL
> queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.

> And the most interesting part is that 291569 queries were ROLLBACK
> queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK statements.
The problem in Ceilometer is just that the write:read ratio is much
higher than any of the other projects.

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. Would be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).

Best,
-jay

> We do around 5 rollbacks to record a single event!
> 
> I guess it means that MySQL backend is currently totally unusable in
> production environment.
> 
> Please find a full profiling graph attached.
> 
> Regards,
> 
> On 03/20/2014 10:31 PM, Sean Dague wrote:
> 
> > On 03/20/2014 01:01 PM, David Kranz wrote:
> > > On 03/20/2014 12:31 PM, Sean Dague wrote:
> > > > On 03/20/2014 11:35 AM, David Kranz wrote:
> > > > > On 03/20/2014 06:15 AM, Sean Dague wrote:
> > > > > > On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> > > > > > > Hi all,
> > > > > > > First of all, thanks for your suggestions!
> > > > > > > 
> > > > > > > To summarize the discussions here:
> > > > > > > 1. We are not going to install Mongo (because "it's wrong"?)
> > > > > > We are not going to install Mongo "not from base distribution", 
> > > > > > because
> > > > > > we don't do that for things that aren't python. Our assumption is
> > > > > > dependent services come from the base OS.
> > > > > > 
> > > > > > That being said, being an integrated project means you have to be 
> > > > > > able
> > > > > > to function, sanely, on an sqla backend, as that will always be 
> > > > > > part of
> > > > > > your gate.
> > > > > This is a claim I think needs a bit more scrutiny if by "sanely" you
> > > > > mean "performant". It seems we have an integrated project that no one
> > > > > would deploy using the sql db driver we have in the gate. Is any one
> > > > > doing that?  Is having a scalable sql back end a goal of ceilometer?
> > > > > 
> > > > > More generally, if there is functionality that is of great importance 
> > > > > to
> > > > > any cloud deployment (and we would not integrate it if we didn't think
> > > > > it was) that cannot be deployed at scale using sqla, are we really 
> > > > > going
> > > > > to say it should not be a part of OpenStack because we refuse, for
> > > > > whatever reason, to run it in our gate using a driver that would
> > > > > actually be used? And if we do demand an sqla backend, how much time
> > > > > should we spend trying to optimize it if no one will really use it?
> > > > > Though the slow heat job is a little different because the slowness
> > > > > comes directly from running real use cases, perhaps we should just set
> > > > > up a "slow ceilometer" job if the sql version is too slow for its 
> > > > > budget
> > > > > in the main job.
> > > > > 
> > > > > It seems like there is a similar thread, at least in part, about this
> > > > > around marconi.
> > > > We required a non mongo backend to graduate ceilometer. So I don't think
> > > > it's too much to ask that it actually works.
> > > > 
> > > > If the answer is that it will never work and it was a checkbox with no
> > > > intent to make it work, then it should be deprecated and removed from
> > > > the tree in Juno, with a big WARNING that you shouldn't ever use that
> > > > backend. Like Nova now does with all the virt drivers that aren't tested
> > > > upstream.
> > > > 
> > > > Shipping in tree code that you don't want people to use is bad for
> > > > users. Either commit to making it work, or deprecate it and remove it.
> > > > 
> > > > I don't see this as the same issue as the slow heat job. Heat,
> > > > architecturally, is going to be slow. It spins up real OSes and does
> > > > real things to them. There is no way that's ever going to be fast, and
> > > > the dedicated job was a recognition that to support this level of
> > > > services in OpenStack we need to give them more breathing room.
> > > Peace. I specifically noted that difference in my original comment. And
> > > for that reason the heat slow job may not be temporary.

Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 01:01 PM, David Kranz wrote:
> On 03/20/2014 12:31 PM, Sean Dague wrote:
>> On 03/20/2014 11:35 AM, David Kranz wrote:
>>> On 03/20/2014 06:15 AM, Sean Dague wrote:
 On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> Hi all,
> First of all, thanks for your suggestions!
>
> To summarize the discussions here:
> 1. We are not going to install Mongo (because "it's wrong"?)
 We are not going to install Mongo "not from base distribution", because
 we don't do that for things that aren't python. Our assumption is
 dependent services come from the base OS.

 That being said, being an integrated project means you have to be able
 to function, sanely, on an sqla backend, as that will always be part of
 your gate.
>>> This is a claim I think needs a bit more scrutiny if by "sanely" you
>>> mean "performant". It seems we have an integrated project that no one
>>> would deploy using the sql db driver we have in the gate. Is any one
>>> doing that?  Is having a scalable sql back end a goal of ceilometer?
>>>
>>> More generally, if there is functionality that is of great importance to
>>> any cloud deployment (and we would not integrate it if we didn't think
>>> it was) that cannot be deployed at scale using sqla, are we really going
>>> to say it should not be a part of OpenStack because we refuse, for
>>> whatever reason, to run it in our gate using a driver that would
>>> actually be used? And if we do demand an sqla backend, how much time
>>> should we spend trying to optimize it if no one will really use it?
>>> Though the slow heat job is a little different because the slowness
>>> comes directly from running real use cases, perhaps we should just set
>>> up a "slow ceilometer" job if the sql version is too slow for its budget
>>> in the main job.
>>>
>>> It seems like there is a similar thread, at least in part, about this
>>> around marconi.
>> We required a non mongo backend to graduate ceilometer. So I don't think
>> it's too much to ask that it actually works.
>>
>> If the answer is that it will never work and it was a checkbox with no
>> intent to make it work, then it should be deprecated and removed from
>> the tree in Juno, with a big WARNING that you shouldn't ever use that
>> backend. Like Nova now does with all the virt drivers that aren't tested
>> upstream.
>>
>> Shipping in tree code that you don't want people to use is bad for
>> users. Either commit to making it work, or deprecate it and remove it.
>>
>> I don't see this as the same issue as the slow heat job. Heat,
>> architecturally, is going to be slow. It spins up real OSes and does
>> real things to them. There is no way that's ever going to be fast, and
>> the dedicated job was a recognition that to support this level of
>> services in OpenStack we need to give them more breathing room.
> Peace. I specifically noted that difference in my original comment. And
> for that reason the heat slow job may not be temporary.
>>
>> Architecturally Ceilometer should not be this expensive. We've got some
>> data showing it to be aberrant from where we believe it should be. We
>> should fix that.
> There are plenty of cases where we have had code that passes gate tests
> with acceptable performance but falls over in real deployment. I'm just
> saying that having a driver that works ok in the gate but does not work
> for real deployments is of no more value than not having it at all.
> Maybe less value.
> How do you propose to solve the problem of getting more ceilometer tests
> into the gate in the short-run? As a practical measure I don't see why
> it is so bad to have a separate job until the complex issue of whether
> it is possible to have a real-world performant sqla backend is resolved.
> Or did I miss something and it has already been determined that sqla
> could be used for large-scale deployments if we just fixed our code?

I think right now the ball is back in the ceilometer court to do some
performance profiling, and lets see what comes of that. I don't think
we're getting more test before the release in any real way.

>> Once we get a base OS in the gate that lets us direct install mongo from
>> base packages, we can also do that. Or someone can 3rd party it today.
>> Then we'll even have comparative results to understand the differences.
> Yes. Do you know which base OS's are candidates for that?

Ubuntu 14.04 will have a sufficient level of Mongo, so some time in the
Juno cycle we should have it in the gate.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread David Kranz

On 03/20/2014 12:31 PM, Sean Dague wrote:

On 03/20/2014 11:35 AM, David Kranz wrote:

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because "it's wrong"?)

We are not going to install Mongo "not from base distribution", because
we don't do that for things that aren't python. Our assumption is
dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

This is a claim I think needs a bit more scrutiny if by "sanely" you
mean "performant". It seems we have an integrated project that no one
would deploy using the sql db driver we have in the gate. Is any one
doing that?  Is having a scalable sql back end a goal of ceilometer?

More generally, if there is functionality that is of great importance to
any cloud deployment (and we would not integrate it if we didn't think
it was) that cannot be deployed at scale using sqla, are we really going
to say it should not be a part of OpenStack because we refuse, for
whatever reason, to run it in our gate using a driver that would
actually be used? And if we do demand an sqla backend, how much time
should we spend trying to optimize it if no one will really use it?
Though the slow heat job is a little different because the slowness
comes directly from running real use cases, perhaps we should just set
up a "slow ceilometer" job if the sql version is too slow for its budget
in the main job.

It seems like there is a similar thread, at least in part, about this
around marconi.

We required a non mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.
Peace. I specifically noted that difference in my original comment. And 
for that reason the heat slow job may not be temporary.


Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.
There are plenty of cases where we have had code that passes gate tests 
with acceptable performance but falls over in real deployment. I'm just 
saying that having a driver that works ok in the gate but does not work 
for real deployments is of no more value than not having it at all.
Maybe less value.
How do you propose to solve the problem of getting more ceilometer tests 
into the gate in the short-run? As a practical measure I don't see why
it is so bad to have a separate job until the complex issue of whether 
it is possible to have a real-world performant sqla backend is resolved. 
Or did I miss something and it has already been determined that sqla 
could be used for large-scale deployments if we just fixed our code?


Once we get a base OS in the gate that lets us direct install mongo from
base packages, we can also do that. Or someone can 3rd party it today.
Then we'll even have comparative results to understand the differences.

Yes. Do you know which base OSes are candidates for that?

 -David



-Sean



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 11:35 AM, David Kranz wrote:
> On 03/20/2014 06:15 AM, Sean Dague wrote:
>> On 03/20/2014 05:49 AM, Nadya Privalova wrote:
>>> Hi all,
>>> First of all, thanks for your suggestions!
>>>
>>> To summarize the discussions here:
>>> 1. We are not going to install Mongo (because "it's wrong" ?)
>> We are not going to install Mongo "not from base distribution", because
>> we don't do that for things that aren't python. Our assumption is
>> dependent services come from the base OS.
>>
>> That being said, being an integrated project means you have to be able
>> to function, sanely, on an sqla backend, as that will always be part of
>> your gate.
> This is a claim I think needs a bit more scrutiny if by "sanely" you
> mean "performant". It seems we have an integrated project that no one
> would deploy using the sql db driver we have in the gate. Is any one
> doing that?  Is having a scalable sql back end a goal of ceilometer?
> 
> More generally, if there is functionality that is of great importance to
> any cloud deployment (and we would not integrate it if we didn't think
> it was) that cannot be deployed at scale using sqla, are we really going
> to say it should not be a part of OpenStack because we refuse, for
> whatever reason, to run it in our gate using a driver that would
> actually be used? And if we do demand an sqla backend, how much time
> should we spend trying to optimize it if no one will really use it?
> Though the slow heat job is a little different because the slowness
> comes directly from running real use cases, perhaps we should just set
> up a "slow ceilometer" job if the sql version is too slow for its budget
> in the main job.
> 
> It seems like there is a similar thread, at least in part, about this
> around marconi.

We required a non mongo backend to graduate ceilometer. So I don't think
it's too much to ask that it actually works.

If the answer is that it will never work and it was a checkbox with no
intent to make it work, then it should be deprecated and removed from
the tree in Juno, with a big WARNING that you shouldn't ever use that
backend. Like Nova now does with all the virt drivers that aren't tested
upstream.

Shipping in tree code that you don't want people to use is bad for
users. Either commit to making it work, or deprecate it and remove it.

I don't see this as the same issue as the slow heat job. Heat,
architecturally, is going to be slow. It spins up real OSes and does
real things to them. There is no way that's ever going to be fast, and
the dedicated job was a recognition that to support this level of
services in OpenStack we need to give them more breathing room.

Architecturally Ceilometer should not be this expensive. We've got some
data showing it to be aberrant from where we believe it should be. We
should fix that.

Once we get a base OS in the gate that lets us direct install mongo from
base packages, we can also do that. Or someone can 3rd party it today.
Then we'll even have comparative results to understand the differences.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread David Kranz

On 03/20/2014 06:15 AM, Sean Dague wrote:

On 03/20/2014 05:49 AM, Nadya Privalova wrote:

Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because "it's wrong" ?)

We are not going to install Mongo "not from base distribution", because
we don't do that for things that aren't python. Our assumption is
dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.
This is a claim I think needs a bit more scrutiny if by "sanely" you 
mean "performant". It seems we have an integrated project that no one 
would deploy using the sql db driver we have in the gate. Is any one 
doing that?  Is having a scalable sql back end a goal of ceilometer?


More generally, if there is functionality that is of great importance to 
any cloud deployment (and we would not integrate it if we didn't think 
it was) that cannot be deployed at scale using sqla, are we really going 
to say it should not be a part of OpenStack because we refuse, for 
whatever reason, to run it in our gate using a driver that would 
actually be used? And if we do demand an sqla backend, how much time 
should we spend trying to optimize it if no one will really use it? 
Though the slow heat job is a little different because the slowness 
comes directly from running real use cases, perhaps we should just set 
up a "slow ceilometer" job if the sql version is too slow for its budget 
in the main job.


It seems like there is a similar thread, at least in part, about this 
around marconi.


 -David










Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Tim Bell
We're using a dedicated MongoDB instance for ceilometer and a distinct DB for 
each of the Nova cells.

Tim

From: Nadya Privalova [mailto:nprival...@mirantis.com]
Sent: 20 March 2014 13:27
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
tempest testing in gate

Tim, yep. If you use one db for Ceilometer and Nova then nova's performance may 
be affected. I've seen this issue.
Will start profiling ASAP.

On Thu, Mar 20, 2014 at 3:59 PM, Tim Bell 
mailto:tim.b...@cern.ch>> wrote:

+1 for performance analysis to understand what needs to be optimised. Metering 
should be light-weight.

For those of us running in production, we don't have an option to turn 
ceilometer off some of the time. That we are not able to run through the gate 
tests hints that there are optimisations that are needed.

For example, turning on ceilometer caused a 16x increase in our Nova API call 
rate, see 
http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
 for details.

Tim

> -Original Message-
> From: Sean Dague [mailto:s...@dague.net<mailto:s...@dague.net>]
> Sent: 20 March 2014 11:16
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
> tempest testing in gate
>
...
>
> While I agree that Tempest's job is not to test performance, we do have to 
> give some basic sanity checking here that the software is running in some
> performance profile that we believe is base usable.
>
> Based on the latest dstat results, I think that's a dubious assessment.
> The answer on the collector side has to be something other than horizontal 
> scaling. Because we're talking about the collector being the 3rd highest
> utilized process on the box right now (we should write a dstat plugin to give 
> us cumulative data, just haven't gotten there yet).
>
> So right now, I think performance analysis for ceilometer on sqla is 
> important, really important. Not just horizontal scaling, but actual
> performance profiling.
>
>   -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net<mailto:s...@dague.net> / 
> sean.da...@samsung.com<mailto:sean.da...@samsung.com>
> http://dague.net


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Christian Berendt
On 03/20/2014 01:27 PM, Nadya Privalova wrote:
> Tim, yep. If you use one db for Ceilometer and Nova then nova's
> performance may be affected.

If I understood it correctly the problem is not the higher load produced
directly by Ceilometer on the database. The problem is that the
Ceilometer compute agent sends a lot of Nova API calls and this results
in a higher load on the nova-api services. Tim mentioned a factor of 16.

Christian.

-- 
Christian Berendt
Cloud Computing Solution Architect
Mail: bere...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537



Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Tim, yep. If you use one db for Ceilometer and Nova then nova's performance
may be affected. I've seen this issue.
Will start profiling ASAP.


On Thu, Mar 20, 2014 at 3:59 PM, Tim Bell  wrote:

>
> +1 for performance analysis to understand what needs to be optimised.
> Metering should be light-weight.
>
> For those of us running in production, we don't have an option to turn
> ceilometer off some of the time. That we are not able to run through the
> gate tests hints that there are optimisations that are needed.
>
> For example, turning on ceilometer caused a 16x increase in our Nova API
> call rate, see
> http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
> for details.
>
> Tim
>
> > -Original Message-
> > From: Sean Dague [mailto:s...@dague.net]
> > Sent: 20 March 2014 11:16
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer
> tempest testing in gate
> >
> ...
> >
> > While I agree that Tempest's job is not to test performance, we do have
> to give some basic sanity checking here that the software is running in some
> > performance profile that we believe is base usable.
> >
> > Based on the latest dstat results, I think that's a dubious assessment.
> > The answer on the collector side has to be something other than
> horizontal scaling. Because we're talking about the collector being the 3rd
> highest
> > utilized process on the box right now (we should write a dstat plugin to
> give us cumulative data, just haven't gotten there yet).
> >
> > So right now, I think performance analysis for ceilometer on sqla is
> important, really important. Not just horizontal scaling, but actual
> > performance profiling.
> >
> >   -Sean
> >
> > --
> > Sean Dague
> > Samsung Research America
> > s...@dague.net / sean.da...@samsung.com
> > http://dague.net
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Tim Bell

+1 for performance analysis to understand what needs to be optimised. Metering 
should be light-weight.

For those of us running in production, we don't have an option to turn 
ceilometer off some of the time. That we are not able to run through the gate 
tests hints that there are optimisations that are needed.

For example, turning on ceilometer caused a 16x increase in our Nova API call 
rate, see 
http://openstack-in-production.blogspot.ch/2014/03/cern-cloud-architecture-update-for.html
 for details.

Tim

> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: 20 March 2014 11:16
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer 
> tempest testing in gate
> 
...
> 
> While I agree that Tempest's job is not to test performance, we do have to 
> give some basic sanity checking here that the software is running in some
> performance profile that we believe is base usable.
>
> Based on the latest dstat results, I think that's a dubious assessment.
> The answer on the collector side has to be something other than horizontal 
> scaling. Because we're talking about the collector being the 3rd highest
> utilized process on the box right now (we should write a dstat plugin to give 
> us cumulative data, just haven't gotten there yet).
>
> So right now, I think performance analysis for ceilometer on sqla is 
> important, really important. Not just horizontal scaling, but actual
> performance profiling.
> 
>   -Sean
> 
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net



Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Sean, thank for analysis.
JFYI, I did some initial profiling, it's described here
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg19030.html.


On Thu, Mar 20, 2014 at 2:15 PM, Sean Dague  wrote:

> On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> > Hi all,
> > First of all, thanks for your suggestions!
> >
> > To summarize the discussions here:
> > 1. We are not going to install Mongo (because "it's wrong" ?)
>
> We are not going to install Mongo "not from base distribution", because
> we don't do that for things that aren't python. Our assumption is
> dependent services come from the base OS.
>
> That being said, being an integrated project means you have to be able
> to function, sanely, on an sqla backend, as that will always be part of
> your gate.
>
> > 2. Idea about spawning several collectors is suspicious (btw there is a
> > patch that runs several collectors:
> > https://review.openstack.org/#/c/79962/ .)
>
> Correct, given that the collector is already one of the most expensive
> processes in a devstack run.
>
> > Let's try to get back to original problem. All these solutions were
> > suggested to solve the problem of high load on Ceilometer. AFAIK,
> > Tempest's goal is to test projects` interactions, not performance
> > testing. The perfect tempest's behaviour would be "start ceilometer only
> > for Ceilometer tests". From one hand it will allow not to load db during
> > other tests, from the other hand "projects` interactions" will be tested
> > because during Ceilometer test we create volumes, images and instances.
> > But I'm afraid that this scenario is not possible technically.
> > There is one more idea. Make Ceilometer able to monitor not all messages
> > but filtered set of messages. But anyway this is a new feature and
> > cannot be added right now.
> >
> > Tempest guys, if you have any thoughts about first suggestion "start
> > ceilometer only for Ceilometer tests" please share.
>
> The point of the gate is that it's integrated and testing the
> interaction between projects. Ceilometer can be tested on its own in
> ceilometer unit tests, or by creating ceilometer functional tests that
> only run on the ceilometer jobs.
>
> While I agree that Tempest's job is not to test performance, we do have
> to give some basic sanity checking here that the software is running in
> some performance profile that we believe is base usable.
>
> Based on the latest dstat results, I think that's a dubious assessment.
> The answer on the collector side has to be something other than
> horizontal scaling. Because we're talking about the collector being the
> 3rd highest utilized process on the box right now (we should write a
> dstat plugin to give us cumulative data, just haven't gotten there yet).
>
> So right now, I think performance analysis for ceilometer on sqla is
> important, really important. Not just horizontal scaling, but actual
> performance profiling.
>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Sean Dague
On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> Hi all,
> First of all, thanks for your suggestions!
> 
> To summarize the discussions here:
> 1. We are not going to install Mongo (because "it's wrong" ?)

We are not going to install Mongo "not from base distribution", because
we don't do that for things that aren't python. Our assumption is
dependent services come from the base OS.

That being said, being an integrated project means you have to be able
to function, sanely, on an sqla backend, as that will always be part of
your gate.

> 2. Idea about spawning several collectors is suspicious (btw there is a
> patch that runs several collectors:
> https://review.openstack.org/#/c/79962/ .)

Correct, given that the collector is already one of the most expensive
processes in a devstack run.

> Let's try to get back to original problem. All these solutions were
> suggested to solve the problem of high load on Ceilometer. AFAIK,
> Tempest's goal is to test projects` interactions, not performance
> testing. The perfect tempest's behaviour would be "start ceilometer only
> for Ceilometer tests". From one hand it will allow not to load db during
> other tests, from the other hand "projects` interactions" will be tested
> because during Ceilometer test we create volumes, images and instances.
> But I'm afraid that this scenario is not possible technically.
> There is one more idea. Make Ceilometer able to monitor not all messages
> but filtered set of messages. But anyway this is a new feature and
> cannot be added right now.
> 
> Tempest guys, if you have any thoughts about first suggestion "start
> ceilometer only for Ceilometer tests" please share.

The point of the gate is that it's integrated and testing the
interaction between projects. Ceilometer can be tested on its own in
ceilometer unit tests, or by creating ceilometer functional tests that
only run on the ceilometer jobs.

While I agree that Tempest's job is not to test performance, we do have
to give some basic sanity checking here that the software is running in
some performance profile that we believe is base usable.

Based on the latest dstat results, I think that's a dubious assessment.
The answer on the collector side has to be something other than
horizontal scaling. Because we're talking about the collector being the
3rd highest utilized process on the box right now (we should write a
dstat plugin to give us cumulative data, just haven't gotten there yet).
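The cumulative-data idea can be approximated offline even without a new dstat plugin: fold repeated per-sample (process, CPU%) readings into cumulative CPU-seconds per process. The sketch below assumes a simplified two-column CSV; real dstat logs have header rows and a different column layout, so treat the format as hypothetical:

```python
import csv
import io
from collections import defaultdict

def cumulative_cpu(dstat_csv, interval=1.0):
    """Fold per-sample (process, cpu_percent) rows into CPU-seconds totals.

    Assumes one sample per `interval` seconds and a bare two-column CSV
    (process name, CPU percent) -- a stand-in for parsed dstat output.
    """
    totals = defaultdict(float)
    for name, cpu in csv.reader(io.StringIO(dstat_csv)):
        # 55% of one core for `interval` seconds = 0.55 CPU-seconds
        totals[name] += float(cpu) / 100.0 * interval
    return dict(totals)

# Two fake one-second samples where the collector dominates
samples = "ceilometer-collector,55\nmysqld,20\nceilometer-collector,48\n"
print(cumulative_cpu(samples))
```

Summing across a whole tempest run this way would show the collector's total CPU budget rather than instantaneous spikes, which is what the comparison here needs.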

So right now, I think performance analysis for ceilometer on sqla is
important, really important. Not just horizontal scaling, but actual
performance profiling.
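One cheap form of that profiling is simply counting what statement types the backend issues per operation (e.g. how many ROLLBACKs accompany each INSERT). A minimal sketch using sqlite3's trace callback as a stand-in for a real MySQL setup; the `meter` schema is invented for illustration:

```python
import sqlite3
from collections import Counter

counts = Counter()
conn = sqlite3.connect(":memory:")
# The trace callback sees each SQL statement SQLite executes, including
# the implicit BEGIN the sqlite3 module issues before DML.
conn.set_trace_callback(lambda stmt: counts.update([stmt.split()[0].upper()]))

cur = conn.cursor()
cur.execute("CREATE TABLE meter (id INTEGER, volume REAL)")  # toy schema
cur.execute("INSERT INTO meter VALUES (1, 2.5)")
conn.rollback()  # simulate an aborted write

# Ratio of ROLLBACK to INSERT per recorded event is the number to watch
print(dict(counts))
```

Pointed at a real driver, a per-statement counter like this makes regressions (such as multiple rollbacks per recorded sample) visible without a full profiler.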

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-20 Thread Nadya Privalova
Hi all,
First of all, thanks for your suggestions!

To summarize the discussions here:
1. We are not going to install Mongo (because "it's wrong" ?)
2. Idea about spawning several collectors is suspicious (btw there is a
patch that runs several collectors: https://review.openstack.org/#/c/79962/.)

Let's try to get back to original problem. All these solutions were
suggested to solve the problem of high load on Ceilometer. AFAIK, Tempest's
goal is to test projects` interactions, not performance testing. The
perfect tempest's behaviour would be "start ceilometer only for Ceilometer
tests". On one hand it will avoid loading the db during other tests; on
the other hand "projects` interactions" will still be tested because during
Ceilometer tests we create volumes, images and instances. But I'm afraid that
this scenario is not possible technically.
There is one more idea. Make Ceilometer able to monitor not all messages
but filtered set of messages. But anyway this is a new feature and cannot
be added right now.
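The filtering idea could look roughly like this: a whitelist of event-type patterns applied before anything touches the database. All names and patterns here are invented for illustration, not an existing Ceilometer feature:

```python
import fnmatch

# Hypothetical whitelist: only meter what the deployment cares about
ALLOWED_PATTERNS = ["compute.instance.*", "volume.*"]

def wanted(event_type):
    """Return True if the notification matches any whitelisted pattern."""
    return any(fnmatch.fnmatch(event_type, p) for p in ALLOWED_PATTERNS)

def filter_notifications(notifications):
    """Drop uninteresting messages before they become database writes."""
    return [n for n in notifications if wanted(n["event_type"])]

batch = [
    {"event_type": "compute.instance.create.end"},
    {"event_type": "image.upload"},
    {"event_type": "volume.create.start"},
]
print(filter_notifications(batch))
```

Filtering at this point would cut database load during unrelated tempest tests without disabling the service entirely.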

Tempest guys, if you have any thoughts about first suggestion "start
ceilometer only for Ceilometer tests" please share.

Thanks,
Nadya




On Thu, Mar 20, 2014 at 3:23 AM, Sean Dague  wrote:

> On 03/19/2014 06:09 PM, Doug Hellmann wrote:
> > The ceilometer collector is meant to scale horizontally. Have you tried
> > configuring the test environment to run more than one copy, to process
> > the notifications more quickly?
>
> The ceilometer collector is already one of the top running processes on
> the box -
>
> http://logs.openstack.org/82/81282/2/check/check-tempest-dsvm-full/693dc3b/logs/dstat.txt.gz
>
>
> Often consuming > 1/2 a core (25% == 1 core in that run, as can be
> seen when qemu boots and pegs one).
>
> So while we could spin up more collectors, I think it's unreasonable
> that the majority of our cpu has to be handed over to the metric
> collector to make it function responsively. I thought the design point
> was that this was low impact.
>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Sean Dague
On 03/19/2014 06:09 PM, Doug Hellmann wrote:
> The ceilometer collector is meant to scale horizontally. Have you tried
> configuring the test environment to run more than one copy, to process
> the notifications more quickly?

The ceilometer collector is already one of the top running processes on
the box -
http://logs.openstack.org/82/81282/2/check/check-tempest-dsvm-full/693dc3b/logs/dstat.txt.gz


Often consuming > 1/2 a core (25% == 1 core in that run, as can be
seen when qemu boots and pegs one).

So while we could spin up more collectors, I think it's unreasonable
that the majority of our cpu has to be handed over to the metric
collector to make it function responsively. I thought the design point
was that this was low impact.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Joe Gordon
On Wed, Mar 19, 2014 at 3:09 PM, Doug Hellmann
wrote:

> The ceilometer collector is meant to scale horizontally. Have you tried
> configuring the test environment to run more than one copy, to process the
> notifications more quickly?
>

FYI:
http://logs.openstack.org/82/79182/1/check/check-tempest-dsvm-neutron/156f1d4/logs/screen-dstat.txt.gz



>
> Doug
>
>
> On Tue, Mar 18, 2014 at 8:09 AM, Nadya Privalova 
> wrote:
>
>> Hi folks,
>>
>> I'd like to discuss Ceilometer's tempest situation with you.
>> Now we have several patch sets on review that test core functionality of
>> Ceilometer: notification and pollstering (topic
>> https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
>> But there is a problem: Ceilometer performance is very poor on mysql and
>> postgresql because of the bug
>> https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
>> better even in single thread and I hope that its performance will be
>> enough to successfully run Ceilometer tempest tests.
>> Let me explain in several words why tempest tests is mostly performance
>> tests for Ceilometer. The thing is that Ceilometer service is running
>> during all other nova, cinder and so on tests run. All the tests create
>> instances, volumes and each creation produces a lot of notifications. Each
>> notification is the entry to database. So Ceilometer cannot process such a
>> big amount of notifications quickly. Ceilometer tests have 'telemetry'
>> prefix and it means that they will be started in the last turn. And it
>> makes situation even worst.
>> So my proposal:
>> 1. create a non-voting job with Mongo-backend
>> 2. make sure that tests pass on Mongo
>> 3. merge tests to tempest but skip that on postgres and mysql till
>> bug/1291054 is resolved
>> 4. make the new job 'voting'
>>
>> The problem is only in Mongo installation. I have a cr
>> https://review.openstack.org/#/c/81001/ that will allow us to install
>> Mongo from deb. From the other hand there is
>> https://review.openstack.org/#/c/74889/ that enables UCA. I'm
>> collaborating with infra-team to make the decision ASAP because AFAIU we
>> need tempest tests in Icehouse (for more discussion you are welcome to
>> thread  [openstack-dev] Updating libvirt in gate jobs).
>>
>> If you have any thoughts on this please share.
>>
>> Thanks for attention,
>> Nadya
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Doug Hellmann
The ceilometer collector is meant to scale horizontally. Have you tried
configuring the test environment to run more than one copy, to process the
notifications more quickly?
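Running more than one copy of a consumer against a single queue can be sketched with threads standing in for collector processes; this is illustrative only, since a real deployment shares work through the message bus rather than an in-process queue:

```python
import queue
import threading
from collections import Counter

work = queue.Queue()
processed = Counter()
lock = threading.Lock()

def collector(worker_id):
    # Each copy competes for messages on the shared queue, analogous to
    # multiple collectors consuming from one notification queue.
    while True:
        sample = work.get()
        if sample is None:          # sentinel: shut this worker down
            work.task_done()
            return
        with lock:
            processed[worker_id] += 1
        work.task_done()

workers = [threading.Thread(target=collector, args=(i,)) for i in range(3)]
for w in workers:
    w.start()
for n in range(90):                 # 90 fake notifications
    work.put(n)
for _ in workers:
    work.put(None)                  # one sentinel per worker
work.join()
print(sum(processed.values()))      # every sample handled exactly once
```

Scaling out this way only helps if the per-message cost is reasonable in the first place, which is the point of contention in this thread.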

Doug


On Tue, Mar 18, 2014 at 8:09 AM, Nadya Privalova wrote:

> Hi folks,
>
> I'd like to discuss Ceilometer's tempest situation with you.
> Now we have several patch sets on review that test core functionality of
> Ceilometer: notification and pollstering (topic
> https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
> But there is a problem: Ceilometer performance is very poor on mysql and
> postgresql because of the bug
> https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
> better even in single thread and I hope that its performance will be
> enough to successfully run Ceilometer tempest tests.
> Let me explain in several words why tempest tests is mostly performance
> tests for Ceilometer. The thing is that Ceilometer service is running
> during all other nova, cinder and so on tests run. All the tests create
> instances, volumes and each creation produces a lot of notifications. Each
> notification is the entry to database. So Ceilometer cannot process such a
> big amount of notifications quickly. Ceilometer tests have 'telemetry'
> prefix and it means that they will be started in the last turn. And it
> makes the situation even worse.
> So my proposal:
> 1. create a non-voting job with Mongo-backend
> 2. make sure that tests pass on Mongo
> 3. merge tests to tempest but skip that on postgres and mysql till
> bug/1291054 is resolved
> 4. make the new job 'voting'
>
> The problem is only in Mongo installation. I have a cr
> https://review.openstack.org/#/c/81001/ that will allow us to install
> Mongo from deb. From the other hand there is
> https://review.openstack.org/#/c/74889/ that enables UCA. I'm
> collaborating with infra-team to make the decision ASAP because AFAIU we
> need tempest tests in Icehouse (for more discussion you are welcome to
> thread  [openstack-dev] Updating libvirt in gate jobs).
>
> If you have any thoughts on this please share.
>
> Thanks for attention,
> Nadya
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Joe Gordon
On Wed, Mar 19, 2014 at 1:52 AM, Nadya Privalova wrote:

> Ok, so we don't want to switch to UCA, let's consider this variant.
> What options do we have to make possible to run Ceilometer jobs with Mongo
> backend?
> I see only  https://review.openstack.org/#/c/81001/ or making Ceilometer
> able to work with old Mongo. But the last variant looks inappropriate at
> least in Icehouse.
> What am I missing here? Maybe there is smth else we can do?
>
>
If ceilometer says it supports MySQL then it should work, we shouldn't be
forced to switch to an alternate backend.



>
> On Tue, Mar 18, 2014 at 9:28 PM, Tim Bell  wrote:
>
>>
>>
>> If UCA is required, what would be the upgrade path for a currently
>> running OpenStack Havana site to Icehouse with this requirement ?
>>
>>
>>
>> Would it be an online upgrade (i.e. what order to upgrade the different
>> components in order to keep things running at all times) ?
>>
>>
>>
>> Tim
>>
>>
>>
>> *From:* Chmouel Boudjnah [mailto:chmo...@enovance.com]
>> *Sent:* 18 March 2014 17:58
>> *To:* OpenStack Development Mailing List (not for usage questions)
>> *Subject:* Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra]
>> Ceilometer tempest testing in gate
>>
>>
>>
>>
>>
>> On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague  wrote:
>>
>>  So I'm still -1 at the point in making UCA our default run environment
>> until it's provably functional for a period of time. Because working
>> around upstream distro breaks is no fun.
>>
>>
>>
>> I agree, if UCA is not very stable ATM, this is going to cause us more
>> pain, but what would be the plan of action? a non-voting gate for
>> ceilometer as a start ? (if that's possible).
>>
>> Chmouel
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-19 Thread Nadya Privalova
OK, so we don't want to switch to UCA; let's consider this variant.
What options do we have to make it possible to run Ceilometer jobs with a
Mongo backend?
I see only https://review.openstack.org/#/c/81001/ or making Ceilometer
able to work with the old Mongo, but the latter looks inappropriate, at
least in Icehouse.
What am I missing here? Maybe there is something else we can do?




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Tim Bell

If UCA is required, what would be the upgrade path for a currently running 
OpenStack Havana site to Icehouse with this requirement ?

Would it be an online upgrade (i.e. what order to upgrade the different 
components in order to keep things running at all times) ?

Tim



Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Chmouel Boudjnah
On Tue, Mar 18, 2014 at 5:21 PM, Sean Dague  wrote:

> So I'm still -1 at the point in making UCA our default run environment
> until it's provably functional for a period of time. Because working
> around upstream distro breaks is no fun.
>


I agree: if UCA is not very stable at the moment, this is going to cause us
more pain. But what would be the plan of action? A non-voting gate for
Ceilometer as a start (if that's possible)?

Chmouel
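For reference, a non-voting job in the Zuul (v2-era) layout.yaml would look roughly like the fragment below; the job name is hypothetical, not an actual job definition from project-config.

```yaml
# Illustrative sketch of marking a Mongo-backed Ceilometer job non-voting
# while it stabilizes (hypothetical job name).
jobs:
  - name: gate-tempest-dsvm-ceilometer-mongodb
    voting: false

projects:
  - name: openstack/ceilometer
    check:
      - gate-tempest-dsvm-ceilometer-mongodb
```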


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 12:09 PM, Chmouel Boudjnah wrote:
> 
> On Tue, Mar 18, 2014 at 2:09 PM, Sean Dague wrote:
> 
> We've not required UCA for any other project to pass the gate.
> 
> 
> 
> Is it that bad to have UCA in default devstack, as far as I know UCA is
> the official way to do OpenStack on ubuntu, right?

Currently we can't use it because libvirt in UCA remains too buggy to
run under the gate. If we had it turned on we'd see an astronomical
failure rate.

That is hopefully getting fixed, thanks to a lot of leg work by dims, as it
has required a lot of chasing.

However, I still believe UCA remains problematic, because our experience to
date is that the entrance criteria for content in UCA are clearly looser
than for the base distro. And we are very likely to be broken by changes put
into it, as seen by our inability to run our tests on top of it.

So I'm still -1 at the point in making UCA our default run environment
until it's provably functional for a period of time. Because working
around upstream distro breaks is no fun.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Chmouel Boudjnah
On Tue, Mar 18, 2014 at 2:09 PM, Sean Dague  wrote:

> We've not required UCA for any other project to pass the gate.



Is it that bad to have UCA in default devstack? As far as I know, UCA is the
official way to run OpenStack on Ubuntu, right?

Chmouel.


Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Julien Danjou
On Tue, Mar 18 2014, Sean Dague wrote:

> We've not required UCA for any other project to pass the gate. So what
> is the issue with Mongo 2.0.4 that makes it unsupportable in ceilometer?

We require features not present in MongoDB < 2.2.
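(The thread doesn't name the missing features; one plausible example is the aggregation framework, which the server only accepts from MongoDB 2.2 on. The pipeline below is illustrative of the kind of server-side statistics query involved, not necessarily the exact feature Ceilometer needs, and a pure-Python reference of what it computes is included so the sketch is self-contained.)

```python
# A server-side aggregation of the sort that requires MongoDB >= 2.2:
# per-resource min/max/avg of a meter.  With a live connection this would be
#   db.meter.aggregate(pipeline)   # rejected by MongoDB 2.0.x servers
pipeline = [
    {"$match": {"counter_name": "cpu_util"}},
    {"$group": {
        "_id": "$resource_id",
        "min": {"$min": "$counter_volume"},
        "max": {"$max": "$counter_volume"},
        "avg": {"$avg": "$counter_volume"},
    }},
]

def aggregate_stats(samples):
    """Pure-Python reference of what the pipeline above computes."""
    groups = {}
    for s in samples:
        if s["counter_name"] != "cpu_util":
            continue
        groups.setdefault(s["resource_id"], []).append(s["counter_volume"])
    return {rid: {"min": min(v), "max": max(v), "avg": sum(v) / len(v)}
            for rid, v in groups.items()}
```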

-- 
Julien Danjou
-- Free Software hacker
-- http://julien.danjou.info




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 09:02 AM, Julien Danjou wrote:
> On Tue, Mar 18 2014, Sean Dague wrote:
> 
>> There is a fundamental problem here that the Ceilometer team requires a
>> version of Mongo that's not provided by the distro. We've taken a pretty
>> hard line on not requiring newer versions of non python stuff than the
>> distros we support actually have.
> 
> MongoDB 2.4 is in UCA for a while now. We just can't use it because of
> libvirt bug https://bugs.launchpad.net/nova/+bug/1228977.

We've not required UCA for any other project to pass the gate. So what
is the issue with Mongo 2.0.4 that makes it unsupportable in ceilometer?

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Julien Danjou
On Tue, Mar 18 2014, Sean Dague wrote:

> There is a fundamental problem here that the Ceilometer team requires a
> version of Mongo that's not provided by the distro. We've taken a pretty
> hard line on not requiring newer versions of non python stuff than the
> distros we support actually have.

MongoDB 2.4 is in UCA for a while now. We just can't use it because of
libvirt bug https://bugs.launchpad.net/nova/+bug/1228977.

-- 
Julien Danjou
/* Free Software hacker
   http://julien.danjou.info */




Re: [openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

2014-03-18 Thread Sean Dague
On 03/18/2014 08:09 AM, Nadya Privalova wrote:
> Hi folks,
> 
> I'd like to discuss Ceilometer's tempest situation with you.
> Now we have several patch sets on review that test core functionality of
> Ceilometer: notification and polling (topic
> https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z).
> But there is a problem: Ceilometer performance is very poor on MySQL and
> PostgreSQL because of the bug
> https://bugs.launchpad.net/ceilometer/+bug/1291054. Mongo behaves much
> better even in a single thread, and I hope that its performance will be
> enough to successfully run the Ceilometer tempest tests.
> Let me explain in a few words why the tempest tests are mostly performance
> tests for Ceilometer. The thing is that the Ceilometer service is running
> while all the other nova, cinder, and so on tests run. All the tests create
> instances, volumes, etc., and each creation produces a lot of notifications.
> Each notification becomes an entry in the database, so Ceilometer cannot
> process such a large volume of notifications quickly. The Ceilometer tests
> have the 'telemetry' prefix, which means they run last, and that makes the
> situation even worse.
> So my proposal:
> 1. create a non-voting job with Mongo-backend
> 2. make sure that tests pass on Mongo
> 3. merge tests to tempest but skip that on postgres and mysql till
> bug/1291054 is resolved
> 4. make the new job 'voting'
> 
> The problem is only in the Mongo installation. I have a change request
> https://review.openstack.org/#/c/81001/ that will allow us to install
> Mongo from a deb. On the other hand, there is
> https://review.openstack.org/#/c/74889/ that enables UCA. I'm
> collaborating with the infra team to make the decision ASAP because, AFAIU,
> we need the tempest tests in Icehouse (for more discussion you are welcome
> to the thread [openstack-dev] Updating libvirt in gate jobs).
> 
> If you have any thoughts on this please share.

There is a fundamental problem here that the Ceilometer team requires a
version of Mongo that's not provided by the distro. We've taken a pretty
hard line on not requiring newer versions of non python stuff than the
distros we support actually have.

And the SQL backend is basically unusable from what I can tell.

So I'm -2 on injecting an arbitrary upstream Mongo in devstack.

What is preventing Ceilometer from bringing back support for the mongo
that you can get from 12.04? That seems like it should be the much
higher priority item. Then we could actually be gating Ceilometer
features on what the platforms can actually support. Then I'd be happy
to support a Mongo job running in tests.

Once that was done, we can start unpacking some of the other issues.

I'm not sure how changing to using 4 cores in the gate is going to
reduce the list command from 120s to 2s, so that doesn't really seem to
be the core issue (and is likely to just cause db deadlocks).

As long as Ceilometer says it supports SQL backends, it needs to do so
in a sane way. So that should still be gating.
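Step 3 of the proposal (skip the telemetry tests on MySQL/PostgreSQL until bug 1291054 is fixed) could be implemented as a backend-conditional skip. The sketch below uses plain unittest and a hypothetical hard-coded backend name in place of tempest's base classes and configuration:

```python
# Illustrative sketch (hypothetical names, not actual tempest code) of
# skipping the telemetry tests unless the deployment uses the MongoDB
# backend.
import unittest

# In real tempest this would come from the test configuration
# (e.g. a CONF option); hard-coded here for illustration.
TELEMETRY_BACKEND = "mysql"


class TelemetryNotificationTest(unittest.TestCase):

    def setUp(self):
        super(TelemetryNotificationTest, self).setUp()
        if TELEMETRY_BACKEND != "mongodb":
            # bug 1291054: SQL backends cannot keep up with the
            # notification volume the rest of the tempest run generates.
            self.skipTest("telemetry tests require the mongodb backend "
                          "until bug 1291054 is resolved")

    def test_instance_notifications_recorded(self):
        # Would assert that samples exist for the booted instance.
        pass
```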

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net


