Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-20 Thread Sampath Priyankara
Hi,

  Re-architecting the schema might fix most of the performance issues of
resource_list.
  We also need to do some work to improve the performance of meter-list.
  Will Gordon's blueprint cover both aspects?
  https://blueprints.launchpad.net/ceilometer/+spec/big-data-sql
  
Sampath

-Original Message-
From: Neal, Phil [mailto:phil.n...@hp.com] 
Sent: Wednesday, March 19, 2014 12:17 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list
CLI command


> -Original Message-
> From: Tim Bell [mailto:tim.b...@cern.ch]
> Sent: Monday, March 17, 2014 2:04 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer 
> resource_list CLI command
> 
> 
> At CERN, we've had similar issues when enabling telemetry. Our 
> resource-list times out after 10 minutes when the proxies for HA 
> assume there is no answer coming back. Keystone instances per cell 
> have helped the situation a little so we can collect the data but 
> there was a significant increase in load on the API endpoints.
> 
> I feel that some reference for production scale validation would be 
> beneficial as part of TC approval to leave incubation in case there 
> are issues such as this to be addressed.
> 
> Tim
> 
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 17 March 2014 20:25
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer
> resource_list CLI command
> >
> ...
> >
> > Yep. At AT&T, we had to disable calls to GET /resources without any 
> > filters
> on it. The call would return hundreds of thousands of
> > records, all being JSON-ified at the Ceilometer API endpoint, and 
> > the result
> would take minutes to return. There was no default limit
> > on the query, which meant every single record in the database was
> returned, and on even a semi-busy system, that meant
> > horrendous performance.
> >
> > Besides the problem that the SQLAlchemy driver doesn't yet support
> pagination [1], the main problem with the get_resources() call is
> > the underlying database schema for the Sample model is wacky, and
> forces the use of a dependent subquery in the WHERE clause
> > [2] which completely kills performance of the query to get resources.
> >
> > [1]
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L436
> > [2]
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L503
> >
> > > The cli tests are supposed to be quick read-only sanity checks of 
> > > the cli functionality and really shouldn't ever be on the list of 
> > > slowest tests for a gate run.
> >
> > Oh, the test is readonly all-right. ;) It's just that it's reading 
> > hundreds of
> thousands of records.
> >
> > >  I think there was possibly a performance regression recently in 
> > > ceilometer because from what I can tell this test used to normally take
> > > ~60 sec.
> > > (which honestly is probably too slow for a cli test too) but it is 
> > > currently much slower than that.
> > >
> > > From logstash it seems there are still some cases when the 
> > > resource list takes as long to execute as it used to, but the 
> > > majority of runs take a
> long time:
> > > http://goo.gl/smJPB9
> > >
> > > In the short term I've pushed out a patch that will remove this 
> > > test from gate
> > > runs: https://review.openstack.org/#/c/81036 But, I thought it 
> > > would be good to bring this up on the ML to try and figure out 
> > > what changed or why this is so slow.
> >
> > I agree with removing the test from the gate in the short term. 
> > Medium to
> long term, the root causes of the problem (that GET
> > /resources has no support for pagination on the query, there is no 
> > default
> for limiting results based on a since timestamp, and that
> > the underlying database schema is non-optimal) should be addressed.

Gordon has introduced a blueprint
https://blueprints.launchpad.net/ceilometer/+spec/big-data-sql with some
fixes for individual queries, but +1 to the point of looking at
re-architecting the schema as an approach to fixing performance. We've also
seen some gains here at HP from batch writes, but have temporarily tabled
that work in favor of getting a better-performing schema in place.

Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-18 Thread Neal, Phil

> -Original Message-
> From: Tim Bell [mailto:tim.b...@cern.ch]
> Sent: Monday, March 17, 2014 2:04 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer
> resource_list CLI command
> 
> 
> At CERN, we've had similar issues when enabling telemetry. Our resource-list
> times out after 10 minutes when the proxies for HA assume there is no
> answer coming back. Keystone instances per cell have helped the situation a
> little so we can collect the data but there was a significant increase in 
> load on
> the API endpoints.
> 
> I feel that some reference for production scale validation would be beneficial
> as part of TC approval to leave incubation in case there are issues such as 
> this
> to be addressed.
> 
> Tim
> 
> > -Original Message-
> > From: Jay Pipes [mailto:jaypi...@gmail.com]
> > Sent: 17 March 2014 20:25
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer
> resource_list CLI command
> >
> ...
> >
> > Yep. At AT&T, we had to disable calls to GET /resources without any filters
> on it. The call would return hundreds of thousands of
> > records, all being JSON-ified at the Ceilometer API endpoint, and the result
> would take minutes to return. There was no default limit
> on the query, which meant every single record in the database was
> returned, and on even a semi-busy system, that meant
> > horrendous performance.
> >
> > Besides the problem that the SQLAlchemy driver doesn't yet support
> pagination [1], the main problem with the get_resources() call is
> > the underlying database schema for the Sample model is wacky, and
> forces the use of a dependent subquery in the WHERE clause
> > [2] which completely kills performance of the query to get resources.
> >
> > [1]
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L436
> > [2]
> > https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L503
> >
> > > The cli tests are supposed to be quick read-only sanity checks of the
> > > cli functionality and really shouldn't ever be on the list of slowest
> > > tests for a gate run.
> >
> > Oh, the test is readonly all-right. ;) It's just that it's reading hundreds 
> > of
> thousands of records.
> >
> > >  I think there was possibly a performance regression recently in
> > > ceilometer because from what I can tell this test used to normally take ~60
> > > sec.
> > > (which honestly is probably too slow for a cli test too) but it is
> > > currently much slower than that.
> > >
> > > From logstash it seems there are still some cases when the resource
> > > list takes as long to execute as it used to, but the majority of runs 
> > > take a
> long time:
> > > http://goo.gl/smJPB9
> > >
> > > In the short term I've pushed out a patch that will remove this test
> > > from gate
> > > runs: https://review.openstack.org/#/c/81036 But, I thought it would
> > > be good to bring this up on the ML to try and figure out what changed
> > > or why this is so slow.
> >
> > I agree with removing the test from the gate in the short term. Medium to
> long term, the root causes of the problem (that GET
> > /resources has no support for pagination on the query, there is no default
> for limiting results based on a since timestamp, and that
> > the underlying database schema is non-optimal) should be addressed.

Gordon has introduced a blueprint 
https://blueprints.launchpad.net/ceilometer/+spec/big-data-sql with some fixes 
for individual queries, but +1 to the point of looking at re-architecting the 
schema as an approach to fixing performance. We've also seen some gains here at 
HP from batch writes, but have temporarily tabled that work in favor of 
getting a better-performing schema in place.
- Phil
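
For context, a minimal sketch of what batching sample writes can look like
(purely illustrative; the table layout, batch size, and class are assumptions,
not the HP work or the Ceilometer collector code):

# Purely illustrative sketch of batching sample writes: accumulate incoming
# samples and flush them as one multi-row INSERT instead of one INSERT per
# sample. The table layout, batch size, and class are assumptions, not the
# HP work or the Ceilometer collector code.
from sqlalchemy import (Column, Float, Integer, MetaData, String, Table,
                        create_engine)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
sample = Table(
    "sample", metadata,
    Column("id", Integer, primary_key=True),
    Column("resource_id", String),
    Column("counter_name", String),
    Column("volume", Float),
)
metadata.create_all(engine)


class BatchWriter(object):
    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def record(self, resource_id, counter_name, volume):
        self.pending.append({"resource_id": resource_id,
                             "counter_name": counter_name,
                             "volume": volume})
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One executemany-style INSERT for the whole batch.
            self.conn.execute(sample.insert(), self.pending)
            self.pending = []


with engine.begin() as conn:  # single transaction, committed on exit
    writer = BatchWriter(conn)
    for i in range(1200):
        writer.record("resource-%d" % (i % 10), "cpu_util", float(i))
    writer.flush()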

> >
> > Best,
> > -jay
> >
> >

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Tim Bell

At CERN, we've had similar issues when enabling telemetry. Our resource-list 
times out after 10 minutes when the proxies for HA assume there is no answer 
coming back. Keystone instances per cell have helped the situation a little so 
we can collect the data but there was a significant increase in load on the API 
endpoints.

I feel that some reference production-scale validation would be beneficial 
as part of TC approval to leave incubation, in case there are issues such as 
this that need to be addressed.

Tim

> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: 17 March 2014 20:25
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list 
> CLI command
> 
...
> 
> Yep. At AT&T, we had to disable calls to GET /resources without any filters 
> on it. The call would return hundreds of thousands of
> records, all being JSON-ified at the Ceilometer API endpoint, and the result 
> would take minutes to return. There was no default limit
> on the query, which meant every single record in the database was returned, 
> and on even a semi-busy system, that meant
> horrendous performance.
> 
> Besides the problem that the SQLAlchemy driver doesn't yet support pagination 
> [1], the main problem with the get_resources() call is
> the underlying database schema for the Sample model is wacky, and forces the 
> use of a dependent subquery in the WHERE clause
> [2] which completely kills performance of the query to get resources.
> 
> [1]
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L436
> [2]
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L503
> 
> > The cli tests are supposed to be quick read-only sanity checks of the
> > cli functionality and really shouldn't ever be on the list of slowest
> > tests for a gate run.
> 
> Oh, the test is readonly all-right. ;) It's just that it's reading hundreds 
> of thousands of records.
> 
> >  I think there was possibly a performance regression recently in
> > ceilometer because from what I can tell this test used to normally take ~60 sec.
> > (which honestly is probably too slow for a cli test too) but it is
> > currently much slower than that.
> >
> > From logstash it seems there are still some cases when the resource
> > list takes as long to execute as it used to, but the majority of runs take 
> > a long time:
> > http://goo.gl/smJPB9
> >
> > In the short term I've pushed out a patch that will remove this test
> > from gate
> > runs: https://review.openstack.org/#/c/81036 But, I thought it would
> > be good to bring this up on the ML to try and figure out what changed
> > or why this is so slow.
> 
> I agree with removing the test from the gate in the short term. Medium to 
> long term, the root causes of the problem (that GET
> /resources has no support for pagination on the query, there is no default 
> for limiting results based on a since timestamp, and that
> the underlying database schema is non-optimal) should be addressed.
> 
> Best,
> -jay
> 
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Gordon Chung
> Yep. At AT&T, we had to disable calls to GET /resources without any
> filters on it. The call would return hundreds of thousands of records,
> all being JSON-ified at the Ceilometer API endpoint, and the result
> would take minutes to return.

so the performance issue with resource-list is somewhat artificial... the 
gathering of resources itself can return in seconds even with over a 
million records... the real cost is that the api also returns a list of 
all related meters for each resource. if i disable that, resource-list 
performance is decent (debatable).
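
To make that pattern concrete, here is a minimal SQLAlchemy sketch (the schema
and names are simplified assumptions, not the actual Ceilometer code): the
resource listing itself is a single query, but attaching the related meters
adds one more query per resource.

# A minimal sketch (not the actual Ceilometer code; table and column names
# are simplified assumptions) of the pattern described above: listing the
# resources is one cheap query, but building the related-meter list adds an
# extra query per resource.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, distinct, select)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
sample = Table(
    "sample", metadata,
    Column("id", Integer, primary_key=True),
    Column("resource_id", String),
    Column("counter_name", String),
)
metadata.create_all(engine)

with engine.connect() as conn:
    # Cheap part: one pass over the samples to find the distinct resources.
    resource_ids = conn.execute(
        select(distinct(sample.c.resource_id))).scalars().all()

    # Expensive part: one extra query per resource just to list its meters,
    # i.e. O(number of resources) round trips on a large sample table.
    for rid in resource_ids:
        meters = conn.execute(
            select(distinct(sample.c.counter_name))
            .where(sample.c.resource_id == rid)).scalars().all()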

>  the main problem with the get_resources() call is the
> underlying databases schema for the Sample model is wacky, and forces
> the use of a dependent subquery in the WHERE clause [2] which completely
> kills performance of the query to get resources.

Jay, i've begun the initial steps to improve the sql model and would love to 
get your opinion. i've created a bp here: 
https://blueprints.launchpad.net/ceilometer/+spec/big-data-sql (i use 'big 
data' in quotes...)

Regarding the < 2 second requirement: i haven't seen the number of records 
tempest generates but i would expect sub 2 seconds would be a good target. 
that said, as Jay mentioned, as the load/test increases there's only so 
much performance you can get with hundreds of thousands to millions of records 
using an sql backend... at the very least it's going to fluctuate (how much 
is acceptable i have no clue currently).

cheers,
gordon chung
openstack, ibm software standards
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Joe Gordon
On Mon, Mar 17, 2014 at 12:25 PM, Sean Dague  wrote:

> On 03/17/2014 03:22 PM, Joe Gordon wrote:
> >
> >
> >
> > On Mon, Mar 17, 2014 at 11:55 AM, Matthew Treinish wrote:
> >
> > Hi everyone,
> >
> > So a little while ago we noticed that in all the gate runs one of
> > the ceilometer
> > cli tests is consistently in the list of slowest tests. (and often
> > the slowest)
> > This was a bit surprising given the nature of the cli tests we
> > expect them to
> > execute very quickly.
> >
> > test_ceilometer_resource_list which just calls ceilometer
> > resource_list from the
> > CLI once is taking >=2 min to respond. For example:
> >
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> > (where it takes > 3min)
> >
> > The cli tests are supposed to be quick read-only sanity checks of
> > the cli
> > functionality and really shouldn't ever be on the list of slowest
> > tests for a
> > gate run. I think there was possibly a performance regression
> > recently in
> > > ceilometer because from what I can tell this test used to normally take
> > ~60 sec.
> > (which honestly is probably too slow for a cli test too) but it is
> > currently
> > much slower than that.
> >
> >
> > Sounds like we should add another round of sanity checking to the CLI
> > tests: make sure all commands return within x seconds.   As a first pass
> > we can say x=60 and then crank it down in the future.
>
> So, the last thing I want to do is trigger a race here by us
> artificially timing out on tests. However I do think cli tests should be
> returning in < 2s otherwise they are not simple readonly tests.
>

Agreed, I said 60 just as a starting point.


>
> -Sean
>
> --
> Sean Dague
> Samsung Research America
> s...@dague.net / sean.da...@samsung.com
> http://dague.net
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Jay Pipes
On Mon, 2014-03-17 at 14:55 -0400, Matthew Treinish wrote:
> Hi everyone,
> 
> So a little while ago we noticed that in all the gate runs one of the 
> ceilometer
> cli tests is consistently in the list of slowest tests. (and often the 
> slowest)
> This was a bit surprising given the nature of the cli tests we expect them to
> execute very quickly.
> 
> test_ceilometer_resource_list which just calls ceilometer resource_list from 
> the
> CLI once is taking >=2 min to respond. For example:
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> (where it takes > 3min)

Yep. At AT&T, we had to disable calls to GET /resources without any
filters on it. The call would return hundreds of thousands of records,
all being JSON-ified at the Ceilometer API endpoint, and the result
would take minutes to return. There was no default limit on the query,
which meant every single record in the database was returned, and on
even a semi-busy system, that meant horrendous performance.

Besides the problem that the SQLAlchemy driver doesn't yet support
pagination [1], the main problem with the get_resources() call is the
underlying database schema for the Sample model is wacky, and forces
the use of a dependent subquery in the WHERE clause [2] which completely
kills performance of the query to get resources.

[1]
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L436
[2]
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L503
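
For illustration, a simplified sketch of the pattern being described (this is
not the code behind [1] or [2]; the schema and queries are assumptions): a
dependent (correlated) subquery in the WHERE clause is re-evaluated for every
candidate row, while an equivalent grouped query makes a single pass.

# A simplified sketch of a dependent (correlated) subquery in the WHERE
# clause versus a single grouped scan. This is NOT the Ceilometer code from
# [1]/[2]; the schema and queries are illustrative assumptions only.
from sqlalchemy import (Column, DateTime, Integer, MetaData, String, Table,
                        create_engine, func, select)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
sample = Table(
    "sample", metadata,
    Column("id", Integer, primary_key=True),
    Column("resource_id", String),
    Column("timestamp", DateTime),
)
metadata.create_all(engine)

s_outer = sample.alias("s_outer")
s_inner = sample.alias("s_inner")

# Correlated subquery: the per-resource MAX(timestamp) lookup is re-evaluated
# for every candidate row, which scales badly as the sample table grows.
correlated = select(s_outer.c.resource_id, s_outer.c.timestamp).where(
    s_outer.c.timestamp
    == select(func.max(s_inner.c.timestamp))
    .where(s_inner.c.resource_id == s_outer.c.resource_id)
    .scalar_subquery())

# Equivalent single-pass form: one grouped scan instead of a per-row subquery.
grouped = select(sample.c.resource_id,
                 func.max(sample.c.timestamp).label("last_sample")
                 ).group_by(sample.c.resource_id)

with engine.connect() as conn:
    conn.execute(correlated).all()
    conn.execute(grouped).all()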

> The cli tests are supposed to be quick read-only sanity checks of the cli
> functionality and really shouldn't ever be on the list of slowest tests for a
> gate run.

Oh, the test is readonly all-right. ;) It's just that it's reading
hundreds of thousands of records.

>  I think there was possibly a performance regression recently in
> ceilometer because from what I can tell this test used to normally take ~60 sec.
> (which honestly is probably too slow for a cli test too) but it is currently
> much slower than that.
> 
> From logstash it seems there are still some cases when the resource list takes
> as long to execute as it used to, but the majority of runs take a long time:
> http://goo.gl/smJPB9
> 
> In the short term I've pushed out a patch that will remove this test from gate
> runs: https://review.openstack.org/#/c/81036 But, I thought it would be good 
> to
> bring this up on the ML to try and figure out what changed or why this is so
> slow.

I agree with removing the test from the gate in the short term. Medium
to long term, the root causes of the problem (that GET /resources has no
support for pagination on the query, there is no default for limiting
results based on a since timestamp, and that the underlying database
schema is non-optimal) should be addressed.
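
As a rough sketch of what those two defaults could look like on the query side
(illustrative only; the schema, the 100-row limit, and the 24-hour window are
assumptions, not the Ceilometer API's actual behaviour):

# A rough sketch of those two defaults applied on the query side: a hard cap
# on the number of rows and a "since" timestamp floor. Illustrative only;
# the schema, the 100-row limit, and the 24-hour window are assumptions,
# not the Ceilometer API's actual behaviour.
import datetime

from sqlalchemy import (Column, DateTime, Integer, MetaData, String, Table,
                        create_engine, distinct, select)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
sample = Table(
    "sample", metadata,
    Column("id", Integer, primary_key=True),
    Column("resource_id", String),
    Column("timestamp", DateTime),
)
metadata.create_all(engine)


def list_resources(conn, since=None, limit=100):
    """Return at most `limit` resource ids with samples newer than `since`."""
    if since is None:
        # Hypothetical default: only look back 24 hours instead of all history.
        since = datetime.datetime.utcnow() - datetime.timedelta(hours=24)
    query = (select(distinct(sample.c.resource_id))
             .where(sample.c.timestamp >= since)
             .limit(limit))  # cap instead of returning every row in the table
    return conn.execute(query).scalars().all()


with engine.connect() as conn:
    print(list_resources(conn))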

Best,
-jay


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Joe Gordon
On Mon, Mar 17, 2014 at 11:55 AM, Matthew Treinish wrote:

> Hi everyone,
>
> So a little while ago we noticed that in all the gate runs one of the
> ceilometer
> cli tests is consistently in the list of slowest tests. (and often the
> slowest)
> This was a bit surprising given the nature of the cli tests we expect them
> to
> execute very quickly.
>
> test_ceilometer_resource_list which just calls ceilometer resource_list
> from the
> CLI once is taking >=2 min to respond. For example:
>
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> (where it takes > 3min)
>
> The cli tests are supposed to be quick read-only sanity checks of the cli
> functionality and really shouldn't ever be on the list of slowest tests
> for a
> gate run. I think there was possibly a performance regression recently in
> ceilometer because from what I can tell this test used to normally take ~60 sec.
> (which honestly is probably too slow for a cli test too) but it is
> currently
> much slower than that.
>

Sounds like we should add another round of sanity checking to the CLI
tests: make sure all commands return within x seconds. As a first pass we
can say x=60 and then crank it down in the future.
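
A minimal sketch of that kind of check (the helper name, the example command,
and the 60-second budget are illustrative assumptions, not existing Tempest
code):

# A rough sketch of such a check: run a read-only CLI command and fail if it
# does not come back within a budget. The helper name, the example command,
# and the 60-second default are illustrative assumptions, not existing
# Tempest code.
import shlex
import subprocess
import time


def assert_cli_returns_within(cmd, max_seconds=60):
    start = time.monotonic()
    subprocess.run(shlex.split(cmd), check=True, capture_output=True)
    elapsed = time.monotonic() - start
    assert elapsed <= max_seconds, (
        "%r took %.1fs, budget is %ds" % (cmd, elapsed, max_seconds))


# Example (assumes a configured ceilometer client is on the PATH):
# assert_cli_returns_within("ceilometer resource-list", max_seconds=60)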


>
> From logstash it seems there are still some cases when the resource list
> takes
> as long to execute as it used to, but the majority of runs take a long
> time:
> http://goo.gl/smJPB9
>
> In the short term I've pushed out a patch that will remove this test from
> gate
> runs: https://review.openstack.org/#/c/81036 But, I thought it would be
> good to
> bring this up on the ML to try and figure out what changed or why this is
> so
> slow.
>
> Thanks,
>
> -Matt Treinish
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Sean Dague
On 03/17/2014 03:22 PM, Joe Gordon wrote:
> 
> 
> 
> On Mon, Mar 17, 2014 at 11:55 AM, Matthew Treinish wrote:
> 
> Hi everyone,
> 
> So a little while ago we noticed that in all the gate runs one of
> the ceilometer
> cli tests is consistently in the list of slowest tests. (and often
> the slowest)
> This was a bit surprising given the nature of the cli tests we
> expect them to
> execute very quickly.
> 
> test_ceilometer_resource_list which just calls ceilometer
> resource_list from the
> CLI once is taking >=2 min to respond. For example:
> 
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> (where it takes > 3min)
> 
> The cli tests are supposed to be quick read-only sanity checks of
> the cli
> functionality and really shouldn't ever be on the list of slowest
> tests for a
> gate run. I think there was possibly a performance regression
> recently in
> ceilometer because from what I can tell this test used to normally take
> ~60 sec.
> (which honestly is probably too slow for a cli test too) but it is
> currently
> much slower than that.
> 
> 
> Sounds like we should add another round of sanity checking to the CLI
> tests: make sure all commands return within x seconds.   As a first pass
> > we can say x=60 and then crank it down in the future.

So, the last thing I want to do is trigger a race here by us
artificially timing out on tests. However, I do think cli tests should be
returning in < 2s, otherwise they are not simple readonly tests.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Gordon Chung
hi Matt,

> test_ceilometer_resource_list which just calls ceilometer 
> resource_list from the
> CLI once is taking >=2 min to respond. For example:
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> (where it takes > 3min)

thanks for bringing this up... we're tracking this here: 
https://bugs.launchpad.net/ceilometer/+bug/1264434

i've put a patch out that partially fixes the issue (from bad to 
average)... but i guess i should make the fix a bit more aggressive to 
bring the performance in line with the 'seconds' expectation.

cheers,
gordon chung
openstack, ibm software standards

Matthew Treinish  wrote on 17/03/2014 02:55:40 PM:

> From: Matthew Treinish 
> To: openstack-dev@lists.openstack.org
> Date: 17/03/2014 02:57 PM
> Subject: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer 
> resource_list CLI command
> 
> Hi everyone,
> 
> So a little while ago we noticed that in all the gate runs one of the
> ceilometer cli tests is consistently in the list of slowest tests. (and
> often the slowest) This was a bit surprising given the nature of the cli
> tests we expect them to execute very quickly.
> 
> test_ceilometer_resource_list which just calls ceilometer resource_list
> from the CLI once is taking >=2 min to respond. For example:
> http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
> (where it takes > 3min)
> 
> The cli tests are supposed to be quick read-only sanity checks of the cli
> functionality and really shouldn't ever be on the list of slowest tests
> for a gate run. I think there was possibly a performance regression
> recently in ceilometer because from what I can tell this test used to
> normally take ~60 sec. (which honestly is probably too slow for a cli test
> too) but it is currently much slower than that.
> 
> From logstash it seems there are still some cases when the resource list
> takes as long to execute as it used to, but the majority of runs take a
> long time:
> http://goo.gl/smJPB9
> 
> In the short term I've pushed out a patch that will remove this test from
> gate runs: https://review.openstack.org/#/c/81036 But, I thought it would
> be good to bring this up on the ML to try and figure out what changed or
> why this is so slow.
> 
> Thanks,
> 
> -Matt Treinish
> 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

2014-03-17 Thread Matthew Treinish
Hi everyone,

So a little while ago we noticed that in all the gate runs one of the ceilometer
cli tests is consistently in the list of slowest tests. (and often the slowest)
This was a bit surprising given the nature of the cli tests we expect them to
execute very quickly.

test_ceilometer_resource_list which just calls ceilometer resource_list from the
CLI once is taking >=2 min to respond. For example:
http://logs.openstack.org/68/80168/3/gate/gate-tempest-dsvm-postgres-full/07ab7f5/logs/tempest.txt.gz#_2014-03-17_17_08_25_003
(where it takes > 3min)

The cli tests are supposed to be quick read-only sanity checks of the cli
functionality and really shouldn't ever be on the list of slowest tests for a
gate run. I think there was possibly a performance regression recently in
ceilometer because from what I can tell this test used to normally take ~60 sec.
(which honestly is probably too slow for a cli test too) but it is currently
much slower than that.

From logstash it seems there are still some cases when the resource list takes
as long to execute as it used to, but the majority of runs take a long time:
http://goo.gl/smJPB9

In the short term I've pushed out a patch that will remove this test from gate
runs: https://review.openstack.org/#/c/81036 But, I thought it would be good to
bring this up on the ML to try and figure out what changed or why this is so
slow.

Thanks,

-Matt Treinish

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev