Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-23 Thread David Kranz

On 10/23/2013 05:08 PM, Rochelle.Grober wrote:


John Griffith wrote:

On Wed, Oct 23, 2013 at 8:47 AM, Sean Dague wrote:


On 10/23/2013 10:40 AM, John Griffith wrote:




On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague <s...@dague.net> wrote:

Dave Kranz has been building a system so that we can ensure that
during a Tempest run services don't spew ERRORs in the logs.
Eventually, we're going to gate on this, because there is nothing
that Tempest does to the system that should cause any OpenStack
service to ERROR or stack trace (Errors should actually be
exceptional events that something is wrong with the system, not
regular events).


So I have to disagree with the approach being taken here.  Particularly
in the case of Cinder and the negative tests that are in place.  When I
read this last week I assumed you actually meant that "Exceptions" were
exceptional and nothing in Tempest should cause Exceptions.  It turns
out you apparently did mean Errors.  I completely disagree here, Errors
happen, some are recovered, some are expected by the tests etc.  Having
a policy and especially a gate that says NO ERROR MESSAGE in logs makes
absolutely no sense to me.

Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
this makes no sense to me.  By the way, here's a perfect example:
https://bugs.launchpad.net/cinder/+bug/1243485

As long as we have Tempest tests that do things like "show non-existent
volume" you're going to get an Error message and I think that you should
quite frankly.


Ok, I guess that's where we probably need to clarify what "Not Found" 
is. Because "Not Found" to me seems like it should be a request at 
INFO level, not ERROR.



ERROR from an admin perspective should really be something that
would be suitable for sending an alert to an administrator for them
to come and fix the cloud.

From my perspective as someone who has done Ops in the past, a
"Volume Not Found" can be either info or an error.  It all depends
on the context.  That said, we need to be able to test ERROR
conditions and ensure that they report properly as ERROR, else the
poor Ops folks will always be on the spot for not knowing that
there is a problem.  A volume that has gone missing is a problem.
Ops would like an immediate report.  They would trigger on the
ERROR statement in the log.  On the other hand, if someone/thing
fatfingers an input and requests something that has never
existed, then that's just info.

It is not just a case of fatfingers. Some of the delete apis are 
asynchronous and the only way to know that a delete finished is to check 
if the object still exists. Tempest does such checks to manage resource 
usage, even if there were no negative tests. The logs are not full of 
ERRORs because almost all of our apis, including nova, do not log an 
ERROR when returning 404.
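
The delete check described above can be sketched as follows. This is a minimal illustration, not Tempest's actual code: `NotFound`, `wait_for_deletion`, and the `show` callable are hypothetical names standing in for whatever client call and 404 exception a real test would use.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for a client's 404 exception."""


def wait_for_deletion(show, resource_id, timeout=60.0, interval=1.0,
                      sleep=time.sleep):
    """Poll show(resource_id) until it raises NotFound or time runs out.

    Returns True if the resource disappeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            show(resource_id)
        except NotFound:
            # The 404 is the success signal here, not a failure.
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Every successful run of such a loop ends with a 404, which is why logging each 404 at ERROR would flood the logs during a perfectly healthy run.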


I think John's point is that it can be hard or impossible to tell if an 
object is not found because it truly no longer exists (or never 
existed), or if there is something wrong with the system and the object 
really exists but is not being found. But I would argue that even if 
this is true we cannot alert the operator every time some user checks to 
see if an object is still there. So there has to be some "thing" that 
gets put in the log which says "there is a problem with the system, 
either a bug or ran out of disk or something". The appearance of that 
thing in the log is what an alert should be triggered on, and what 
should fail a gate job. That is pretty close to what ERROR is being used 
for now.


We need to be able to test for correctness of errors and process
logs with errors in them as part of the test verification. 
Perhaps a switch in the test that indicates log needs post
processing, or a way to redirect the log during a specific error
test, or some such?  The question is, how do we keep test system
logs clean of ERRORs and still test system logs for intentionally
triggered ERRORs?




--Rocky

We might be able to do that in our test framework, but it would not help 
operators. IMO the least of evils here by far is to log events 
associated with an api call that returns 4xx in a way that is 
distinguishable from how we log when we detect a system failure of some 
sort.
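
One way to picture that distinction is the sketch below. It is an assumption about how a service could choose a level, not a description of what any OpenStack service does; the logger name and both function names are made up for illustration.

```python
import logging

LOG = logging.getLogger("api.sketch")  # hypothetical logger name


def level_for_status(status_code):
    """Map an HTTP status code to a log level.

    4xx means the caller asked for something invalid or absent, so it is
    logged at INFO; 5xx means the service itself failed, so it is ERROR.
    """
    if 500 <= status_code <= 599:
        return logging.ERROR
    return logging.INFO


def log_response(method, path, status_code):
    """Emit one log record for a finished API call and return its level."""
    level = level_for_status(status_code)
    LOG.log(level, "%s %s returned %d", method, path, status_code)
    return level
```

With a mapping like this, an alert or a gate job can key on ERROR alone, and a "show non-existent volume" request never reaches that level.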


 -David





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-23 Thread Rochelle.Grober


John Griffith wrote:
On Wed, Oct 23, 2013 at 8:47 AM, Sean Dague <s...@dague.net> wrote:
On 10/23/2013 10:40 AM, John Griffith wrote:



On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague <s...@dague.net> wrote:

Dave Kranz has been building a system so that we can ensure that
during a Tempest run services don't spew ERRORs in the logs.
Eventually, we're going to gate on this, because there is nothing
that Tempest does to the system that should cause any OpenStack
service to ERROR or stack trace (Errors should actually be
exceptional events that something is wrong with the system, not
regular events).


So I have to disagree with the approach being taken here.  Particularly
in the case of Cinder and the negative tests that are in place.  When I
read this last week I assumed you actually meant that "Exceptions" were
exceptional and nothing in Tempest should cause Exceptions.  It turns
out you apparently did mean Errors.  I completely disagree here, Errors
happen, some are recovered, some are expected by the tests etc.  Having
a policy and especially a gate that says NO ERROR MESSAGE in logs makes
absolutely no sense to me.

Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
this makes no sense to me.  By the way, here's a perfect example:
https://bugs.launchpad.net/cinder/+bug/1243485

As long as we have Tempest tests that do things like "show non-existent
volume" you're going to get an Error message and I think that you should
quite frankly.

Ok, I guess that's where we probably need to clarify what "Not Found" is. 
Because "Not Found" to me seems like it should be a request at INFO level, not 
ERROR.


ERROR from an admin perspective should really be something that would be
suitable for sending an alert to an administrator for them to come and fix
the cloud.

From my perspective as someone who has done Ops in the past, a "Volume Not 
Found" can be either info or an error.  It all depends on the context.  That 
said, we need to be able to test ERROR conditions and ensure that they report 
properly as ERROR, else the poor Ops folks will always be on the spot for not 
knowing that there is a problem.  A volume that has gone missing is a problem. 
Ops would like an immediate report.  They would trigger on the ERROR 
statement in the log.  On the other hand, if someone/thing fatfingers an 
input and requests something that has never existed, then that's just info.

We need to be able to test for correctness of errors and process logs with 
errors in them as part of the test verification.  Perhaps a switch in the test 
that indicates log needs post processing, or a way to redirect the log during a 
specific error test, or some such?  The question is, how do we keep test system 
logs clean of ERRORs and still test system logs for intentionally triggered 
ERRORs?

--Rocky


TRACE is actually a lower level of severity in our log systems than ERROR is.

Sorry, by Trace I was referring to unhandled stack/exception trace messages in 
the logs.


-Sean

--
Sean Dague
http://dague.net




Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-23 Thread John Griffith
On Wed, Oct 23, 2013 at 8:47 AM, Sean Dague  wrote:

> On 10/23/2013 10:40 AM, John Griffith wrote:
>
>>
>>
>>
>> On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague wrote:
>>
>> Dave Kranz has been building a system so that we can ensure that
>> during a Tempest run services don't spew ERRORs in the logs.
>> Eventually, we're going to gate on this, because there is nothing
>> that Tempest does to the system that should cause any OpenStack
>> service to ERROR or stack trace (Errors should actually be
>> exceptional events that something is wrong with the system, not
>> regular events).
>>
>>
>> So I have to disagree with the approach being taken here.  Particularly
>> in the case of Cinder and the negative tests that are in place.  When I
>> read this last week I assumed you actually meant that "Exceptions" were
>> exceptional and nothing in Tempest should cause Exceptions.  It turns
>> out you apparently did mean Errors.  I completely disagree here, Errors
>> happen, some are recovered, some are expected by the tests etc.  Having
>> a policy and especially a gate that says NO ERROR MESSAGE in logs makes
>> absolutely no sense to me.
>>
>> Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
>> this makes no sense to me.  By the way, here's a perfect example:
>> https://bugs.launchpad.net/cinder/+bug/1243485
>>
>> As long as we have Tempest tests that do things like "show non-existent
>> volume" you're going to get an Error message and I think that you should
>> quite frankly.
>>
>
> Ok, I guess that's where we probably need to clarify what "Not Found" is.
> Because "Not Found" to me seems like it should be a request at INFO level,
> not ERROR.
>


> ERROR from an admin perspective should really be something that would be
> suitable for sending an alert to an administrator for them to come and fix
> the cloud.
>
> TRACE is actually a lower level of severity in our log systems than ERROR
> is.


Sorry, by Trace I was referring to unhandled stack/exception trace messages
in the logs.

>
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-23 Thread Sean Dague

On 10/23/2013 10:40 AM, John Griffith wrote:




On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague <s...@dague.net> wrote:

Dave Kranz has been building a system so that we can ensure that
during a Tempest run services don't spew ERRORs in the logs.
Eventually, we're going to gate on this, because there is nothing
that Tempest does to the system that should cause any OpenStack
service to ERROR or stack trace (Errors should actually be
exceptional events that something is wrong with the system, not
regular events).


So I have to disagree with the approach being taken here.  Particularly
in the case of Cinder and the negative tests that are in place.  When I
read this last week I assumed you actually meant that "Exceptions" were
exceptional and nothing in Tempest should cause Exceptions.  It turns
out you apparently did mean Errors.  I completely disagree here, Errors
happen, some are recovered, some are expected by the tests etc.  Having
a policy and especially a gate that says NO ERROR MESSAGE in logs makes
absolutely no sense to me.

Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
this makes no sense to me.  By the way, here's a perfect example:
https://bugs.launchpad.net/cinder/+bug/1243485

As long as we have Tempest tests that do things like "show non-existent
volume" you're going to get an Error message and I think that you should
quite frankly.


Ok, I guess that's where we probably need to clarify what "Not Found" 
is. Because "Not Found" to me seems like it should be a request at INFO 
level, not ERROR.


ERROR from an admin perspective should really be something that would 
be suitable for sending an alert to an administrator for them to come and 
fix the cloud.


TRACE is actually a lower level of severity in our log systems than 
ERROR is.


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-23 Thread John Griffith
On Sun, Oct 20, 2013 at 7:38 AM, Sean Dague  wrote:

> Dave Kranz has been building a system so that we can ensure that during a
> Tempest run services don't spew ERRORs in the logs. Eventually, we're going
> to gate on this, because there is nothing that Tempest does to the system
> that should cause any OpenStack service to ERROR or stack trace (Errors
> should actually be exceptional events that something is wrong with the
> system, not regular events).
>

So I have to disagree with the approach being taken here.  Particularly in
the case of Cinder and the negative tests that are in place.  When I read
this last week I assumed you actually meant that "Exceptions" were
exceptional and nothing in Tempest should cause Exceptions.  It turns out
you apparently did mean Errors.  I completely disagree here, Errors happen,
some are recovered, some are expected by the tests etc.  Having a policy
and especially a gate that says NO ERROR MESSAGE in logs makes absolutely
no sense to me.

Something like NO TRACE/EXCEPTION MESSAGE in logs I can agree with, but
this makes no sense to me.  By the way, here's a perfect example:
https://bugs.launchpad.net/cinder/+bug/1243485

As long as we have Tempest tests that do things like "show non-existent
volume" you're going to get an Error message and I think that you should
quite frankly.



> Ceilometer is currently one of the largest offenders in dumping ERRORs in
> the gate -
> http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-full/76f83a4/console.html#_2013-10-19_14_51_51_271
> (that item isn't in our whitelist yet, so you'll see a lot of it at the end
> of every run)
>
> and
> http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-full/76f83a4/logs/screen-ceilometer-collector.txt.gz?level=TRACE
> for full details
>
> This seems like something is wrong in the integration, and would be really
> helpful if we could get ceilometer eyes on this one to put ceilo into a non
> erroring state.
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-22 Thread Nadya Privalova
Hi guys,

I can share my experience with devstack+ceilometer. There is certainly a
problem with MongoDB, because Ceilometer requires a newer MongoDB than
devstack provides. But I didn't experience problems with SQL.
And just a quick question about testing: are there any plans to test
Ceilometer with different db-backends in devstack? Or do you suggest that
it is not devstack's responsibility?

Thanks,
Nadya


On Tue, Oct 22, 2013 at 6:55 PM, David Kranz  wrote:

> On 10/22/2013 10:19 AM, Sean Dague wrote:
>
>> On 10/21/2013 10:27 AM, Neal, Phil wrote:
>>
>>> Sean, we currently have a BP out there to investigate basic tempest
>>> integration and I think this might fall under the same umbrella.
>>> Unfortunately I've not been able to free up my development time
>>> for it, but I've assigned it out to someone who can take a look and
>>> report back.
>>>
>>> https://blueprints.launchpad.net/tempest/+spec/basic-tempest-integration-for-ceilometer
>>>
>>
>> This is kind of worse than tempest integration issues. As far as I can
>> tell ceilometer via devstack is basically non functional at all. And sort
>> of worse than non functional, it's spewing errors, a lot.
>>
>> This really ought to be a top ceilometer item to address, otherwise we
>> should probably turn off ceilometer in devstack by default, because it's
>> really not working at the moment.
>>
>> -Sean
>>
> Here are the two errors showing up persistently that are not
> whitelisted. Such log errors are now being shown in the console log right
> after the tempest tests finish.
>
> https://bugs.launchpad.net/ceilometer/+bug/1243251
> 2013-10-21 21:11:00.229 | 2013-10-21 21:05:20.046 5624 ERROR
> ceilometer.collector.dispatcher.database [-] Failed to record metering
> data: QueuePool limit of size 5 overflow 10 reached, connection timed out,
> timeout 30
>
>
> https://bugs.launchpad.net/ceilometer/+bug/1243249
> 2013-10-21 20:22:27.600 | Log File: ceilometer-alarm-evaluator
> 2013-10-21 20:22:27.600 | 2013-10-21 20:14:33.038 22760 ERROR
> ceilometer.alarm.service [-] alarm evaluation cycle failed
>
> See also https://bugs.launchpad.net/ceilometer/+bug/1237671
>
>  -David
>
>
>


Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-22 Thread David Kranz

On 10/22/2013 10:19 AM, Sean Dague wrote:

On 10/21/2013 10:27 AM, Neal, Phil wrote:

Sean, we currently have a BP out there to investigate basic tempest
integration and I think this might fall under the same umbrella.
Unfortunately I've not been able to free up my development time
for it, but I've assigned it out to someone who can take a look and
report back.

https://blueprints.launchpad.net/tempest/+spec/basic-tempest-integration-for-ceilometer 



This is kind of worse than tempest integration issues. As far as I can 
tell ceilometer via devstack is basically non functional at all. And 
sort of worse than non functional, it's spewing errors, a lot.


This really ought to be a top ceilometer item to address, otherwise we 
should probably turn off ceilometer in devstack by default, because 
it's really not working at the moment.


-Sean

Here are the two errors showing up persistently that are not 
whitelisted. Such log errors are now being shown in the console log 
right after the tempest tests finish.


https://bugs.launchpad.net/ceilometer/+bug/1243251
2013-10-21 21:11:00.229 | 2013-10-21 21:05:20.046 5624 ERROR 
ceilometer.collector.dispatcher.database [-] Failed to record metering 
data: QueuePool limit of size 5 overflow 10 reached, connection timed 
out, timeout 30



https://bugs.launchpad.net/ceilometer/+bug/1243249
2013-10-21 20:22:27.600 | Log File: ceilometer-alarm-evaluator
2013-10-21 20:22:27.600 | 2013-10-21 20:14:33.038 22760 ERROR 
ceilometer.alarm.service [-] alarm evaluation cycle failed


See also https://bugs.launchpad.net/ceilometer/+bug/1237671
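
The kind of check that surfaces such lines can be sketched roughly as below. This is an illustration only, not the actual log checker: the function name, the regular expression, and the whitelist-as-regex-patterns format are all assumptions, and the line shape is modelled on the service log excerpts quoted above (without the console timestamp prefix).

```python
import re

# Assumed line shape, modelled on the excerpts quoted in this thread:
# "<date> <time> <pid> <LEVEL> <logger> <rest of message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<rest>.*)$"
)


def find_unexpected_errors(lines, whitelist):
    """Return the ERROR lines that match no whitelist pattern."""
    allowed = [re.compile(pattern) for pattern in whitelist]
    unexpected = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("level") != "ERROR":
            continue  # not a recognisable ERROR record
        if any(pattern.search(line) for pattern in allowed):
            continue  # known issue, tolerated for now
        unexpected.append(line)
    return unexpected
```

A gate job built on this idea would fail whenever the returned list is non-empty, and each whitelist entry would ideally point at an open bug.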

 -David




Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-22 Thread Sean Dague

On 10/21/2013 10:27 AM, Neal, Phil wrote:

Sean, we currently have a BP out there to investigate basic tempest
integration and I think this might fall under the same umbrella.
Unfortunately I've not been able to free up my development time
for it, but I've assigned it out to someone who can take a look and
report back.

https://blueprints.launchpad.net/tempest/+spec/basic-tempest-integration-for-ceilometer


This is kind of worse than tempest integration issues. As far as I can 
tell ceilometer via devstack is basically non functional at all. And 
sort of worse than non functional, it's spewing errors, a lot.


This really ought to be a top ceilometer item to address, otherwise we 
should probably turn off ceilometer in devstack by default, because it's 
really not working at the moment.


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-22 Thread Mehdi Abaakouk
Hi,

On Mon, Oct 21, 2013 at 02:27:44PM +, Neal, Phil wrote:
> 
> https://blueprints.launchpad.net/tempest/+spec/basic-tempest-integration-for-ceilometer

Some work has been done under another blueprint:

https://blueprints.launchpad.net/tempest/+spec/add-basic-ceilometer-tests

Most of this code was written a while ago and needs to be
rebased. Some of it has exposed bugs in ceilometer.

Bugs discovered in the gate already have fixes in Gerrit, which should be
merged soon.

> > -Original Message-
> > From: Sean Dague [mailto:s...@dague.net]
> > Sent: Sunday, October 20, 2013 7:39 AM
> > To: OpenStack Development Mailing List
> > 
> > Ceilometer is currently one of the largest offenders in dumping ERRORs
> > in the gate -
> > http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-
> > full/76f83a4/console.html#_2013-10-19_14_51_51_271
> > (that item isn't in our whitelist yet, so you'll see a lot of it at the
> > end of every run)
> > 
> > and
> > http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-
> > full/76f83a4/logs/screen-ceilometer-collector.txt.gz?level=TRACE
> > for full details

I plan to take a look at this this week.

Regards, 

-- 
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht




Re: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-21 Thread Neal, Phil
Sean, we currently have a BP out there to investigate basic tempest
integration and I think this might fall under the same umbrella. 
Unfortunately I've not been able to free up my development time 
for it, but I've assigned it out to someone who can take a look and 
report back.

https://blueprints.launchpad.net/tempest/+spec/basic-tempest-integration-for-ceilometer

- Phil

> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: Sunday, October 20, 2013 7:39 AM
> To: OpenStack Development Mailing List
> Subject: [openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal
> runs
> 
> Dave Kranz has been building a system so that we can ensure that during
> a Tempest run services don't spew ERRORs in the logs. Eventually, we're
> going to gate on this, because there is nothing that Tempest does to the
> system that should cause any OpenStack service to ERROR or stack trace
> (Errors should actually be exceptional events that something is wrong
> with the system, not regular events).
> 
> Ceilometer is currently one of the largest offenders in dumping ERRORs
> in the gate -
> http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-
> full/76f83a4/console.html#_2013-10-19_14_51_51_271
> (that item isn't in our whitelist yet, so you'll see a lot of it at the
> end of every run)
> 
> and
> http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-
> full/76f83a4/logs/screen-ceilometer-collector.txt.gz?level=TRACE
> for full details
> 
> This seems like something is wrong in the integration, and would be
> really helpful if we could get ceilometer eyes on this one to put ceilo
> into a non erroring state.
> 
>   -Sean
> 
> --
> Sean Dague
> http://dague.net
> 


[openstack-dev] [ceilometer] [qa] Ceilometer ERRORS in normal runs

2013-10-20 Thread Sean Dague
Dave Kranz has been building a system so that we can ensure that during 
a Tempest run services don't spew ERRORs in the logs. Eventually, we're 
going to gate on this, because there is nothing that Tempest does to the 
system that should cause any OpenStack service to ERROR or stack trace 
(Errors should actually be exceptional events that something is wrong 
with the system, not regular events).


Ceilometer is currently one of the largest offenders in dumping ERRORs 
in the gate - 
http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-full/76f83a4/console.html#_2013-10-19_14_51_51_271 
(that item isn't in our whitelist yet, so you'll see a lot of it at the 
end of every run)


and 
http://logs.openstack.org/68/52768/1/check/check-tempest-devstack-vm-full/76f83a4/logs/screen-ceilometer-collector.txt.gz?level=TRACE 
for full details


This seems like something is wrong in the integration, and would be 
really helpful if we could get ceilometer eyes on this one to put ceilo 
into a non erroring state.


-Sean

--
Sean Dague
http://dague.net
