Re: [openstack-dev] RFC - Icehouse logging harmonization

Joe Gordon Thu, 24 Oct 2013 04:24:10 -0700

I think harmonizing the log files is a great idea. When working on
elastic-recheck, I spent a lot of time staring at log files and cursing at
how bad and non-uniform they are. I can only imagine what cloud operators
must think.


In addition to harmonizing the log levels, and making sure we don't have
scary-looking logs (stack traces, etc.) during a normal tempest run, I think
we should:

* Make sure that all projects use the same logging format and use
request-ids. I have already filed bugs for neutron and ceilometer on this
(https://bugs.launchpad.net/neutron/+bug/1239923 and
https://bugs.launchpad.net/ceilometer/+bug/1244182), and I have a hunch
other projects may not use these either.
* Have better default log levels for dependencies. For example, when debug
logging is enabled for nova, I don't think we really need debug-level logs
turned on for amqp, although perhaps I am wrong.
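To make those two points a bit more concrete, here is a minimal Python sketch using only the standard library `logging` module. The format string, the `request_id` record attribute, and the `DEFAULT_DEPENDENCY_LEVELS` table are hypothetical illustrations, not the actual oslo or nova logging configuration. The idea is simply that every line carries a request id (or a placeholder) in a fixed position, and that library loggers such as amqp keep their own quieter default even when the service runs with debug on.

```python
import logging

# Hypothetical format: the real oslo/nova format strings differ. The point
# is only that every line carries a request id in a fixed position.
LOG_FORMAT = ("%(asctime)s %(process)d %(levelname)s %(name)s "
              "[%(request_id)s] %(message)s")

# Assumed per-library defaults; which dependencies to quiet, and how much,
# is exactly what would need to be agreed on.
DEFAULT_DEPENDENCY_LEVELS = {
    "amqp": logging.WARNING,
    "kombu": logging.WARNING,
    "sqlalchemy": logging.WARNING,
}


class RequestIdFilter(logging.Filter):
    """Give records without a request id a placeholder so formatting never fails."""

    def filter(self, record):
        if not hasattr(record, "request_id"):
            record.request_id = "-"
        return True


def setup_logging(debug=False, stream=None):
    # stream=None makes StreamHandler write to sys.stderr.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    # A handler-level filter sees records propagated from every child logger.
    handler.addFilter(RequestIdFilter())

    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.DEBUG if debug else logging.INFO)

    # Dependencies keep their own quieter level even when the service
    # itself runs in debug mode.
    for name, level in DEFAULT_DEPENDENCY_LEVELS.items():
        logging.getLogger(name).setLevel(level)
```

With `setup_logging(debug=True)`, a call like `logging.getLogger("nova.compute").debug("starting", extra={"request_id": "req-1234"})` comes out with `[req-1234]` in the line, while a `debug` call on the `amqp` logger is dropped and amqp warnings still appear, with a `[-]` placeholder.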


On Wed, Oct 23, 2013 at 8:55 PM, Sean Dague <s...@dague.net> wrote:

> On 10/23/2013 03:35 PM, Robert Collins wrote:
>
>> On 24 October 2013 08:28, John Griffith <john.griff...@solidfire.com>
>> wrote:
>>
>>> So I touched on this a bit in my earlier post but want to reiterate here
>>> and maybe clarify a bit. I agree that cleaning up and standardizing the
>>> logs is a good thing, and particularly removing unhandled exception
>>> messages would be good. What concerns me, however, is the approach being
>>> taken here of saying things like "Error level messages are banned from
>>> Tempest runs".
>>>
>>> The case I mentioned earlier of the negative test is a perfect example.
>>> There's no way for Cinder (or any other service) to know the difference
>>> between the end user specifying/requesting a non-existent volume and a
>>> valid volume being requested that for some reason can't be found. I'm not
>>> quite sure how you place a definitive rule like "no error messages in
>>> logs" unless you make your tests such that you never run negative tests?
>>>
>>
>> Let me check that I understand: you want to check that when a user
>> asks for a volume that doesn't exist, they don't get it, *and* that
>> the reason they didn't get it was due to Cinder detecting it's
>> missing, not due to e.g. Cinder throwing an error and returning 500?
>>
>> If so, that seems pretty straightforward: a) check the error that is
>> reported (it should be a 404 and contain an explanation which we can
>> check) and b) check the logs to see that nothing was logged (because a
>> server fault would be logged).
>>
>>> There are other cases in cinder as well that I'm concerned about. One
>>> example is iscsi target creation: there are a number of scenarios where
>>> this can fail under certain conditions. In most of these cases we now
>>> have retry mechanisms or alternate implementations to complete the task.
>>> The fact is, however, that a call somewhere in the system failed, and in
>>> my opinion that is something that should stand out in the logs. Maybe
>>> this particular case would be well suited to being a warning rather than
>>> an error, and that's fine. My point, however, is that I think some
>>> thought needs to go into this before making blanket rules, and
>>> especially gating criteria that say "no error messages in logs".
>>>
>>
> Absolutely agreed. That's why I wanted to kick off this discussion and am
> thinking about how we get to agreement by Icehouse (giving this lots of
> time to bake and getting different perspectives in here).
>
> In the short term, for failing tempest jobs because they've got errors in
> the logs, we've got a whole whitelist mechanism right now for "acceptable
> errors". Over time I'd love to shrink that to 0. But that's going to be a
> collaboration between the QA team and the specific core projects to make
> sure that's the right call in each case. Who knows, maybe there are
> generally agreed-upon ERROR conditions that we trigger, but we'll figure
> that out over time.
>
> I think the iscsi example is a good case for WARNING, which is the same
> level we use when we fail to schedule a resource (compute / volume).
> Especially because we try to recover now. If we fail to recover, ERROR is
> probably called for. But if we actually failed to allocate a volume, we'd
> end up failing the tests anyway, which means the ERROR in the log wouldn't
> be a problem in and of itself.
>
>
>> I agree that thought and care are needed. As a deployer, my concern is
>> that the only time ERROR is logged is when something is wrong with the
>> infrastructure (rather than a user asking for something stupid). I
>> think my concern and yours can both be handled at the same time.
>>
>
> Right, and I think this is the perspective that I'm coming from. Our logs
> (at INFO and up) are UX to our cloud admins.
>
> We should be pretty sure that we know something is a problem if we tag it
> as an ERROR, or CRITICAL. Because that's likely to be something that
> negatively impacts someone's day.
>
> If we aren't completely sure your cloud is on fire, but we're pretty sure
> something is odd, WARNING is appropriate.
>
> If it's no good, but we have no way to test if it's a problem, it's just
> INFO. I really think the "not found" case falls more into standard INFO.
>
> Again, more concrete instances like the iscsi case are probably the most
> helpful. I think in the abstract this problem is too hard to solve, but
> with examples, we can probably come to some consensus.
>
>
>         -Sean
>
> --
> Sean Dague
> http://dague.net
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
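Sean's whitelist of "acceptable errors" and Robert's check (b) both come down to scanning a service log for ERROR or CRITICAL lines that nobody has agreed are expected. Below is a minimal Python sketch of that idea, assuming a plain-text log where the level name appears as its own token on each line. `unexpected_errors` and the `ACCEPTABLE_ERRORS` patterns are hypothetical and are not the real tempest whitelist or its format.

```python
import re

# Hypothetical whitelist of "acceptable" errors per service, as regexes.
# The pattern below is a made-up placeholder, not a real tempest entry.
ACCEPTABLE_ERRORS = {
    "cinder-volume": [r"Failed to create iscsi target .* retrying"],
}

# Assumes the level name is a separate token on each line, e.g.
# "2013-10-24 04:24:10.123 1234 ERROR cinder.volume ...". A message that
# merely contains the word ERROR would also match; a real check would
# anchor on the level field.
LEVEL_RE = re.compile(r"\b(ERROR|CRITICAL)\b")


def unexpected_errors(service, lines, whitelist=ACCEPTABLE_ERRORS):
    """Return the ERROR/CRITICAL lines that no whitelist pattern matches."""
    allowed = [re.compile(p) for p in whitelist.get(service, [])]
    bad = []
    for line in lines:
        if not LEVEL_RE.search(line):
            continue  # INFO/WARNING/DEBUG lines never fail the check
        if any(p.search(line) for p in allowed):
            continue  # agreed-upon "acceptable" error
        bad.append(line)
    return bad
```

A gate built this way would fail the job whenever `unexpected_errors` returns anything. Shrinking the whitelist toward zero would then just mean deleting patterns as each project moves those messages down to WARNING or INFO.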
