Re: [openstack-dev] [glance][all] Help with interpreting the log level guidelines

Sean Dague Tue, 16 Sep 2014 07:57:56 -0700

On 09/16/2014 10:16 AM, Kuvaja, Erno wrote:
>> -----Original Message-----
>> From: Sean Dague [mailto:[email protected]]
>> Sent: 16 September 2014 12:40
>> To: [email protected]
>> Subject: Re: [openstack-dev] [glance][all] Help with interpreting the log 
>> level
>> guidelines
>>
>> On 09/16/2014 06:44 AM, Kuvaja, Erno wrote:
>>>> -----Original Message-----
>>>> From: Flavio Percoco [mailto:[email protected]]
>>>> Sent: 16 September 2014 10:08
>>>> To: [email protected]
>>>> Subject: Re: [openstack-dev] [glance][all] Help with interpreting the
>>>> log level guidelines
>>>>
>>>> On 09/16/2014 01:10 AM, Clint Byrum wrote:
>>>>> Excerpts from Sean Dague's message of 2014-09-15 16:02:04 -0700:
>>>>>> On 09/15/2014 07:00 PM, Mark Washenberger wrote:
>>>>>>> Hi there logging experts,
>>>>>>>
>>>>>>> We've recently had a little disagreement in the glance team about
>>>>>>> the appropriate log levels for http requests that end up failing
>>>>>>> due to user errors. An example would be a request to get an image
>>>>>>> that does not exist, which results in a 404 Not Found request.
>>>>>>>
>>>>>>> On one hand, this event is an error, so DEBUG or INFO seem a
>>>>>>> little too low. On the other hand, this error doesn't generally
>>>>>>> require any kind of operator investigation or indicate any actual
>>>>>>> failure of the service, so perhaps it is excessive to log it at WARN or
>> ERROR.
>>>>>>>
>>>>>>> Please provide feedback to help us resolve this dispute if you
>>>>>>> feel you
>>>> can!
>>>>>>
>>>>>> My feeling is this is an INFO level. There is really nothing the
>>>>>> admin should care about here.
>>>>>
>>>>> Agree with Sean. INFO are useful for investigations. WARN and ERROR
>>>>> are cause for alarm.
>>>>
>>>> +1 this is what we do in Zaqar as well.
>>>>
>>>>
>>>> --
>>>> @flaper87
>>>> Flavio Percoco
>>>>
>>>
>>> I think the debate here does not only limit to 404s. By the logging 
>>> guidelines
>> INFO level messages should not contain any error related messages but
>> rather stuff like certain components starting/stopping, config info, etc.
>> WARN should not be anything that gets the ops pulled out of bed and so on.
>>>
>>> Also all information that would be in interest of ops should be logged
>> INFO+.
>>>
>>> Now if we are logging user errors as WARN that makes the environment
>> supportable even if the logging has been set as high as WARN cleaning the
>> output a lot (as INFO shouldn't contain anything out of order anyways).
>> Current situation is that logging at DEBUG level is the only option to get 
>> the
>> needed information to actually run the services and get the data needed to
>> support it as well. If we log user errors on INFO we get one step higher but
>> we still have all that clutter like every single request in the logs and if 
>> that's
>> the direction we want to go, we should revisit our logging guidelines as 
>> well.
>>>
>>> Thus my two euro cents goes towards WARN rather than debug and
>> definitely not INFO.
>>
>> Part of it is how often you expect things to happen as well. Remember
>> glanceclient opperates in the context of "other" processes. When it hits a 
>> 404
>> in Glance, it's not running in the glance context, it's running in the Nova
>> context. Which means it needs to think of itself in that context.
>>
>> In that context we got the exception back from Glance, we know the image
>> wasn't there. And we know whether or not that's a problem (glanceclient
>> actually has no idea if it's a problem or not, we might be checking to make
>> sure a thing isn't there, and success for us is the 404).
>>
>> So actually, I'm back to Jay on this, it should be DEBUG. Nova (or whoever 
>> the
>> caller is) can decide if the issue warents something larger than that.
>>
>> This is really the biggest issue with logging in the clients, people don't 
>> think
>> about the context that they are running in.
>>
>>      -Sean
>>
>> --
>> Sean Dague
>> http://dague.net
>>
> 
> Sean,
> 
> I'm not sure if we were specific enough here. Not talking about client but 
> the server logging. So how we should log events like client trying to change 
> protected properties, access non existing image, create duplicate image IDs, 
> etc.
> 
> So for example if Nova is trying to access image that does not exist should 
> we ignore it on Glance side or when the user tries to do something that does 
> not succeed. In my point of view it makes life much easier if we have 
> information where the request failed rather than just a wsgi return code or 
> having to run the system on DEBUG logging to get that information.

Glance client throws an ERROR on 404 from Glance server -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-n-api.txt.gz?level=ERROR

Glance server does not -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=ERROR

Which is why I assumed this was where the conversation started.

On the server side this should be an INFO with 404 recorded as the wsgi
request.

On the client this should be debug.

Just to get a look at how completely confusing the current scenario is,
lets dive through the logs about what should be changed.

Inside glance you also need to address client vs. server. For instance -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_42_25_137
is incredibly confusing that it's an INFO, because that's a client call.
The next line is the actual wsgi request you care about.

swiftclient is equally confusing with -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_42_27_206

It should not be logging an ERROR there.

Also whatever keeps dumping this:

127.0.0.1 - - [15/Sep/2014 19:42:38] code 400, message Bad request
syntax ('0')
127.0.0.1 - - [15/Sep/2014 19:42:38] "0" 400 -

Needs to be addressed.

urllib3.connectionpool needs to be set to WARN level cutoff, the INFO
messages cloud everything.

keystone client shouldn't log at info -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_47_27_833

Knowing on every keystone client request which token version is being
used is not INFO material.

urllib3.connectionpool logging issue again -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_47_52_000

And if this WARNING isn't something that's addressable, it should be
killed off -
http://logs.openstack.org/81/120781/4/check/check-tempest-dsvm-full/90cb640/logs/screen-g-api.txt.gz?level=INFO#_2014-09-15_19_50_59_792

-Sean

--
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [glance][all] Help with interpreting the log level guidelines

Reply via email to