On 5/9/19 9:13 PM, William Brown wrote:
Hi all,
So I think it's time for me to write some logging code to improve the
situation. Relevant links before we start:
https://pagure.io/389-ds-base/issue/49415
http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
https://pagure.io/389-ds-base/issue/50350
https://pagure.io/389-ds-base/issue/50361
All of these links touch on issues around logging, and I think they all combine
to create three important points:
* The performance of logging should be improved
* The amount of details (fine grain) and information in logs should improve
* The structure of the log content should be improved to aid interaction
(possibly even machine parsable)
I will turn this into a design document, but there are some questions I would
like some input to help answer as part of this process to help set the
direction and tasks to achieve.
-- Should our logs as they exist today, continue to exist?
I think that my view on this is "no". I think if we make something better, we
have little need to continue to support our legacy interfaces. Of course, this would be a
large change and it may not sit comfortably with people.
A large part of this thinking is that the "new" log interface I want to add is
focused on *operations* rather than auditing accesses or changes, or over looking at
errors. The information of both the current access/audit/error would largely be melded
into a single operation log, and then with tools like logconv, we
could parse and extract information that would behave the same way as
access/error/audit.
At the same time - I can see how people *may* want a "realtime" audit of operations as
they occur (IE access log), but this still today is already limited by having to "wait"
for operations to complete.
In a crash scenario, we would be able to still view the logs that are queued,
so I think there are not so many concerns about losing information in these
cases (in fact we'd probably have more).
-- What should the operation log look like?
I think it should be structured, and should be whole-units of information,
related to a single operation. IE only at the conclusion of the operation is it
logged (thus the async!). It should support arbitrary, nested timers, and would
*not* support log levels - it's a detailed log of the process each query goes
through.
An example could be something like:
[timestamp] - [conn=id op=id] - start operation
[timestamp] - [conn=id op=id] - start time = time ...
[timestamp] - [conn=id op=id] - started internal search '(some=filter)'
[timestamp] - [conn=id op=id parentop=id] - start nested operation
[timestamp] - [conn=id op=id parentop=id] - start time = time ...
...
[timestamp] - [conn=id op=id parentop=id] - end time = time...
[timestamp] - [conn=id op=id parentop=id] - duration = diff end - start
[timestamp] - [conn=id op=id parentop=id] - end nested operation - result -> ...
[timestamp] - [conn=id op=id] - ended internal search '(some=filter)'
...
[timestamp] - [conn=id op=id] - end time = time
[timestamp] - [conn=id op=id] - duration = diff end - start
Due to the structured - blocked nature, there would be no interleaving of
operation messages. therefor the log would appear as:
[timestamp] - [conn=00 op=00] - start operation
[timestamp] - [conn=00 op=00] - start time = time ...
[timestamp] - [conn=00 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=00 op=00 parentop=01] - start nested operation
[timestamp] - [conn=00 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=00 op=00 parentop=01] - end time = time...
[timestamp] - [conn=00 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=00 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=00 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=00 op=00] - end time = time
[timestamp] - [conn=00 op=00] - duration = diff end - start
[timestamp] - [conn=22 op=00] - start operation
[timestamp] - [conn=22 op=00] - start time = time ...
[timestamp] - [conn=22 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=22 op=00 parentop=01] - start nested operation
[timestamp] - [conn=22 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=22 op=00 parentop=01] - end time = time...
[timestamp] - [conn=22 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=22 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=22 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=22 op=00] - end time = time
[timestamp] - [conn=22 op=00] - duration = diff end - start
An alternate method for structuring could be a machine readable format like
json:
{
'timestamp': 'time',
'duration': ,
'bind': 'dn of who initiated operation',
'events': [
'debug': 'msg',
'internal_search': {
'timestamp': 'time',
'duration': ,