> On 10 May 2019, at 23:59, Rich Megginson wrote:
>
> On 5/9/19 9:13 PM, William Brown wrote:
>> Hi all,
>>
>> So I think it's time for me to write some logging code to improve the
>> situation. Relevant links before we start:
>>
>> https://pagure.io/389-ds-base/issue/49415
>> http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
>> https://pagure.io/389-ds-base/issue/50350
>> https://pagure.io/389-ds-base/issue/50361
>>
>>
>> All of these links touch on issues around logging, and I think they all
>> combine to create three important points:
>>
>> * The performance of logging should be improved
>> * The amount of details (fine grain) and information in logs should improve
>> * The structure of the log content should be improved to aid interaction
>> (possibly even machine parsable)
>>
>> I will turn this into a design document, but there are some questions I
>> would like some input to help answer as part of this process to help set the
>> direction and tasks to achieve.
>>
>> -- Should our logs as they exist today, continue to exist?
>>
>> I think that my view on this is "no". I think if we make something better,
>> we have little need to continue to support our legacy interfaces. Of course,
>> this would be a large change and it may not sit comfortably with people.
>>
>> A large part of this thinking is that the "new" log interface I want to add
>> is focused on *operations* rather than auditing accesses or changes, or over
>> looking at errors. The information of both the current access/audit/error
>> would largely be melded into a single operation log, and then with tools
>> like logconv, we
>> could parse and extract information that would behave the same way as
>> access/error/audit.
>>
>> At the same time - I can see how people *may* want a "realtime" audit of
>> operations as they occur (IE access log), but this still today is already
>> limited by having to "wait" for operations to complete.
>>
>> In a crash scenario, we would be able to still view the logs that are
>> queued, so I think there are not so many concerns about losing information
>> in these cases (in fact we'd probably have more).
>>
>> -- What should the operation log look like?
>>
>> I think it should be structured, and should be whole-units of information,
>> related to a single operation. IE only at the conclusion of the operation is
>> it logged (thus the async!). It should support arbitrary, nested timers, and
>> would *not* support log levels - it's a detailed log of the process each
>> query goes through.
>>
>> An example could be something like:
>>
>> [timestamp] - [conn=id op=id] - start operation
>> [timestamp] - [conn=id op=id] - start time = time ...
>> [timestamp] - [conn=id op=id] - started internal search '(some=filter)'
>> [timestamp] - [conn=id op=id parentop=id] - start nested operation
>> [timestamp] - [conn=id op=id parentop=id] - start time = time ...
>> ...
>> [timestamp] - [conn=id op=id parentop=id] - end time = time...
>> [timestamp] - [conn=id op=id parentop=id] - duration = diff end - start
>> [timestamp] - [conn=id op=id parentop=id] - end nested operation - result ->
>> ...
>> [timestamp] - [conn=id op=id] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=id op=id] - end time = time
>> [timestamp] - [conn=id op=id] - duration = diff end - start
>>
>>
>> Due to the structured - blocked nature, there would be no interleaving of
>> operation messages. therefor the log would appear as:
>>
>> [timestamp] - [conn=00 op=00] - start operation
>> [timestamp] - [conn=00 op=00] - start time = time ...
>> [timestamp] - [conn=00 op=00] - started internal search '(some=filter)'
>> [timestamp] - [conn=00 op=00 parentop=01] - start nested operation
>> [timestamp] - [conn=00 op=00 parentop=01] - start time = time ...
>> ...
>> [timestamp] - [conn=00 op=00 parentop=01] - end time = time...
>> [timestamp] - [conn=00 op=00 parentop=01] - duration = diff end - start
>> [timestamp] - [conn=00 op=00 parentop=01] - end nested operation - result ->
>> ...
>> [timestamp] - [conn=00 op=00] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=00 op=00] - end time = time
>> [timestamp] - [conn=00 op=00] - duration = diff end - start
>> [timestamp] - [conn=22 op=00] - start operation
>> [timestamp] - [conn=22 op=00] - start time = time ...
>> [timestamp] - [conn=22 op=00] - started internal search '(some=filter)'
>> [timestamp] - [conn=22 op=00 parentop=01] - start nested operation
>> [timestamp] - [conn=22 op=00 parentop=01] - start time = time ...
>> ...
>> [timestamp] - [conn=22 op=00 parentop=01] - end time = time...
>> [timestamp] - [conn=22 op=00 parentop=01] - duration = diff end - start
>> [timestamp] - [conn=22 op=00 parentop=01] - end nested operation - result ->
>> ...
>> [timestamp] - [conn=22 op=00] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=22 op=00] - end time = time
>> [timestamp] -