[389-devel] Re: Logging future direction and ideas.

2019-05-22 Thread William Brown


> On 22 May 2019, at 22:36, Ludwig Krispenz  wrote:
> 
> 
> On 05/17/2019 12:44 AM, William Brown wrote:
> 
> I think this would be a "final goal", so to formalise the stages:
> 
> * Add build tooling and a simple (dummy) log thread as a "getting started". 
> Supplement with documentation on wiki.
> * Fill-in the log thread to support an "operation log", and add basic 
> operation log hooks in the server.
> * Fill in more operation log points in the server to build detail
> * change slapi_log_err to log to the new rust thread, continue to generate 
> error file
> * change slapi_log_audit to log to the new rust thread, continue to generate 
> audit file
> * change slapi_log_access to log to the new rust thread, continue to generate 
> access file.
> * remove former logging code
> 
>>> I wonder if we really could have one api  eg slapi_log_* and different 
>>> implementations, keep the current, implement a new one and allow to chose
>> I don't think I understand this comment, could you explain a bit more what 
>> you have in mind?
> what I wanted to say is that right now for logging we have a split in "what" 
> and "how", a function wants to log  something calls slapi_log_* and provides 
> the loglevel, type (_err, _access), and information (formatstring and 
> params). It does not know or care about log buffering, log rotation, they 
> could even all go to the same file, if it is a dedicated thread or 
> 
> If we want to change logging I would like to keep this separation, and if we 
> want to concentrate on performance we should first address the "how" part, 
> doin all together might be too big a task and too much work. And we would not 
> have to throw away the current impelmentation, it could be configured "how" 
> slapi_log _* perform theit task.

Sounds good. I think (?) that we agree on the design and the approach to coding 
this then, including the configuring the "how". Perhaps to be more specific, 
you'd amend the goal as: 

> * Add build tooling and a simple (dummy) log thread as a "getting started". 
> Supplement with documentation on wiki.
> * Fill-in the log thread to support an "operation log", and add basic 
> operation log hooks in the server.
> * Fill in more operation log points in the server to build detail
> * change slapi_log_err to log to the new rust thread, continue to generate 
> error file
> * change slapi_log_audit to log to the new rust thread, continue to generate 
> audit file
> * change slapi_log_access to log to the new rust thread, continue to generate 
> access file.

^ These three would become:

* Add a configuration item for async or legacy log processing for the 
slapi_log_* types, default to "legacy".
* Once more confident, default to "async" for slapi_log_* to use the new rust 
thread

> * remove former logging code


Is that what you had in mind?


> ___
> 389-devel mailing list -- 389-devel@lists.fedoraproject.org
> To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
___
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org


[389-devel] Re: Logging future direction and ideas.

2019-05-22 Thread Ludwig Krispenz


On 05/17/2019 12:44 AM, William Brown wrote:

I think this would be a "final goal", so to formalise the stages:

* Add build tooling and a simple (dummy) log thread as a "getting started". 
Supplement with documentation on wiki.
* Fill-in the log thread to support an "operation log", and add basic operation 
log hooks in the server.
* Fill in more operation log points in the server to build detail
* change slapi_log_err to log to the new rust thread, continue to generate 
error file
* change slapi_log_audit to log to the new rust thread, continue to generate 
audit file
* change slapi_log_access to log to the new rust thread, continue to generate 
access file.
* remove former logging code


I wonder if we really could have one api  eg slapi_log_* and different 
implementations, keep the current, implement a new one and allow to chose

I don't think I understand this comment, could you explain a bit more what you 
have in mind?
what I wanted to say is that right now for logging we have a split in 
"what" and "how", a function wants to log  something calls slapi_log_* 
and provides the loglevel, type (_err, _access), and information 
(formatstring and params). It does not know or care about log buffering, 
log rotation, they could even all go to the same file, if it is a 
dedicated thread or 


If we want to change logging I would like to keep this separation, and 
if we want to concentrate on performance we should first address the 
"how" part, doin all together might be too big a task and too much work. 
And we would not have to throw away the current impelmentation, it could 
be configured "how" slapi_log _* perform theit task.

___
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org


[389-devel] Re: Logging future direction and ideas.

2019-05-16 Thread William Brown


> On 17 May 2019, at 00:16, Ludwig Krispenz  wrote:
> 
> 
> On 05/16/2019 02:56 AM, William Brown wrote:
>> Replying to both Simon and Ludwig in the one mail ...
>> 
>> To just comment to one point from Ludwig:
>> 
>>> This may sound too negative, I agree that logging deserves attention and 
>>> improvement, but we need to agree on what we want to achieve.
>> I don't think this is negative, at all and I think it's exactly the kind of 
>> feedback I wanted. :) So thank you!
>> 
>> 
>>> Hi William,
>>> I am more than interested!
>>> 
>>> I'd like to learn it one day anyway.
>>> So if there will be no objections and we'll go forward now,
>>> I think, it is important to agree on some points of decreasing a bus factor:
>>> 
>>> - As you said, it should have very detailed comments on all of the
>>>  language features and technics you'll use.
>>> - I think it will be nice to have an additional documentation which will
>>>  describe the basic setup for the development you use. All the toold you
>>>  need to develop, compile, test and debug.
>>> - Also, some nice links for the basic tutorials on Rust types, concepts, 
>>> etc.
>>> - I think, we should have detailed unit tests. It will help to
>>>  understand the code better and we will have less bugs (hopefully).
>> So I'm writing a document for the wiki to cover this *and* rust supports 
>> tests as part of the system - that will be all be included. Is that 
>> acceptable?
>> 
>>> And the final and big point I wanted to mention:
>>> 
>>> - We should be prepared for a slow review process because we have quite 
>>> limited
>>>  recources in the team and a lot of work should be done (WebUI still has to 
>>> be refactored to React,
>>>  and it is only a small piece of the workload).
>>>  Also, I think, it makes sense to have the smallest Rust PRs that can be 
>>> put together as an independent unit.
>> That's totally reasonable. I'll try to keep it to small parts and will build 
>> up as we go. See below for ideas on how I'll work toward this.
>> 
>>> But everything is just my opinion and I don't know what others think and
>>> if everyone is prepared to join it. I am feeling excited though :)
>>> 
>>> Thanks,
>>> Simon
>>> 
>>> P.S. check the contribution guide please. Espesially a part about
>>> 'rebase-force-push'. I think it is nice not to force push during
>>> the review process (and rebase-squash only after you got an ACK).
>> Yep, I saw this change. I'll keep it in mind :)
>> 
>> 
>>> On 15 May 2019, at 19:37, Ludwig  wrote:
>>> 
>>> Hi William,
>>> 
>>> this is a good starting point to discuss and decide how to move forward 
>>> with logging (although it looks like you are already some steps ahead).
>>> 
>>> I have no strong ties to existing logging format and if we do it in rust or 
>>> not I don't care, but I do not want to use existing functionality for the 
>>> sake af a new unified approach.
>>> 
>>> First we need to define what is the purpose and usage of our logs, what do 
>>> we want to keep or extend. I see these main area:
>>> 
>>> - statistics
>>> 
>>> - auditing (authentications, modifications)
>>> 
>>> . debugging (for my usage the most important aspect)
>>> 
>>> Next: what are the real problems
>>> 
>>> - performance: yes, it would be nice to have less logging impact on 
>>> performance, but I haven't seen real complaints and doing it differently 
>>> does not guarantee better performance, we have debug levels (like tracing) 
>>> which are not usable because they slow down everything, I do not think this 
>>> will be resolved by just replacing the library
>>> 
>>> - information:  we continue to add information to the logs, and there are 
>>> still open requests
>>> 
>>> 
>>> So: If I look at your suggestions I do like some and have concerns with 
>>> others, and I am not sure if the priorities are right.
>>> 
>>> You list a couple of tickets related to logging, but forgot others, eg:
>>> 
>>> https://pagure.io/389-ds-base/issue/47822 - improve logging of ACL decisions
>>> 
>>> https://pagure.io/389-ds-base/issue/48680 - enable logging for 3rd party 
>>> libs
>>> 
>>> https://pagure.io/389-ds-base/issue/50002 - improve password policy logging
>>> 
>>> https://pagure.io/389-ds-base/issue/50187 - improve replication logging
>> These are all things that you you are correct, that I missed.
>> 
>> So I want to focus on your point about what we want from logs, IE stats, 
>> auditing, debugging. I think this is a great summary of what we want, but to 
>> make it more detailed:
>> 
>> * Stats about operations from start to finish
>> * Auditing of client operations
>> * Debugging of client operations
>> * Debugging of internal server machinery and state
>> * Reporting of errors outside of the normal operation system.
>> 
>> Today we current have:
>> 
>> access - auditing of operations at a highlevel
>> audit - auditing of modifications and writes
>> error - debugging of internal server machinery (we tend not to enable the 
>> levels of debugging 

[389-devel] Re: Logging future direction and ideas.

2019-05-16 Thread Ludwig Krispenz


On 05/16/2019 02:56 AM, William Brown wrote:

Replying to both Simon and Ludwig in the one mail ...

To just comment to one point from Ludwig:


This may sound too negative, I agree that logging deserves attention and 
improvement, but we need to agree on what we want to achieve.

I don't think this is negative, at all and I think it's exactly the kind of 
feedback I wanted. :) So thank you!



Hi William,
I am more than interested!

I'd like to learn it one day anyway.
So if there will be no objections and we'll go forward now,
I think, it is important to agree on some points of decreasing a bus factor:

- As you said, it should have very detailed comments on all of the
  language features and technics you'll use.
- I think it will be nice to have an additional documentation which will
  describe the basic setup for the development you use. All the toold you
  need to develop, compile, test and debug.
- Also, some nice links for the basic tutorials on Rust types, concepts, etc.
- I think, we should have detailed unit tests. It will help to
  understand the code better and we will have less bugs (hopefully).

So I'm writing a document for the wiki to cover this *and* rust supports tests 
as part of the system - that will be all be included. Is that acceptable?


And the final and big point I wanted to mention:

- We should be prepared for a slow review process because we have quite limited
  recources in the team and a lot of work should be done (WebUI still has to be 
refactored to React,
  and it is only a small piece of the workload).
  Also, I think, it makes sense to have the smallest Rust PRs that can be put 
together as an independent unit.

That's totally reasonable. I'll try to keep it to small parts and will build up 
as we go. See below for ideas on how I'll work toward this.


But everything is just my opinion and I don't know what others think and
if everyone is prepared to join it. I am feeling excited though :)

Thanks,
Simon

P.S. check the contribution guide please. Espesially a part about
'rebase-force-push'. I think it is nice not to force push during
the review process (and rebase-squash only after you got an ACK).

Yep, I saw this change. I'll keep it in mind :)



On 15 May 2019, at 19:37, Ludwig  wrote:

Hi William,

this is a good starting point to discuss and decide how to move forward with 
logging (although it looks like you are already some steps ahead).

I have no strong ties to existing logging format and if we do it in rust or not 
I don't care, but I do not want to use existing functionality for the sake af a 
new unified approach.

First we need to define what is the purpose and usage of our logs, what do we 
want to keep or extend. I see these main area:

- statistics

- auditing (authentications, modifications)

. debugging (for my usage the most important aspect)

Next: what are the real problems

- performance: yes, it would be nice to have less logging impact on 
performance, but I haven't seen real complaints and doing it differently does 
not guarantee better performance, we have debug levels (like tracing) which are 
not usable because they slow down everything, I do not think this will be 
resolved by just replacing the library

- information:  we continue to add information to the logs, and there are still 
open requests


So: If I look at your suggestions I do like some and have concerns with others, 
and I am not sure if the priorities are right.

You list a couple of tickets related to logging, but forgot others, eg:

https://pagure.io/389-ds-base/issue/47822 - improve logging of ACL decisions

https://pagure.io/389-ds-base/issue/48680 - enable logging for 3rd party libs

https://pagure.io/389-ds-base/issue/50002 - improve password policy logging

https://pagure.io/389-ds-base/issue/50187 - improve replication logging

These are all things that you you are correct, that I missed.

So I want to focus on your point about what we want from logs, IE stats, 
auditing, debugging. I think this is a great summary of what we want, but to 
make it more detailed:

* Stats about operations from start to finish
* Auditing of client operations
* Debugging of client operations
* Debugging of internal server machinery and state
* Reporting of errors outside of the normal operation system.

Today we current have:

access - auditing of operations at a highlevel
audit - auditing of modifications and writes
error - debugging of internal server machinery (we tend not to enable the 
levels of debugging operations ...)

So I think there is room to improve.

Now I think that performance so far is only a barrier in terms of preventing us from 
adding more detail, because it slows down operations. This is why it's pretty high on the 
list of "to fix".



these requests for improvements of logging exist for quite some time, they were accepted as useful 
and good to have, but we didn't have the capacity to work on them. The work is tedious, going thru 
ACL or replication code and 

[389-devel] Re: Logging future direction and ideas.

2019-05-15 Thread William Brown
Replying to both Simon and Ludwig in the one mail ...

To just comment to one point from Ludwig:

> This may sound too negative, I agree that logging deserves attention and 
> improvement, but we need to agree on what we want to achieve.

I don't think this is negative, at all and I think it's exactly the kind of 
feedback I wanted. :) So thank you! 


> Hi William,
> I am more than interested!
> 
> I'd like to learn it one day anyway.
> So if there will be no objections and we'll go forward now,
> I think, it is important to agree on some points of decreasing a bus factor:
> 
> - As you said, it should have very detailed comments on all of the
>  language features and technics you'll use.
> - I think it will be nice to have an additional documentation which will
>  describe the basic setup for the development you use. All the toold you
>  need to develop, compile, test and debug.
> - Also, some nice links for the basic tutorials on Rust types, concepts, etc.
> - I think, we should have detailed unit tests. It will help to
>  understand the code better and we will have less bugs (hopefully).

So I'm writing a document for the wiki to cover this *and* rust supports tests 
as part of the system - that will be all be included. Is that acceptable?

> 
> And the final and big point I wanted to mention:
> 
> - We should be prepared for a slow review process because we have quite 
> limited
>  recources in the team and a lot of work should be done (WebUI still has to 
> be refactored to React,
>  and it is only a small piece of the workload).
>  Also, I think, it makes sense to have the smallest Rust PRs that can be put 
> together as an independent unit.

That's totally reasonable. I'll try to keep it to small parts and will build up 
as we go. See below for ideas on how I'll work toward this.

> 
> But everything is just my opinion and I don't know what others think and
> if everyone is prepared to join it. I am feeling excited though :)
> 
> Thanks,
> Simon
> 
> P.S. check the contribution guide please. Espesially a part about
> 'rebase-force-push'. I think it is nice not to force push during
> the review process (and rebase-squash only after you got an ACK).

Yep, I saw this change. I'll keep it in mind :) 


> On 15 May 2019, at 19:37, Ludwig  wrote:
> 
> Hi William,
> 
> this is a good starting point to discuss and decide how to move forward with 
> logging (although it looks like you are already some steps ahead).
> 
> I have no strong ties to existing logging format and if we do it in rust or 
> not I don't care, but I do not want to use existing functionality for the 
> sake af a new unified approach.
> 
> First we need to define what is the purpose and usage of our logs, what do we 
> want to keep or extend. I see these main area:
> 
> - statistics
> 
> - auditing (authentications, modifications)
> 
> . debugging (for my usage the most important aspect)
> 
> Next: what are the real problems
> 
> - performance: yes, it would be nice to have less logging impact on 
> performance, but I haven't seen real complaints and doing it differently does 
> not guarantee better performance, we have debug levels (like tracing) which 
> are not usable because they slow down everything, I do not think this will be 
> resolved by just replacing the library
> 
> - information:  we continue to add information to the logs, and there are 
> still open requests
> 
> 
> So: If I look at your suggestions I do like some and have concerns with 
> others, and I am not sure if the priorities are right.
> 
> You list a couple of tickets related to logging, but forgot others, eg:
> 
> https://pagure.io/389-ds-base/issue/47822 - improve logging of ACL decisions
> 
> https://pagure.io/389-ds-base/issue/48680 - enable logging for 3rd party libs
> 
> https://pagure.io/389-ds-base/issue/50002 - improve password policy logging
> 
> https://pagure.io/389-ds-base/issue/50187 - improve replication logging

These are all things that you you are correct, that I missed.

So I want to focus on your point about what we want from logs, IE stats, 
auditing, debugging. I think this is a great summary of what we want, but to 
make it more detailed:

* Stats about operations from start to finish
* Auditing of client operations
* Debugging of client operations
* Debugging of internal server machinery and state
* Reporting of errors outside of the normal operation system.

Today we current have:

access - auditing of operations at a highlevel
audit - auditing of modifications and writes
error - debugging of internal server machinery (we tend not to enable the 
levels of debugging operations ...)

So I think there is room to improve. 

Now I think that performance so far is only a barrier in terms of preventing us 
from adding more detail, because it slows down operations. This is why it's 
pretty high on the list of "to fix". 

> 
> 
> these requests for improvements of logging exist for quite some time, they 
> were accepted as useful and good to 

[389-devel] Re: Logging future direction and ideas.

2019-05-15 Thread Ludwig

Hi William,

this is a good starting point to discuss and decide how to move forward 
with logging (although it looks like you are already some steps ahead).


I have no strong ties to existing logging format and if we do it in rust 
or not I don't care, but I do not want to use existing functionality for 
the sake af a new unified approach.


First we need to define what is the purpose and usage of our logs, what 
do we want to keep or extend. I see these main area:


- statistics

- auditing (authentications, modifications)

. debugging (for my usage the most important aspect)

Next: what are the real problems

- performance: yes, it would be nice to have less logging impact on 
performance, but I haven't seen real complaints and doing it differently 
does not guarantee better performance, we have debug levels (like 
tracing) which are not usable because they slow down everything, I do 
not think this will be resolved by just replacing the library


- information:  we continue to add information to the logs, and there 
are still open requests



So: If I look at your suggestions I do like some and have concerns with 
others, and I am not sure if the priorities are right.


You list a couple of tickets related to logging, but forgot others, eg:

https://pagure.io/389-ds-base/issue/47822 - improve logging of ACL decisions

https://pagure.io/389-ds-base/issue/48680 - enable logging for 3rd party 
libs


https://pagure.io/389-ds-base/issue/50002 - improve password policy logging

https://pagure.io/389-ds-base/issue/50187 - improve replication logging


these requests for improvements of logging exist for quite some time, 
they were accepted as useful and good to have, but we didn't have the 
capacity to work on them. The work is tedious, going thru ACL or 
replication code and decide what and how to log better requires good 
knowledge of the areas and is not fancy - but I wonder if the effort 
wouldn't be spent better to improve the "what" and not the "how".



Now: some comments on your suggestions.

- perf: I think everything we can do to reduce locking in logging is a 
good thing, if we do logging more asynchronous it is ok for me as long 
as we can see the "real time" of the event and get the logs ordered. If 
replacing the "log" component by a rust implemention helps it is good. 
We could in a first step keep the slapi_log_* api and only do it more 
efficient


- merging of logs: I do not like the idea to have everything in one log 
file, if you say we can provide tools to split, we can also provide 
tools to merge. The logs have a different format and for good reasons, 
eg the audit log is ldif like and can easily be used to replay changes, 
why force it into another format.


Also, in my opinion, the different logs have a different life time. You 
might want to keep all audit logs, keep the error log for the main 
events for some longer time than the fast growing access log, defining 
log rotation based on the log type is a benfit in my opinion


- operation orientation: there are logging events which are independent 
of operations, especially in replicatio. I would not want to 
artificially enforce a "operation context" eg for changelog processing, 
tombstone purging,.


- format: for machines it might be good that logs are machine readable, 
but I did spend a lot of times reading logs and lokkinng for patterns 
and clues, I think as soon as they are in json I will stop doing this.



This may sound too negative, I agree that logging deserves attention and 
improvement, but we need to agree on what we want to achieve.



Best regards,

Ludwig


On 05/10/2019 05:13 AM, William Brown wrote:

Hi all,

So I think it's time for me to write some logging code to improve the 
situation. Relevant links before we start:

https://pagure.io/389-ds-base/issue/49415
http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
https://pagure.io/389-ds-base/issue/50350
https://pagure.io/389-ds-base/issue/50361


All of these links touch on issues around logging, and I think they all combine 
to create three important points:

* The performance of logging should be improved
* The amount of details (fine grain) and information in logs should improve
* The structure of the log content should be improved to aid interaction 
(possibly even machine parsable)

I will turn this into a design document, but there are some questions I would 
like some input to help answer as part of this process to help set the 
direction and tasks to achieve.

-- Should our logs as they exist today, continue to exist?

I think that my view on this is "no". I think if we make something better, we 
have little need to continue to support our legacy interfaces. Of course, this would be a 
large change and it may not sit comfortably with people.

A large part of this thinking is that the "new" log interface I want to add is 
focused on *operations* rather than auditing accesses or changes, or over looking at 
errors. The 

[389-devel] Re: Logging future direction and ideas.

2019-05-12 Thread William Brown


> On 10 May 2019, at 23:59, Rich Megginson  wrote:
> 
> On 5/9/19 9:13 PM, William Brown wrote:
>> Hi all,
>> 
>> So I think it's time for me to write some logging code to improve the 
>> situation. Relevant links before we start:
>> 
>> https://pagure.io/389-ds-base/issue/49415
>> http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
>> https://pagure.io/389-ds-base/issue/50350
>> https://pagure.io/389-ds-base/issue/50361
>> 
>> 
>> All of these links touch on issues around logging, and I think they all 
>> combine to create three important points:
>> 
>> * The performance of logging should be improved
>> * The amount of details (fine grain) and information in logs should improve
>> * The structure of the log content should be improved to aid interaction 
>> (possibly even machine parsable)
>> 
>> I will turn this into a design document, but there are some questions I 
>> would like some input to help answer as part of this process to help set the 
>> direction and tasks to achieve.
>> 
>> -- Should our logs as they exist today, continue to exist?
>> 
>> I think that my view on this is "no". I think if we make something better, 
>> we have little need to continue to support our legacy interfaces. Of course, 
>> this would be a large change and it may not sit comfortably with people.
>> 
>> A large part of this thinking is that the "new" log interface I want to add 
>> is focused on *operations* rather than auditing accesses or changes, or over 
>> looking at errors. The information of both the current access/audit/error 
>> would largely be melded into a single operation log, and then with tools 
>> like logconv, we
>> could parse and extract information that would behave the same way as 
>> access/error/audit.
>> 
>> At the same time - I can see how people *may* want a "realtime" audit of 
>> operations as they occur (IE access log), but this still today is already 
>> limited by having to "wait" for operations to complete.
>> 
>> In a crash scenario, we would be able to still view the logs that are 
>> queued, so I think there are not so many concerns about losing information 
>> in these cases (in fact we'd probably have more).
>> 
>> -- What should the operation log look like?
>> 
>> I think it should be structured, and should be whole-units of information, 
>> related to a single operation. IE only at the conclusion of the operation is 
>> it logged (thus the async!). It should support arbitrary, nested timers, and 
>> would *not* support log levels - it's a detailed log of the process each 
>> query goes through.
>> 
>> An example could be something like:
>> 
>> [timestamp] - [conn=id op=id] - start operation
>> [timestamp] - [conn=id op=id] - start time = time ...
>> [timestamp] - [conn=id op=id] - started internal search '(some=filter)'
>> [timestamp] - [conn=id op=id parentop=id] - start nested operation
>> [timestamp] - [conn=id op=id parentop=id] - start time = time ...
>> ...
>> [timestamp] - [conn=id op=id parentop=id] - end time = time...
>> [timestamp] - [conn=id op=id parentop=id] - duration = diff end - start
>> [timestamp] - [conn=id op=id parentop=id] - end nested operation - result -> 
>> ...
>> [timestamp] - [conn=id op=id] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=id op=id] - end time = time
>> [timestamp] - [conn=id op=id] - duration = diff end - start
>> 
>> 
>> Due to the structured - blocked nature, there would be no interleaving of 
>> operation messages. therefor the log would appear as:
>> 
>> [timestamp] - [conn=00 op=00] - start operation
>> [timestamp] - [conn=00 op=00] - start time = time ...
>> [timestamp] - [conn=00 op=00] - started internal search '(some=filter)'
>> [timestamp] - [conn=00 op=00 parentop=01] - start nested operation
>> [timestamp] - [conn=00 op=00 parentop=01] - start time = time ...
>> ...
>> [timestamp] - [conn=00 op=00 parentop=01] - end time = time...
>> [timestamp] - [conn=00 op=00 parentop=01] - duration = diff end - start
>> [timestamp] - [conn=00 op=00 parentop=01] - end nested operation - result -> 
>> ...
>> [timestamp] - [conn=00 op=00] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=00 op=00] - end time = time
>> [timestamp] - [conn=00 op=00] - duration = diff end - start
>> [timestamp] - [conn=22 op=00] - start operation
>> [timestamp] - [conn=22 op=00] - start time = time ...
>> [timestamp] - [conn=22 op=00] - started internal search '(some=filter)'
>> [timestamp] - [conn=22 op=00 parentop=01] - start nested operation
>> [timestamp] - [conn=22 op=00 parentop=01] - start time = time ...
>> ...
>> [timestamp] - [conn=22 op=00 parentop=01] - end time = time...
>> [timestamp] - [conn=22 op=00 parentop=01] - duration = diff end - start
>> [timestamp] - [conn=22 op=00 parentop=01] - end nested operation - result -> 
>> ...
>> [timestamp] - [conn=22 op=00] - ended internal search '(some=filter)'
>> ...
>> [timestamp] - [conn=22 op=00] - end time = time
>> [timestamp] - 

[389-devel] Re: Logging future direction and ideas.

2019-05-10 Thread Rich Megginson

On 5/9/19 9:13 PM, William Brown wrote:

Hi all,

So I think it's time for me to write some logging code to improve the 
situation. Relevant links before we start:

https://pagure.io/389-ds-base/issue/49415
http://www.port389.org/docs/389ds/design/logging-performance-improvement.html
https://pagure.io/389-ds-base/issue/50350
https://pagure.io/389-ds-base/issue/50361


All of these links touch on issues around logging, and I think they all combine 
to create three important points:

* The performance of logging should be improved
* The amount of details (fine grain) and information in logs should improve
* The structure of the log content should be improved to aid interaction 
(possibly even machine parsable)

I will turn this into a design document, but there are some questions I would 
like some input to help answer as part of this process to help set the 
direction and tasks to achieve.

-- Should our logs as they exist today, continue to exist?

I think that my view on this is "no". I think if we make something better, we 
have little need to continue to support our legacy interfaces. Of course, this would be a 
large change and it may not sit comfortably with people.

A large part of this thinking is that the "new" log interface I want to add is 
focused on *operations* rather than auditing accesses or changes, or over looking at 
errors. The information of both the current access/audit/error would largely be melded 
into a single operation log, and then with tools like logconv, we
could parse and extract information that would behave the same way as 
access/error/audit.

At the same time - I can see how people *may* want a "realtime" audit of operations as 
they occur (IE access log), but this still today is already limited by having to "wait" 
for operations to complete.

In a crash scenario, we would be able to still view the logs that are queued, 
so I think there are not so many concerns about losing information in these 
cases (in fact we'd probably have more).

-- What should the operation log look like?

I think it should be structured, and should be whole-units of information, 
related to a single operation. IE only at the conclusion of the operation is it 
logged (thus the async!). It should support arbitrary, nested timers, and would 
*not* support log levels - it's a detailed log of the process each query goes 
through.

An example could be something like:

[timestamp] - [conn=id op=id] - start operation
[timestamp] - [conn=id op=id] - start time = time ...
[timestamp] - [conn=id op=id] - started internal search '(some=filter)'
[timestamp] - [conn=id op=id parentop=id] - start nested operation
[timestamp] - [conn=id op=id parentop=id] - start time = time ...
...
[timestamp] - [conn=id op=id parentop=id] - end time = time...
[timestamp] - [conn=id op=id parentop=id] - duration = diff end - start
[timestamp] - [conn=id op=id parentop=id] - end nested operation - result -> ...
[timestamp] - [conn=id op=id] - ended internal search '(some=filter)'
...
[timestamp] - [conn=id op=id] - end time = time
[timestamp] - [conn=id op=id] - duration = diff end - start


Due to the structured - blocked nature, there would be no interleaving of 
operation messages. therefor the log would appear as:

[timestamp] - [conn=00 op=00] - start operation
[timestamp] - [conn=00 op=00] - start time = time ...
[timestamp] - [conn=00 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=00 op=00 parentop=01] - start nested operation
[timestamp] - [conn=00 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=00 op=00 parentop=01] - end time = time...
[timestamp] - [conn=00 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=00 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=00 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=00 op=00] - end time = time
[timestamp] - [conn=00 op=00] - duration = diff end - start
[timestamp] - [conn=22 op=00] - start operation
[timestamp] - [conn=22 op=00] - start time = time ...
[timestamp] - [conn=22 op=00] - started internal search '(some=filter)'
[timestamp] - [conn=22 op=00 parentop=01] - start nested operation
[timestamp] - [conn=22 op=00 parentop=01] - start time = time ...
...
[timestamp] - [conn=22 op=00 parentop=01] - end time = time...
[timestamp] - [conn=22 op=00 parentop=01] - duration = diff end - start
[timestamp] - [conn=22 op=00 parentop=01] - end nested operation - result -> ...
[timestamp] - [conn=22 op=00] - ended internal search '(some=filter)'
...
[timestamp] - [conn=22 op=00] - end time = time
[timestamp] - [conn=22 op=00] - duration = diff end - start

An alternate method for structuring could be a machine readable format like 
json:

{
 'timestamp': 'time',
 'duration': ,
 'bind': 'dn of who initiated operation',
 'events': [
 'debug': 'msg',
 'internal_search': {
  'timestamp': 'time',
  'duration': ,