[389-devel] please review: Ticket 49926 - Add replication functionality to UI

2018-10-12 Thread Mark Reynolds

https://pagure.io/389-ds-base/pull-request/49976
___
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org


[389-devel] Re: Profiling discussion

2018-10-12 Thread William Brown
On Fri, 2018-10-12 at 21:39 -0400, Mark Reynolds wrote:
> On 10/10/18 6:57 PM, William Brown wrote:
> > On Wed, 2018-10-10 at 16:26 +0200, thierry bordaz wrote:
> > > Hi William,
> > > 
> > > Thanks for starting this discussion.
> > > Your email raises several aspects (how, for whom, ...) and I think
> > > a way to start would be to write down what we want.
> > > One need is, from a given workload, to determine where we are
> > > spending time, as a way to determine where to invest.
> > > Another need is to collect metrics at the operation level.
> > 
> > Aren't these very similar? The time we invest is generally on
> > improving a plugin or a small part of an operation, to make the
> > operation as a whole faster.
> > 
> > So if we can report on an individual operation, we can write a tool
> > similar to logconv.pl, but for performance metrics, that displays
> > trends of operations that are not performing well; then we can find
> > examples of operations and why.
> > 
> > > From the "how" perspective, we can rely on external tools
> > > (stap + scripts) or internal tools (like the plugin you described
> > > + scripts). Of course we can also do some enhancements inside DS
> > > (like adding probes) to help external tools. I have no strong
> > > opinion on whether one approach is better than the other, but I
> > > think it also depends on what you want to perform.
> > 
> > I think it would be great if the tools we use internally on the team
> > were accessible to admins of DS outside it. That way, when we get
> > reports of performance concerns, we have a standardised way of
> > looking at them. It means our workflow is the same for internal
> > development and profiling as for external reports, and it will force
> > us to have all the information we need in that one place.
> > 
> > I think internal event timings are probably what we want first, as a
> > coarse first metric. After that we can continue to extend from
> > there?
> > 
> > As for the how, perhaps we can put something on the Operation struct
> > for appending and logging events, and turn those into metrics?
> > 
> > As mentioned, you could use stap too with defined points for
> > tracing, but that limits us to Linux only?
> 
> Whatever tools we use doesn't really concern me - as long as we get
> good data.  Somewhere we have old reports from stap pointing out lock
> contention problem areas, but we should really rerun all of those
> tests with the current code base.

I think those tests did not properly check atomic usage nor different
lock types ...

> As for improving performance, I think we should first address the
> major issues found by the existing tools (stap and friends) -
> specifically the lock contention problems (config, connections, attr
> syntax checking, etc.).  Once these are addressed, then we can start
> adding probes/internal structs to fine-tune other aspects of the
> server.

I think the information we have from current tools isn't complete, and
it doesn't help us when people give us reports of the server being
slow. We really need to invest in observability into performance, so
that long term we get better views into what exactly the issues are.
That's why I think we should look at this tooling/logging first, so
that our time is well spent when we make fixes.

> Improving performance will be the primary focus for 389-ds-base-1.4.1,
> and we should be able to invest a good amount of time into this
> effort.  Getting nunc-stans stable falls into this category as well
> (it should actually be addressed first).

There is a patch awaiting review for this topic ... :)

> Mark
> 
> > > best regards
> > > thierry
> > > 
> > > On 10/08/2018 12:37 PM, William Brown wrote:
> > > > Hi there,
> > > > 
> > > > In a ticket Thierry and I mentioned that we should have a quick
> > > > discussion about ideas for profiling: what we want it to look
> > > > like and what we need. I think it’s important we improve our
> > > > observation into the server so that we can target improvements
> > > > correctly.
> > > > 
> > > > I think we should know:
> > > > 
> > > > * Who is the target audience to run our profiling tools?
> > > > * What kind of information do we want?
> > > > * Potential solutions for the above.
> > > > 
> > > > With those in mind, I think Thierry suggested STAP scripts.
> > > > 
> > > > * Target audience - developers (us) and some “highly
> > > > experienced” admins (STAP is not the easiest thing to run).
> > > > * Information - STAP would largely tell us timing and possibly
> > > > allow some variable/struct extraction. STAP does let us look at
> > > > connection info a bit more easily too.
> > > > 
> > > > I would suggest an “event” struct and logging service.
> > > > 
> > > > At the start of an operation we create an event struct. As we
> > > > enter/exit a plugin we can append timing information, and the
> > > > plugin

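The per-operation event list William describes could be modelled roughly as follows. This is an illustrative sketch only, in Python rather than the server's C: the class name `OpEvent` and the stage tags are invented for the example and are not actual 389-ds-base API.

```python
import json
import time


class OpEvent:
    """Hypothetical per-operation event record: collects timed stages
    (e.g. plugin enter/exit) and serializes to a JSON blob that could be
    appended to the access log at the end of the operation."""

    def __init__(self, conn, op):
        self.conn = conn
        self.op = op
        self.start = time.monotonic()
        self.stages = []

    def stage(self, tag):
        # Record how far into the operation (in ms) this stage was reached.
        elapsed_ms = (time.monotonic() - self.start) * 1000.0
        self.stages.append({"tag": tag, "t_ms": round(elapsed_ms, 3)})

    def to_json(self):
        # One JSON blob per operation, keyed by conn/op as in the access log.
        return json.dumps(
            {"conn": self.conn, "op": self.op, "stages": self.stages}
        )


# Example: one operation passing through a (hypothetical) plugin boundary.
ev = OpEvent(conn=1, op=3)
ev.stage("betxn_preop_enter")
ev.stage("betxn_preop_exit")
print(ev.to_json())
```

Because the record is plain JSON, the same blob works for both audiences discussed here: developers can post-process it in bulk, and an admin can read a single entry directly.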
[389-devel] Re: Profiling discussion

2018-10-12 Thread Mark Reynolds


On 10/12/18 9:52 PM, William Brown wrote:
> On Fri, 2018-10-12 at 21:39 -0400, Mark Reynolds wrote:
> > On 10/10/18 6:57 PM, William Brown wrote:
> > > On Wed, 2018-10-10 at 16:26 +0200, thierry bordaz wrote:
> > > > Hi William,
> > > > 
> > > > Thanks for starting this discussion.
> > > > Your email raises several aspects (how, for whom, ...) and I
> > > > think a way to start would be to write down what we want.
> > > > One need is, from a given workload, to determine where we are
> > > > spending time, as a way to determine where to invest.
> > > > Another need is to collect metrics at the operation level.
> > > 
> > > Aren't these very similar? The time we invest is generally on
> > > improving a plugin or a small part of an operation, to make the
> > > operation as a whole faster.
> > > 
> > > So if we can report on an individual operation, we can write a
> > > tool similar to logconv.pl, but for performance metrics, that
> > > displays trends of operations that are not performing well; then
> > > we can find examples of operations and why.
> > > 
> > > > From the "how" perspective, we can rely on external tools
> > > > (stap + scripts) or internal tools (like the plugin you
> > > > described + scripts). Of course we can also do some enhancements
> > > > inside DS (like adding probes) to help external tools. I have no
> > > > strong opinion on whether one approach is better than the other,
> > > > but I think it also depends on what you want to perform.
> > > 
> > > I think it would be great if the tools we use internally on the
> > > team were accessible to admins of DS outside it. That way, when we
> > > get reports of performance concerns, we have a standardised way of
> > > looking at them. It means our workflow is the same for internal
> > > development and profiling as for external reports, and it will
> > > force us to have all the information we need in that one place.
> > > 
> > > I think internal event timings are probably what we want first, as
> > > a coarse first metric. After that we can continue to extend from
> > > there?
> > > 
> > > As for the how, perhaps we can put something on the Operation
> > > struct for appending and logging events, and turn those into
> > > metrics?
> > > 
> > > As mentioned, you could use stap too with defined points for
> > > tracing, but that limits us to Linux only?
> > 
> > Whatever tools we use doesn't really concern me - as long as we get
> > good data.  Somewhere we have old reports from stap pointing out
> > lock contention problem areas, but we should really rerun all of
> > those tests with the current code base.
> 
> I think those tests did not properly check atomic usage nor different
> lock types ...

Not sure, haven't looked at stap or the older reports in some time.

> > As for improving performance, I think we should first address the
> > major issues found by the existing tools (stap and friends) -
> > specifically the lock contention problems (config, connections,
> > attr syntax checking, etc.).  Once these are addressed, then we can
> > start adding probes/internal structs to fine-tune other aspects of
> > the server.
> 
> I think the information we have from current tools isn't complete,
> and it doesn't help us when people give us reports of the server
> being slow. We really need to invest in observability into
> performance, so that long term we get better views into what exactly
> the issues are. That's why I think we should look at this
> tooling/logging first.

Yes, perhaps, but the issues reported by these tools are still valid -
although they might not be the main culprits, as you are suggesting.

> > Improving performance will be the primary focus for
> > 389-ds-base-1.4.1, and we should be able to invest a good amount of
> > time into this effort.  Getting nunc-stans stable falls into this
> > category as well (it should actually be addressed first).
> 
> There is a patch awaiting review for this topic ... :)

I know, I have asked for this patch to be tested internally, but that
hasn't happened yet.  I will follow up on that next week!

> > Mark
> > 
> > > > best regards
> > > > thierry
> > > > 
> > > > On 10/08/2018 12:37 PM, William Brown wrote:
> > > > > Hi there,
> > > > > 
> > > > > In a ticket Thierry and I mentioned that we should have a
> > > > > quick discussion about ideas for profiling: what we want it to
> > > > > look like and what we need. I think it’s important we improve
> > > > > our observation into the server so that we can target
> > > > > improvements correctly.
> > > > > 
> > > > > I think we should know:
> > > > > 
> > > > > * Who is the target audience to run our profiling tools?
> > > > > * What kind of information do we want?
> > > > > * Potential solutions for the above.
> > > > > 
> > > > > With those in mind, I think Thierry suggested STAP scripts.
> > > > > 
> > > > > * Target audience - developers (us) and some “highly
> > > > > experienced” admins (STAP is not the easiest thing to run).
> > > > > * Information - STAP would largely tell us timing and possibly
> > > > > allow some variable/struct extraction. STAP does let us look
> > > > > at connection info a bit more easily too.
> > > > > 
> > > > > I would suggest an “event” struct and logging service.
> > > > > 
> > > > > At the start of an operation we create an event struct. As we
> > > > > enter/exit a plugin we can append timing information, and the
> > > > > plugin itself can add details (for example, backend could add
> > > > > idl performance metrics or other). At the end of the
> > > > > operation, we log the event struct as a json blob to our
> > > > > access log, associated to the conn/op.
> > > > > 
> > > > > * Target - anyone, it’s a log level.

[389-devel] Re: Profiling discussion

2018-10-12 Thread Mark Reynolds


On 10/10/18 6:57 PM, William Brown wrote:
> On Wed, 2018-10-10 at 16:26 +0200, thierry bordaz wrote:
> > Hi William,
> > 
> > Thanks for starting this discussion.
> > Your email raises several aspects (how, for whom, ...) and I think a
> > way to start would be to write down what we want.
> > One need is, from a given workload, to determine where we are
> > spending time, as a way to determine where to invest.
> > Another need is to collect metrics at the operation level.
> 
> Aren't these very similar? The time we invest is generally on
> improving a plugin or a small part of an operation, to make the
> operation as a whole faster.
> 
> So if we can report on an individual operation, we can write a tool
> similar to logconv.pl, but for performance metrics, that displays
> trends of operations that are not performing well; then we can find
> examples of operations and why.
> 
> > From the "how" perspective, we can rely on external tools
> > (stap + scripts) or internal tools (like the plugin you described +
> > scripts). Of course we can also do some enhancements inside DS (like
> > adding probes) to help external tools. I have no strong opinion on
> > whether one approach is better than the other, but I think it also
> > depends on what you want to perform.
> 
> I think it would be great if the tools we use internally on the team
> were accessible to admins of DS outside it. That way, when we get
> reports of performance concerns, we have a standardised way of looking
> at them. It means our workflow is the same for internal development
> and profiling as for external reports, and it will force us to have
> all the information we need in that one place.
> 
> I think internal event timings are probably what we want first, as a
> coarse first metric. After that we can continue to extend from there?
> 
> As for the how, perhaps we can put something on the Operation struct
> for appending and logging events, and turn those into metrics?
> 
> As mentioned, you could use stap too with defined points for tracing,
> but that limits us to Linux only?

Whatever tools we use doesn't really concern me - as long as we get good
data.  Somewhere we have old reports from stap pointing out lock
contention problem areas, but we should really rerun all of those tests
with the current code base.  As for improving performance, I think we
should first address the major issues found by the existing tools (stap
and friends) - specifically the lock contention problems (config,
connections, attr syntax checking, etc.).  Once these are addressed,
then we can start adding probes/internal structs to fine-tune other
aspects of the server.

Improving performance will be the primary focus for 389-ds-base-1.4.1,
and we should be able to invest a good amount of time into this effort.
Getting nunc-stans stable falls into this category as well (it should
actually be addressed first).

Mark

> > best regards
> > thierry
> > 
> > On 10/08/2018 12:37 PM, William Brown wrote:
> > > Hi there,
> > > 
> > > In a ticket Thierry and I mentioned that we should have a quick
> > > discussion about ideas for profiling: what we want it to look like
> > > and what we need. I think it’s important we improve our
> > > observation into the server so that we can target improvements
> > > correctly.
> > > 
> > > I think we should know:
> > > 
> > > * Who is the target audience to run our profiling tools?
> > > * What kind of information do we want?
> > > * Potential solutions for the above.
> > > 
> > > With those in mind, I think Thierry suggested STAP scripts.
> > > 
> > > * Target audience - developers (us) and some “highly experienced”
> > > admins (STAP is not the easiest thing to run).
> > > * Information - STAP would largely tell us timing and possibly
> > > allow some variable/struct extraction. STAP does let us look at
> > > connection info a bit more easily too.
> > > 
> > > I would suggest an “event” struct and logging service.
> > > 
> > > At the start of an operation we create an event struct. As we
> > > enter/exit a plugin we can append timing information, and the
> > > plugin itself can add details (for example, backend could add idl
> > > performance metrics or other). At the end of the operation, we log
> > > the event struct as a json blob to our access log, associated to
> > > the conn/op.
> > > 
> > > * Target - anyone, it’s a log level. Really easy to enable (think
> > > mailing list or user support; can easily send us diagnostic logs).
> > > * Information - we need a bit more work to structure the “event”
> > > struct internally for profiling, but we’d get timings and possibly
> > > internal variable data as well in the event.
> > > 
> > > I think these are two possible approaches. STAP is less invasive,
> > > easier to start now, but harder to extend later. Logging is more
> > > accessible to users/admins, easier to extend later, but more work
> > > to add now.
> > > 
> > > What do we think?
> > > 
> > > —
> > > Sincerely,
> > > 
> > > William
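The logconv.pl-style trend tool mentioned in the thread could then be a small post-processor over those per-operation JSON blobs. A sketch, assuming one JSON event record per access-log line; the record layout (`etime_ms`, `stages`, the `memberof` tag) is invented here for illustration and is not an actual 389-ds-base log format:

```python
import json
from collections import defaultdict

# Example access-log lines carrying per-operation JSON event blobs
# (format assumed for illustration only).
LOG_LINES = [
    '{"conn": 1, "op": 1, "etime_ms": 2.1, "stages": [{"tag": "memberof", "t_ms": 1.6}]}',
    '{"conn": 1, "op": 2, "etime_ms": 0.4, "stages": [{"tag": "memberof", "t_ms": 0.1}]}',
    '{"conn": 2, "op": 1, "etime_ms": 9.8, "stages": [{"tag": "memberof", "t_ms": 9.0}]}',
]


def slow_ops(lines, threshold_ms=1.0):
    """Group operations over the elapsed-time threshold by their dominant
    stage, so trends (e.g. one plugin dominating etime) stand out."""
    trends = defaultdict(list)
    for line in lines:
        rec = json.loads(line)
        if rec["etime_ms"] >= threshold_ms:
            # Attribute the slow op to whichever stage took longest.
            worst = max(rec["stages"], key=lambda s: s["t_ms"])
            trends[worst["tag"]].append((rec["conn"], rec["op"], rec["etime_ms"]))
    return dict(trends)


print(slow_ops(LOG_LINES))
# → {'memberof': [(1, 1, 2.1), (2, 1, 9.8)]}
```

Because the same report runs on any log a user sends in, this is one way the internal profiling workflow and external performance reports could share a single standardised view, as suggested above.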

