Re: Too many PMC implementations

2018-08-25 Thread Kamil Rytarowski
On 26.08.2018 02:40, Kamil Rytarowski wrote:
> On 25.08.2018 21:32, David Holland wrote:
>>  > 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned
>>  >    an idea of bytecode validation few years ago. From Lua 5.3 manual:
>>  > 
>>  >    Lua does not check the consistency of binary chunks. Maliciously
>>  >    crafted binary chunks can crash the interpreter.
>>
>> Are we talking about installing untrusted/unprivileged kernel trace
>> logic? Because that seems like a bad idea, or at least a very hard
>> thing to get right... and if not, it doesn't matter if there's a
>> validator.
>>
>> (Also, isn't EBPF not really validatable either, or am I mixing it
>> up with something else?)
>>
> 
> For the record, eBPF has at least two stages of validation:
>  - CFG analysis to find infinite loops that would deadlock,
>  - a simulator that tries to verify whether code paths are meaningful.
> 
> eBPF tries to prevent uninitialized memory read and read-only variable
> write.
> 
> Additionally eBPF can restrict pointer arithmetics.
> 
> But right, it's much easier to restrict loading any code into the kernel
> to a privileged user.
> 

We do not allow loading Lua bytecode by default. This means that the
verification is done by the compiler, and it is difficult to crash the
interpreter by accident.

$ sysctl kern.lua.bytecode
kern.lua.bytecode = 0





Re: Too many PMC implementations

2018-08-25 Thread Kamil Rytarowski
On 25.08.2018 21:32, David Holland wrote:
>  > 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned
>  >    an idea of bytecode validation few years ago. From Lua 5.3 manual:
>  > 
>  >    Lua does not check the consistency of binary chunks. Maliciously
>  >    crafted binary chunks can crash the interpreter.
> 
> Are we talking about installing untrusted/unprivileged kernel trace
> logic? Because that seems like a bad idea, or at least a very hard
> thing to get right... and if not, it doesn't matter if there's a
> validator.
> 
> (Also, isn't EBPF not really validatable either, or am I mixing it
> up with something else?)
> 

For the record, eBPF has at least two stages of validation:
 - CFG analysis to find infinite loops that would deadlock,
 - a simulator that tries to verify whether code paths are meaningful.

eBPF tries to prevent reads of uninitialized memory and writes to
read-only variables.

Additionally, eBPF can restrict pointer arithmetic.

But right, it's much easier to restrict loading any code into the kernel
to a privileged user.
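
For illustration, the first stage (CFG analysis) can be sketched roughly as
follows. This is a toy model in Python, not the real eBPF verifier: the
instruction encoding (tuples of an opcode name and an absolute jump target)
and the function names are assumptions made up for the example. It walks the
control-flow graph depth-first and reports a loop when it finds an edge back
to an instruction that is still on the current path.

```python
# Toy CFG loop check. The instruction format is hypothetical, not real eBPF:
# each instruction is (opcode, arg), where arg is an absolute jump target
# for "jmp" and "jcond" and is ignored otherwise.

def successors(prog, pc):
    """Return the program counters that can follow instruction pc."""
    op, arg = prog[pc]
    if op == "exit":
        return []
    if op == "jmp":                 # unconditional jump
        return [arg]
    if op == "jcond":               # conditional: taken target or fall through
        return [arg, pc + 1]
    return [pc + 1]                 # ordinary instruction falls through


def has_back_edge(prog):
    """Iterative DFS from pc 0; an edge to a node still on the path is a loop."""
    if not prog:
        raise ValueError("empty program")
    WHITE, GREY, BLACK = 0, 1, 2    # unvisited, on current path, finished
    state = [WHITE] * len(prog)
    state[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    while stack:
        pc, pending = stack[-1]
        nxt = next(pending, None)
        if nxt is None:             # all successors explored
            state[pc] = BLACK
            stack.pop()
            continue
        if not 0 <= nxt < len(prog):
            raise ValueError(f"control flow leaves the program at pc {pc}")
        if state[nxt] == GREY:      # back edge: the program can loop
            return True
        if state[nxt] == WHITE:
            state[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))
    return False
```

A program whose conditional jump targets an earlier instruction is rejected,
while one that only jumps forward passes. The second stage (simulating
register and memory state along each path) is considerably more involved and
is not attempted here.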





Re: Too many PMC implementations

2018-08-25 Thread Alexander Nasonov
David Holland wrote:
> On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote:
>  > 1. It's not standartised and it will very likely change in future versions
> 
> That doesn't really matter as long as you're only using one version at
> a time...

If bytecode is generated from a valid Lua program, it indeed takes
very little effort to update to a new version. But updating handcrafted
bytecode may take a bit of time.

>  > 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned
>  >    an idea of bytecode validation few years ago. From Lua 5.3 manual:
>  > 
>  >    Lua does not check the consistency of binary chunks. Maliciously
>  >    crafted binary chunks can crash the interpreter.
> 
> Are we talking about installing untrusted/unprivileged kernel trace
> logic? Because that seems like a bad idea, or at least a very hard
> thing to get right... and if not, it doesn't matter if there's a
> validator.

Lua bytecode is Turing-complete and not validatable, but I'm pretty sure
some subset of it (e.g. no loops, no strings, etc.) can be validated.
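
To make the idea of a validatable subset concrete, here is a minimal sketch in
Python. It does not use real Lua opcodes: the opcode names, the tuple encoding
and the validate() function are all hypothetical. It accepts a program only if
every opcode is on a whitelist, every jump goes forward and stays in range,
and the last instruction returns, which is enough to guarantee termination for
this toy instruction set.

```python
# Hypothetical restricted bytecode: (opcode, arg) tuples. Anything that could
# loop or allocate (backward jumps, string operations) is simply not allowed.
ALLOWED = {"load", "add", "cmp", "jmp_fwd", "ret"}


def validate(prog):
    """Accept only programs that use whitelisted opcodes and cannot loop."""
    if not prog or prog[-1][0] != "ret":
        return False                    # must end with a return
    for pc, (op, arg) in enumerate(prog):
        if op not in ALLOWED:
            return False                # unknown or forbidden opcode
        if op == "jmp_fwd":
            target = pc + 1 + arg       # arg is a relative offset
            if arg < 0 or target >= len(prog):
                return False            # backward or out-of-range jump
    return True
```

Because every jump strictly increases the program counter, a single linear
pass is sufficient and no graph traversal is needed.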

> (Also, isn't EBPF not really validatable either, or am I mixing it
> up with something else?)

Last I checked, the author(s) of eBPF claimed that it can be validated.

-- 
Alex


Re: Too many PMC implementations

2018-08-25 Thread David Holland
On Fri, Aug 10, 2018 at 11:40:20AM +0200, Maxime Villard wrote:
 > I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
 > subscribed to this list but I'm answering here because it's related to
 > tprof among other things.
 > 
 > I agree that it would be better to retire gprof in base, because there are
 > more powerful tools now, and also advanced hardware support (PMC, PEBS,
 > ProcessorTrace).

Speaking of old ratty tools, got a plan for replacing gcov? :-)

-- 
David A. Holland
dholl...@netbsd.org


Re: Too many PMC implementations

2018-08-25 Thread David Holland
On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote:
 > > There is already a Lua-powered solution for traces in Linux: ktap. It
 > > uses nice rules written natively in Lua.. however it seems to be
 > > abandoned in favor of eBPF.
 > 
 > I see two potential problems with using Lua bytecode:
 > 
 > 1. It's not standartised and it will very likely change in future versions

That doesn't really matter as long as you're only using one version at
a time...

 > 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned
 >    an idea of bytecode validation few years ago. From Lua 5.3 manual:
 > 
 >    Lua does not check the consistency of binary chunks. Maliciously
 >    crafted binary chunks can crash the interpreter.

Are we talking about installing untrusted/unprivileged kernel trace
logic? Because that seems like a bad idea, or at least a very hard
thing to get right... and if not, it doesn't matter if there's a
validator.

(Also, isn't EBPF not really validatable either, or am I mixing it
up with something else?)

-- 
David A. Holland
dholl...@netbsd.org


Re: Too many PMC implementations

2018-08-25 Thread Alexander Nasonov
Kamil Rytarowski wrote:
> There is already a Lua-powered solution for traces in Linux: ktap. It
> uses nice rules written natively in Lua.. however it seems to be
> abandoned in favor of eBPF.

I see two potential problems with using Lua bytecode:

1. It's not standardised and it will very likely change in future versions

2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned
   the idea of bytecode validation a few years ago. From the Lua 5.3 manual:

   Lua does not check the consistency of binary chunks. Maliciously
   crafted binary chunks can crash the interpreter.

Alex




Re: Too many PMC implementations

2018-08-24 Thread Kamil Rytarowski
On 25.08.2018 00:28, Rhialto wrote:
> On Thu 23 Aug 2018 at 18:48:32 +0200, Kamil Rytarowski wrote:
>> Probably DTrace is not the final word in BSD and not something I intend
>> to defend - but it's a good solution for now - (FreeBSD already
>> ports/develops a potential replacement eBPF).
> 
> I have played a bit with EBPF on Linux, and it feels weird to use a
> "packet filter" bytecode-based thing for performance monitoring.
> 

The "e" in eBPF makes a big difference. The new bytecode model is similar to
a regular general-purpose (application) CPU.

> Don't we already have a bytecode interpreter in the kernel in the form
> of Lua? I hardly know anything of Lua, but using that (being an existing
> tool) would make somewhat more sense than a glorified packet filter
> (which needs a big tool set in the form of clang to compile C to EBPF
> bytecode).
> 

There is already a Lua-powered solution for traces in Linux: ktap. It
uses nice rules written natively in Lua. However, it seems to have been
abandoned in favor of eBPF.

> -Olaf.
> 






Re: Too many PMC implementations

2018-08-24 Thread Rhialto
On Thu 23 Aug 2018 at 18:48:32 +0200, Kamil Rytarowski wrote:
> Probably DTrace is not the final word in BSD and not something I intend
> to defend - but it's a good solution for now - (FreeBSD already
> ports/develops a potential replacement eBPF).

I have played a bit with EBPF on Linux, and it feels weird to use a
"packet filter" bytecode-based thing for performance monitoring.

Don't we already have a bytecode interpreter in the kernel in the form
of Lua? I hardly know anything of Lua, but using that (being an existing
tool) would make somewhat more sense than a glorified packet filter
(which needs a big tool set in the form of clang to compile C to EBPF
bytecode).

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- Wayland: Those who don't understand X
\X/ rhialto/at/falu.nl  -- are condemned to reinvent it. Poorly.




Re: Too many PMC implementations

2018-08-24 Thread Christos Zoulas
On Aug 23, 11:57am, t...@panix.com (Thor Lancelot Simon) wrote:
-- Subject: Re: Too many PMC implementations

| On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote:
| > 
| > Observing that all the useful profiling is already done with DTrace, we
| > can remove complexity from the kernel with negligible cost.
| 
| I'm not sure what to make of this.  I'm trying to come up with a way to
| make the above statement true, and I'm having some difficulty.
| 
| You can't possibly mean "Observing that (unproven premise), therefore
| (conclusion)", so I'll discard that interpretation.
| 
| Do you perhaps mean "*If* we were to observe that all useful profiling
| were done with DTrace, *then* we could remove complexity from the
| kernel with negligible cost"?
| 
| Because Ragge and others have been pointing out that in that case,
| the premise "all useful profiling is done with DTrace" does not appear
| to be true.  Profiling kernel code on VAX may not be useful *to you*
| but that does not imply it is "not useful" simpliciter.
| 

Until we port dtrace to at least a good representative set of architectures
we should not remove the only means of profiling for the kernel.

christos


Re: Too many PMC implementations

2018-08-23 Thread Thor Lancelot Simon
On Thu, Aug 23, 2018 at 10:17:29AM -0700, Jason Thorpe wrote:
> 
> > On Aug 23, 2018, at 8:47 AM, Anders Magnusson  wrote:
> 
> > I have used it not long ago for vax.  Maybe I did have to do some
> > tweaks, do not remember, but I really want to be able to use kernel
> > profiling on vax.
> > 
> > So, I really oppose removing it and leaving vax without any kernel
> > profiling choice.
> 
> How hard would it be to add support for dtrace on Vax?

Without FBT, probably pretty easy.  But of course FBT is the only plausible
replacement for a statistical profiler that DTrace offers.

The basic requirement for FBT is a dynamic patcher (to really do it right),
though as an alternative a __predict_false branch testing a global could be
inserted at the head of every function.
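
The second approach can be illustrated with a small user-space analogy. This
is Python, not kernel code, and every name in it (PROBES_ENABLED, fbt_entry,
entry_counts, checksum) is made up for the example. Each instrumented function
starts with a cheap test of one global flag, and the probe body only runs when
the flag is set; in C the test would be wrapped in __predict_false so the
compiler lays out the probe as the unlikely path.

```python
import functools
from collections import Counter

PROBES_ENABLED = False        # the single global tested at every function entry
entry_counts = Counter()      # hypothetical sink recording probe firings


def set_probes(enabled):
    """Turn all entry probes on or off at once."""
    global PROBES_ENABLED
    PROBES_ENABLED = bool(enabled)


def fbt_entry(func):
    """Wrap func so that its entry tests the global flag before the real work."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if PROBES_ENABLED:    # rarely-taken branch (cf. __predict_false in C)
            entry_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@fbt_entry
def checksum(data):
    """Example function being traced: 8-bit sum of the input bytes."""
    return sum(data) & 0xFF
```

The cost is one load and one branch per call even when tracing is off, which
is the trade-off against a dynamic patcher that leaves untraced functions
untouched.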

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: Too many PMC implementations

2018-08-23 Thread Jason Thorpe


> On Aug 23, 2018, at 8:47 AM, Anders Magnusson  wrote:

> I have used it not long ago for vax.  Maybe I did have to do some tweaks,
> do not remember, but I really want to be able to use kernel profiling on
> vax.
> 
> So, I really oppose removing it and leaving vax without any kernel
> profiling choice.

How hard would it be to add support for dtrace on Vax?

-- thorpej



Re: Too many PMC implementations

2018-08-23 Thread Kamil Rytarowski
On 23.08.2018 18:35, Thor Lancelot Simon wrote:
> On Thu, Aug 23, 2018 at 06:25:56PM +0200, Kamil Rytarowski wrote:
>>
>> As useful I mean the number of commits to the src/ tree. If nothing
>> landed, probably nothing was useful. When were the most recent patches
>> from gprof or similar?
> 
> This is a plainly bogus criterion.  After we integrated DTrace, there
> were several periods of a year or more during which there were no
> "patches from DTrace or similar".  I know, and it was pretty frustrating
> to me too, since I paid a considerable amount of money to have it ported
> and maintained, and I had to justify that to my boss.
> 
> Should it have, then, been ripped out?  It's certainly a lot more complex
> than gprof.
> 

Unlike gprof (an idea from 1988), DTrace is actually easier to use on
modern hardware.

DTrace is probably not the final word in BSD and not something I intend
to defend, but it's a good solution for now (FreeBSD is already
porting/developing a potential replacement, eBPF).





Re: Too many PMC implementations

2018-08-23 Thread Thor Lancelot Simon
On Thu, Aug 23, 2018 at 06:25:56PM +0200, Kamil Rytarowski wrote:
> 
> As useful I mean the number of commits to the src/ tree. If nothing
> landed, probably nothing was useful. When were the most recent patches
> from gprof or similar?

This is a plainly bogus criterion.  After we integrated DTrace, there
were several periods of a year or more during which there were no
"patches from DTrace or similar".  I know, and it was pretty frustrating
to me too, since I paid a considerable amount of money to have it ported
and maintained, and I had to justify that to my boss.

Should it have, then, been ripped out?  It's certainly a lot more complex
than gprof.

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: Too many PMC implementations

2018-08-23 Thread Kamil Rytarowski
On 23.08.2018 17:57, Thor Lancelot Simon wrote:
> On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote:
>>
>> Observing that all the useful profiling is already done with DTrace, we
>> can remove complexity from the kernel with negligible cost.
> 
> I'm not sure what to make of this.  I'm trying to come up with a way to
> make the above statement true, and I'm having some difficulty.
> 
> You can't possibly mean "Observing that (unproven premise), therefore
> (conclusion)", so I'll discard that interpretation.
> 
> Do you perhaps mean "*If* we were to observe that all useful profiling
> were done with DTrace, *then* we could remove complexity from the
> kernel with negligible cost"?
> 
> Because Ragge and others have been pointing out that in that case,
> the premise "all useful profiling is done with DTrace" does not appear
> to be true.  Profiling kernel code on VAX may not be useful *to you*
> but that does not imply it is "not useful" simpliciter.
> 

By useful I mean as measured by the number of commits to the src/ tree. If
nothing landed, probably nothing was useful. When did the most recent
patches resulting from gprof or similar land?





Re: Too many PMC implementations

2018-08-23 Thread Thor Lancelot Simon
On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote:
> 
> Observing that all the useful profiling is already done with DTrace, we
> can remove complexity from the kernel with negligible cost.

I'm not sure what to make of this.  I'm trying to come up with a way to
make the above statement true, and I'm having some difficulty.

You can't possibly mean "Observing that (unproven premise), therefore
(conclusion)", so I'll discard that interpretation.

Do you perhaps mean "*If* we were to observe that all useful profiling
were done with DTrace, *then* we could remove complexity from the
kernel with negligible cost"?

Because Ragge and others have been pointing out that in that case,
the premise "all useful profiling is done with DTrace" does not appear
to be true.  Profiling kernel code on VAX may not be useful *to you*
but that does not imply it is "not useful" simpliciter.

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: Too many PMC implementations

2018-08-23 Thread Anders Magnusson

Den 2018-08-23 kl. 17:09, skrev Kamil Rytarowski:
> On 23.08.2018 16:59, Anders Magnusson wrote:
>> Den 2018-08-23 kl. 16:48, skrev Kamil Rytarowski:
>>> On 23.08.2018 16:28, Anders Magnusson wrote:
>>>> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>>>>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>>>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>>>>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>>>>>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>>>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>>>>> whatever architecture. From then on, being able to profile the kernel on
>>>>>>> other architectures has very little interest.
>>>>>>
>>>>>> Speaking realistically, probably all the recent software-based kernel
>>>>>> profiling was done with DTrace.
>>>>>
>>>>> Yes. So I will proceed.
>>>>>
>>>>> Note that the removal of the kernel gprof implies the removal of kgmon.
>>>>
>>>> Just checking:  How will it work for ports like vax?
>>>> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
>>>> anything about DTrace, hence the question.
>>>>
>>>> -- Ragge
>>>
>>> There is no support of DTrace for vax and probably there won't be one.
>>> Also probably DTrace is not a final solution per se (DTrace is described
>>> as step backwards by people such as Brendan Gregg).. but we are working
>>> on better toolchain support to open more possibilities such as XRay.
>>>
>>> Regarding vax there might be bottlenecks in MD code, but DTrace is a
>>> decent one for MI code on supported ports.
>>
>> Hm, so this means that we will be without kernel profiling support at
>> all on non-DTrace architectures?
>> I'm not too happy about that by obvious reasons.
>>
>> It do not work to profile code paths on other architectures, since what
>> takes time is very different.
>> And yes, it is not the MD code that is the case, it's the MI code.
>>
>> I may have missed something, but why remove something that works without
>> replacing it with something new?
>> Only have profiling on a few ports do not sound very clever to me.
>>
>> -- Ragge
>
> Evaluating this situation we have to be aware that this description
> could be reversed and there are ports without meaningful (or any) gprof
> support.
>
> Observing that all the useful profiling is already done with DTrace, we
> can remove complexity from the kernel with negligible cost.

This is not true.  Things that you will never notice as a problem on x86
may kill a vax, since there is a large speed factor in between.  This was
true many years ago and is still true.

Bottom line:  I think it is a bad idea to be without kernel profiling
code on vax.


-- Ragge


Re: Too many PMC implementations

2018-08-23 Thread Anders Magnusson

Den 2018-08-23 kl. 17:03, skrev Maxime Villard:
> Le 23/08/2018 à 16:28, Anders Magnusson a écrit :
>> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>>>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>>> whatever architecture. From then on, being able to profile the kernel on
>>>>> other architectures has very little interest.
>>>>
>>>> Speaking realistically, probably all the recent software-based kernel
>>>> profiling was done with DTrace.
>>>
>>> Yes. So I will proceed.
>>>
>>> Note that the removal of the kernel gprof implies the removal of kgmon.
>>
>> Just checking:  How will it work for ports like vax?
>> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
>> anything about DTrace, hence the question.
>
> It looks like there will be no replacement. Are you sure this is really
> kgmon? Because as far as I can tell, in many architectures GPROF is just
> dead code that either doesn't compile or doesn't have effect (missing
> opt_gprof.h, but I did add it in February of this year in the MI parts,
> so it was likely even more broken before).

I used it not long ago on vax.  I may have had to make some tweaks, I do
not remember, but I really want to be able to use kernel profiling on vax.

So I really oppose removing it and leaving vax without any kernel
profiling option.


-- Ragge



Re: Too many PMC implementations

2018-08-23 Thread Kamil Rytarowski
On 23.08.2018 16:59, Anders Magnusson wrote:
> Den 2018-08-23 kl. 16:48, skrev Kamil Rytarowski:
>> On 23.08.2018 16:28, Anders Magnusson wrote:
>>> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>>>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>>>> Note that I'm talking about the kernel gprof, and not the userland
>>>>>> gprof.
>>>>>> In terms of kernel profiling, it's not nonsensical to say that
>>>>>> since we
>>>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>>>> whatever architecture. From then on, being able to profile the
>>>>>> kernel on
>>>>>> other architectures has very little interest.
>>>>> Speaking realistically, probably all the recent software-based kernel
>>>>> profiling was done with DTrace.
>>>> Yes. So I will proceed.
>>>>
>>>> Note that the removal of the kernel gprof implies the removal of kgmon.
>>> Just checking:  How will it work for ports like vax?
>>> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
>>> anything about DTrace, hence the question.
>>>
>>> -- Ragge
>> There is no support of DTrace for vax and probably there won't be one.
>> Also probably DTrace is not a final solution per se (DTrace is described
>> as step backwards by people such as Brendan Gregg).. but we are working
>> on better toolchain support to open more possibilities such as XRay.
>>
>> Regarding vax there might be bottlenecks in MD code, but DTrace is a
>> decent one for MI code on supported ports.
> Hm, so this means that we will be without kernel profiling support at
> all on non-DTrace architectures?
> I'm not too happy about that by obvious reasons.
> 
> It do not work to profile code paths on other architectures, since what
> takes time is very different.
> And yes, it is not the MD code that is the case, it's the MI code.
> 
> I may have missed something, but why remove something that works without
> replacing it with something new?
> Only have profiling on a few ports do not sound very clever to me.
> 
> -- Ragge
> 
> 

Evaluating this situation we have to be aware that this description
could be reversed and there are ports without meaningful (or any) gprof
support.

Observing that all the useful profiling is already done with DTrace, we
can remove complexity from the kernel with negligible cost.





Re: Too many PMC implementations

2018-08-23 Thread Maxime Villard

Le 23/08/2018 à 16:28, Anders Magnusson a écrit :
> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>> whatever architecture. From then on, being able to profile the kernel on
>>>> other architectures has very little interest.
>>>
>>> Speaking realistically, probably all the recent software-based kernel
>>> profiling was done with DTrace.
>>
>> Yes. So I will proceed.
>>
>> Note that the removal of the kernel gprof implies the removal of kgmon.
>
> Just checking:  How will it work for ports like vax?
> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
> anything about DTrace, hence the question.


It looks like there will be no replacement. Are you sure this is really
kgmon? Because as far as I can tell, in many architectures GPROF is just
dead code that either doesn't compile or has no effect (missing
opt_gprof.h, but I did add it in February of this year in the MI parts,
so it was likely even more broken before).


Re: Too many PMC implementations

2018-08-23 Thread Anders Magnusson

Den 2018-08-23 kl. 16:48, skrev Kamil Rytarowski:
> On 23.08.2018 16:28, Anders Magnusson wrote:
>> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>>>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>>> whatever architecture. From then on, being able to profile the kernel on
>>>>> other architectures has very little interest.
>>>>
>>>> Speaking realistically, probably all the recent software-based kernel
>>>> profiling was done with DTrace.
>>>
>>> Yes. So I will proceed.
>>>
>>> Note that the removal of the kernel gprof implies the removal of kgmon.
>>
>> Just checking:  How will it work for ports like vax?
>> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
>> anything about DTrace, hence the question.
>>
>> -- Ragge
>
> There is no support of DTrace for vax and probably there won't be one.
> Also probably DTrace is not a final solution per se (DTrace is described
> as step backwards by people such as Brendan Gregg).. but we are working
> on better toolchain support to open more possibilities such as XRay.
>
> Regarding vax there might be bottlenecks in MD code, but DTrace is a
> decent one for MI code on supported ports.

Hm, so this means that we will be without any kernel profiling support
on non-DTrace architectures?
I'm not too happy about that, for obvious reasons.

It does not work to profile code paths on other architectures instead,
since what takes time is very different.
And yes, it is not the MD code that is the issue, it's the MI code.

I may have missed something, but why remove something that works without
replacing it with something new?
Having profiling on only a few ports does not sound very clever to me.

-- Ragge




Re: Too many PMC implementations

2018-08-23 Thread Kamil Rytarowski
On 23.08.2018 16:28, Anders Magnusson wrote:
> Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
>> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>>> On 17.08.2018 17:13, Maxime Villard wrote:
>>>> Note that I'm talking about the kernel gprof, and not the userland
>>>> gprof.
>>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>>> whatever architecture. From then on, being able to profile the
>>>> kernel on
>>>> other architectures has very little interest.
>>>
>>> Speaking realistically, probably all the recent software-based kernel
>>> profiling was done with DTrace.
>>
>> Yes. So I will proceed.
>>
>> Note that the removal of the kernel gprof implies the removal of kgmon.
> Just checking:  How will it work for ports like vax?
> When searching for bottlenecks I normally use gprof/kgmon.  I don't know
> anything about DTrace, hence the question.
> 
> -- Ragge

There is no DTrace support for vax and there probably won't be.
DTrace is also probably not a final solution per se (people such as
Brendan Gregg describe it as a step backwards), but we are working
on better toolchain support to open up more possibilities, such as XRay.

Regarding vax, there might be bottlenecks in MD code, but DTrace is a
decent solution for MI code on supported ports.





Re: Too many PMC implementations

2018-08-23 Thread Anders Magnusson

Den 2018-08-23 kl. 15:53, skrev Maxime Villard:
> Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
>> On 17.08.2018 17:13, Maxime Villard wrote:
>>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>>> In terms of kernel profiling, it's not nonsensical to say that since we
>>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>>> whatever architecture. From then on, being able to profile the kernel on
>>> other architectures has very little interest.
>>
>> Speaking realistically, probably all the recent software-based kernel
>> profiling was done with DTrace.
>
> Yes. So I will proceed.
>
> Note that the removal of the kernel gprof implies the removal of kgmon.

Just checking:  How will it work for ports like vax?
When searching for bottlenecks I normally use gprof/kgmon.  I don't know 
anything about DTrace, hence the question.


-- Ragge


Re: Too many PMC implementations

2018-08-23 Thread Maxime Villard

Le 17/08/2018 à 17:42, Kamil Rytarowski a écrit :
> On 17.08.2018 17:13, Maxime Villard wrote:
>> Note that I'm talking about the kernel gprof, and not the userland gprof.
>> In terms of kernel profiling, it's not nonsensical to say that since we
>> support ARM and x86 in tprof, we can cover 99% of the MI parts of
>> whatever architecture. From then on, being able to profile the kernel on
>> other architectures has very little interest.
>
> Speaking realistically, probably all the recent software-based kernel
> profiling was done with DTrace.


Yes. So I will proceed.

Note that the removal of the kernel gprof implies the removal of kgmon.


Re: Too many PMC implementations

2018-08-17 Thread Jason Thorpe



> On Aug 17, 2018, at 8:42 AM, Kamil Rytarowski  wrote:
> 
> Speaking realistically, probably all the recent software-based kernel
> profiling was done with DTrace.

Yah, I suppose I'm okay with killing off kernel GPROF support ... you can
essentially do the same-thing-but-better with an on-CPU flame graph generated
from DTrace data.  If the lower-tier platforms don't support this properly,
the energy should go towards fixing that.
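
As a rough illustration of what goes into such a flame graph: sampled stacks
are first collapsed into "folded" lines (frames joined by semicolons, plus a
count), and the graph is drawn from those counts. The sketch below is Python
with invented function and frame names; it assumes the stack samples have
already been extracted from the DTrace output into lists of frames, outermost
first, which is a simplification.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples (outermost frame first) into 'a;b;c' -> count."""
    folded = Counter()
    for frames in samples:
        folded[";".join(frames)] += 1
    return folded


def on_cpu_share(folded, function):
    """Fraction of all samples in which function appears anywhere on the stack."""
    total = sum(folded.values())
    hits = sum(count for stack, count in folded.items()
               if function in stack.split(";"))
    return hits / total if total else 0.0


# Hypothetical samples; real ones would come from a profiling run.
samples = [
    ["syscall", "sys_read", "vn_read"],
    ["syscall", "sys_read", "vn_read"],
    ["syscall", "sys_write"],
    ["idle"],
]
folded = fold_stacks(samples)
```

The width of a frame in the resulting graph corresponds to the share computed
by on_cpu_share(), which is roughly the information gprof's flat and call
graph profiles try to convey.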

-- thorpej



Re: Too many PMC implementations

2018-08-17 Thread Kamil Rytarowski
On 17.08.2018 17:13, Maxime Villard wrote:
> Note that I'm talking about the kernel gprof, and not the userland gprof.
> In terms of kernel profiling, it's not nonsensical to say that since we
> support ARM and x86 in tprof, we can cover 99% of the MI parts of
> whatever architecture. From then on, being able to profile the kernel on
> other architectures has very little interest.
> 

Speaking realistically, probably all the recent software-based kernel
profiling was done with DTrace.





Re: Too many PMC implementations

2018-08-17 Thread Maxime Villard

Le 17/08/2018 à 16:43, Joerg Sonnenberger a écrit :
> On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote:
>> So no one has any opinion on that? Because in this case I will remove it
>> soon. (Talking about the kernel gprof.)
>
> I'm quite reluctant to remove the only sample based profiler we have
> right now. Esp. since we don't have any infrastructure for counter-based
> profilers either AFAICT.

We do with tprof now, no?

Le 17/08/2018 à 16:50, Mouse a écrit :
>> I agree that it would be better to retire gprof in base, because
>> there are more powerful tools now, and also advanced hardware
>> support (PMC, PEBS, ProcessorTrace).
>
> ...for ports that _have_ "advanced hardware support", maybe.  (And what
> are the "more powerful tools"?  I haven't been following the state of
> the art in open-source profiling tools.)


Yes, basically I was talking about x86. I do know that many architectures
support PMCs, but I don't know how precise the events are, etc. The tools
were mentioned before, like the Linux "perf", which is pretty good.

Note that I'm talking about the kernel gprof, and not the userland gprof.
In terms of kernel profiling, it's not nonsensical to say that since we
support ARM and x86 in tprof, we can cover 99% of the MI parts of
whatever architecture. From then on, being able to profile the kernel on
other architectures has very little interest.

The gprof code is rather shitty and old, I dropped it from the x86 kernels
so it's not like I care a lot now, but since I saw the thread I thought I
would bring this up.


Re: Too many PMC implementations

2018-08-17 Thread Mouse
>> I agree that it would be better to retire gprof in base, because
>> there are more powerful tools now, and also advanced hardware
>> support (PMC, PEBS, ProcessorTrace).

...for ports that _have_ "advanced hardware support", maybe.  (And what
are the "more powerful tools"?  I haven't been following the state of
the art in open-source profiling tools.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Too many PMC implementations

2018-08-17 Thread Joerg Sonnenberger
On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote:
> On 10/08/2018 at 11:40, Maxime Villard wrote:
> > I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
> > subscribed to this list but I'm answering here because it's related to
> > tprof among other things.
> > 
> > I agree that it would be better to retire gprof in base, because there are
> > more powerful tools now, and also advanced hardware support (PMC, PEBS,
> > ProcessorTrace).
> > 
> > But in particular, it would be nice to retire the "kernel gprof". That is,
> > the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of
> > profiling is weak, and misses many aspects of execution (branch prediction,
> > cache misses, heavy instructions, etc) that are offered by tprof.
> > 
> > I already dropped NENTRY() from x86, so KGPROF is officially not supported
> > there anymore. I think it has never worked on amd64.
> 
> So no one has any opinion on that? Because in this case I will remove it
> soon. (Talking about the kernel gprof.)

I'm quite reluctant to remove the only sample based profiler we have
right now. Esp. since we don't have any infrastructure for counter-based
profilers either AFAICT.

Joerg


Re: Too many PMC implementations

2018-08-10 Thread Maxime Villard

I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
subscribed to this list but I'm answering here because it's related to
tprof among other things.

I agree that it would be better to retire gprof in base, because there are
more powerful tools now, and also advanced hardware support (PMC, PEBS,
ProcessorTrace).

But in particular, it would be nice to retire the "kernel gprof". That is,
the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of
profiling is weak, and misses many aspects of execution (branch prediction,
cache misses, heavy instructions, etc) that are offered by tprof.

I already dropped NENTRY() from x86, so KGPROF is officially not supported
there anymore. I think it has never worked on amd64.


Re: Too many PMC implementations

2018-07-15 Thread Jared McNeill

On Sun, 15 Jul 2018, Maxime Villard wrote:

> Now I want to move:
>
> arch/x86/x86/tprof_pmi.c
> arch/x86/x86/tprof_amdpmi.c
>
> into
>
> dev/tprof/tprof_intel.c
> dev/tprof/tprof_amd.c
>
> I guess people are fine? I think it is better to gather all the pieces in
> one dir.


I don't really have an opinion here, but I've just committed a new 
backend as dev/tprof/tprof_armv8.c. So I guess that's a vote for the 
latter :)


Cheers,
Jared


Re: Too many PMC implementations

2018-07-15 Thread Maxime Villard

On 11/07/2018 at 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
>
>  * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>    There used to be an x86 code for it, but it was broken, and I removed it.
>    The implementation comes with libpmc, a library we provide. The code
>    hasn't moved these last 15 years. I don't like this implementation, it is
>    really invasive (see the numerous pmc.h files that are all empty).
>
>  * X86PMC: this one is MD, and only available for x86. I wrote it myself.
>    The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
>    and retrieved on a per-cpu basis. But this implementation does not
>    support tracking, that is, we get numbers (about the cache misses for
>    example), but we don't know where they happened.
>
>  * TPROF: this one is MI, but only x86 support is present. TPROF provides
>    the backend needed to support tracking: via a device, that userland can
>    read from, in order to absorb the event samples produced by the kernel.
>    The backend is pretty good, but the frontend (where the user chooses
>    which PMC etc) is inexistent - the CPU/event detection is not there
>    either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
>    architectures. The module already exists to dynamically modload.
>
> I think it would be good to:
>
>  * Remove PMC entirely. Then remove libpmc too.
>
>  * Merge X86PMC into the x86 part of TPROF. That is to say, into
>    x86/tprof_*. Then remove X86PMC.
>
>  * Later, maybe, someone will want to add other architectures in TPROF, like
>    all the recent ARMs.
>
> Maxime


Now I want to move:

arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c

into

dev/tprof/tprof_intel.c
dev/tprof/tprof_amd.c

I guess people are fine? I think it is better to gather all the pieces in
one dir.


Re: Too many PMC implementations

2018-07-12 Thread Kamil Rytarowski
On 12.07.2018 08:48, Maxime Villard wrote:
> On 11/07/2018 at 19:49, Kamil Rytarowski wrote:
>> I'm not familiar with the internals myself, but from an API point of view,
>> something usable for porting rr (https://github.com/mozilla/rr) or even
>> Linux perf-top is highly desirable. I personally treat perf-top as a
>> gold standard.
> 
> Well, yes, but right now let's first try to have functional internals...

I fully understand and appreciate the option to garbage-collect
redundant implementations.

From a fuzzing point of view, during GSoC we are researching honggfuzz
(vs libFuzzer), which can use performance counters as an aid:

https://github.com/google/honggfuzz/blob/master/docs/FeedbackDrivenFuzzing.md
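
To give a rough idea of what counter feedback means for a fuzzer, here is a
minimal sketch. It is not honggfuzz code. The "counter" is a stand-in
function (fake_counter, hypothetical) that pretends to be a hardware reading
such as instructions retired: a target that compares its input byte by byte
against a magic value does more work the longer the matching prefix is. The
loop keeps a mutated input only when the reading goes up.

```python
import random


def fake_counter(data: bytes) -> int:
    """Hypothetical stand-in for a hardware counter reading.

    Models a target that compares its input against b"PMC!" byte by byte:
    the longer the matching prefix, the more work it does.
    """
    target = b"PMC!"
    matched = 0
    for got, want in zip(data, target):
        if got != want:
            break
        matched += 1
    return matched


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input: bytes, rounds: int, rng: random.Random):
    """Keep a mutated input only if the counter reading improves."""
    best_input = seed_input
    best_score = fake_counter(seed_input)
    history = [best_score]
    for _ in range(rounds):
        candidate = mutate(best_input, rng)
        score = fake_counter(candidate)
        if score > best_score:
            best_input, best_score = candidate, score
        history.append(best_score)
    return best_input, best_score, history


best_input, best_score, history = fuzz(b"\x00\x00\x00\x00", 2000,
                                       random.Random(0))
print(best_input, best_score)
```

A real setup would replace fake_counter with a reading taken from the
kernel's counter interface around each run of the target.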





Re: Too many PMC implementations

2018-07-12 Thread Maxime Villard

On 11/07/2018 at 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
>
>  * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>    There used to be an x86 code for it, but it was broken, and I removed it.
>    The implementation comes with libpmc, a library we provide. The code
>    hasn't moved these last 15 years. I don't like this implementation, it is
>    really invasive (see the numerous pmc.h files that are all empty).
>
>  * X86PMC: this one is MD, and only available for x86. I wrote it myself.
>    The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
>    and retrieved on a per-cpu basis. But this implementation does not
>    support tracking, that is, we get numbers (about the cache misses for
>    example), but we don't know where they happened.
>
>  * TPROF: this one is MI, but only x86 support is present. TPROF provides
>    the backend needed to support tracking: via a device, that userland can
>    read from, in order to absorb the event samples produced by the kernel.
>    The backend is pretty good, but the frontend (where the user chooses
>    which PMC etc) is inexistent - the CPU/event detection is not there
>    either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
>    architectures. The module already exists to dynamically modload.
>
> I think it would be good to:
>
>  * Remove PMC entirely. Then remove libpmc too.
>
>  * Merge X86PMC into the x86 part of TPROF. That is to say, into
>    x86/tprof_*. Then remove X86PMC.
>
>  * Later, maybe, someone will want to add other architectures in TPROF, like
>    all the recent ARMs.
>
> Maxime


So, I've prepared a patch. It removes "options PERFCTRS", all the pmc.h files,
the kernel sys_pmc.c, the man pages, and the PMC code of ARM XSCALE.

Other ARMs have their own small PMC code, but it is used in the MI code, and
not from the outside. These ones are obviously not removed.

The x86 code is reordered not to rely on the legacy pmc.h file (which I
recycled to put the definitions for X86PMC).

Will commit soon...


Re: Too many PMC implementations

2018-07-12 Thread Maxime Villard

On 11/07/2018 at 19:49, Kamil Rytarowski wrote:
> I'm not familiar with the internals myself, but from an API point of view,
> something usable for porting rr (https://github.com/mozilla/rr) or even
> Linux perf-top is highly desirable. I personally treat perf-top as a
> gold standard.


Well, yes, but right now let's first try to have functional internals...


Re: Too many PMC implementations

2018-07-11 Thread Kamil Rytarowski
On 11.07.2018 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
>
>  * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>    There used to be an x86 code for it, but it was broken, and I removed it.
>    The implementation comes with libpmc, a library we provide. The code
>    hasn't moved these last 15 years. I don't like this implementation, it is
>    really invasive (see the numerous pmc.h files that are all empty).
>
>  * X86PMC: this one is MD, and only available for x86. I wrote it myself.
>    The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
>    and retrieved on a per-cpu basis. But this implementation does not
>    support tracking, that is, we get numbers (about the cache misses for
>    example), but we don't know where they happened.
>
>  * TPROF: this one is MI, but only x86 support is present. TPROF provides
>    the backend needed to support tracking: via a device, that userland can
>    read from, in order to absorb the event samples produced by the kernel.
>    The backend is pretty good, but the frontend (where the user chooses
>    which PMC etc) is inexistent - the CPU/event detection is not there
>    either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
>    architectures. The module already exists to dynamically modload.
>
> I think it would be good to:
>
>  * Remove PMC entirely. Then remove libpmc too.
>
>  * Merge X86PMC into the x86 part of TPROF. That is to say, into
>    x86/tprof_*. Then remove X86PMC.
>
>  * Later, maybe, someone will want to add other architectures in TPROF, like
>    all the recent ARMs.
>
> Maxime

I'm not familiar with the internals myself, but from an API point of view,
something usable for porting rr (https://github.com/mozilla/rr) or even
Linux perf-top is highly desirable. I personally treat perf-top as a
gold standard.





Re: Too many PMC implementations

2018-07-11 Thread Jason Thorpe
Speaking as someone who was peripherally involved in the PMC flavor below, I 
have no objections to this.

> On Jul 11, 2018, at 9:22 AM, Maxime Villard wrote:
> 
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
> 
> * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>   There used to be an x86 code for it, but it was broken, and I removed it.
>   The implementation comes with libpmc, a library we provide. The code
>   hasn't moved these last 15 years. I don't like this implementation, it is
>   really invasive (see the numerous pmc.h files that are all empty).
> 
> * X86PMC: this one is MD, and only available for x86. I wrote it myself.
>   The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
>   and retrieved on a per-cpu basis. But this implementation does not
>   support tracking, that is, we get numbers (about the cache misses for
>   example), but we don't know where they happened.
> 
> * TPROF: this one is MI, but only x86 support is present. TPROF provides
>   the backend needed to support tracking: via a device, that userland can
>   read from, in order to absorb the event samples produced by the kernel.
>   The backend is pretty good, but the frontend (where the user chooses
>   which PMC etc) is inexistent - the CPU/event detection is not there
>   either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
>   architectures. The module already exists to dynamically modload.
> 
> I think it would be good to:
> 
> * Remove PMC entirely. Then remove libpmc too.
> 
> * Merge X86PMC into the x86 part of TPROF. That is to say, into
>   x86/tprof_*. Then remove X86PMC.
> 
> * Later, maybe, someone will want to add other architectures in TPROF, like
>   all the recent ARMs.
> 
> Maxime

-- thorpej



Too many PMC implementations

2018-07-11 Thread Maxime Villard

Right now we have three (or more?) different implementations for Performance
Monitoring Counters:

 * PMC: this one is MI. It is used only on one ARM model (xscale I think).
   There used to be an x86 code for it, but it was broken, and I removed it.
   The implementation comes with libpmc, a library we provide. The code
   hasn't moved these last 15 years. I don't like this implementation, it is
   really invasive (see the numerous pmc.h files that are all empty).

 * X86PMC: this one is MD, and only available for x86. I wrote it myself.
   The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
   and retrieved on a per-cpu basis. But this implementation does not
   support tracking, that is, we get numbers (about the cache misses for
   example), but we don't know where they happened.

 * TPROF: this one is MI, but only x86 support is present. TPROF provides
   the backend needed to support tracking: via a device, that userland can
   read from, in order to absorb the event samples produced by the kernel.
   The backend is pretty good, but the frontend (where the user chooses
   which PMC etc) is inexistent - the CPU/event detection is not there
   either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
   architectures. The module already exists to dynamically modload.

I think it would be good to:

 * Remove PMC entirely. Then remove libpmc too.

 * Merge X86PMC into the x86 part of TPROF. That is to say, into
   x86/tprof_*. Then remove X86PMC.

 * Later, maybe, someone will want to add other architectures in TPROF, like
   all the recent ARMs.

Maxime
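
To make the X86PMC vs TPROF distinction above concrete, here is a small
userland-side sketch. It does not use the real tprof device or its record
format. The (cpu, pc) tuples, the addresses and the symbol table are all
made up for illustration, and the function names are only labels. A plain
counter can report just the total number of events. With one sample per
event, the program counter can be mapped back to a function, which is the
"where they happened" part.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x1000, "pmap_enter"),
    (0x1400, "uvm_fault"),
    (0x1900, "memcpy"),
]
STARTS = [start for start, _name in SYMBOLS]


def symbolize(pc: int) -> str:
    """Map a program counter to the last symbol starting at or below it."""
    i = bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "?"


def profile(samples):
    """Aggregate (cpu, pc) samples into per-function hit counts."""
    return Counter(symbolize(pc) for _cpu, pc in samples)


# Hypothetical samples, one per recorded event: (cpu, pc).
samples = [(0, 0x1010), (1, 0x1904), (0, 0x1950), (0, 0x1420), (1, 0x19F0)]

total = len(samples)      # all that a plain event counter can report
hits = profile(samples)   # what sampling adds: where the events landed

print("total events:", total)
for name, count in hits.most_common():
    print(f"{name:12s} {count}")
```

The lookup uses a sorted list plus bisect so that each sample costs a binary
search rather than a scan of the whole symbol table.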