Re: Too many PMC implementations
On 26.08.2018 02:40, Kamil Rytarowski wrote: > On 25.08.2018 21:32, David Holland wrote: >> > 2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned >> >the idea of bytecode validation a few years ago. From the Lua 5.3 manual: >> > >> >Lua does not check the consistency of binary chunks. Maliciously >> >crafted binary chunks can crash the interpreter. >> >> Are we talking about installing untrusted/unprivileged kernel trace >> logic? Because that seems like a bad idea, or at least a very hard >> thing to get right... and if not, it doesn't matter if there's a >> validator. >> >> (Also, isn't EBPF not really validatable either, or am I mixing it >> up with something else?) >> > > For the record, eBPF has at least two stages of validation: > - CFG analysis to find infinite loops that would deadlock, > - a simulator that tries to verify whether code paths are meaningful. > > eBPF tries to prevent uninitialized memory reads and writes to read-only > variables. > > Additionally, eBPF can restrict pointer arithmetic. > > But right, it's much easier to restrict loading any code into the kernel > to a privileged user. > We do not allow loading Lua bytecode by default. This means that the verification is done by the compiler, and it's difficult to crash the interpreter by accident.
$ sysctl -d kern.lua.bytecode
kern.lua.bytecode = 0
Re: Too many PMC implementations
On 25.08.2018 21:32, David Holland wrote: > > 2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned > >the idea of bytecode validation a few years ago. From the Lua 5.3 manual: > > > >Lua does not check the consistency of binary chunks. Maliciously > >crafted binary chunks can crash the interpreter. > > Are we talking about installing untrusted/unprivileged kernel trace > logic? Because that seems like a bad idea, or at least a very hard > thing to get right... and if not, it doesn't matter if there's a > validator. > > (Also, isn't EBPF not really validatable either, or am I mixing it > up with something else?) > For the record, eBPF has at least two stages of validation: - CFG analysis to find infinite loops that would deadlock, - a simulator that tries to verify whether code paths are meaningful. eBPF tries to prevent uninitialized memory reads and writes to read-only variables. Additionally, eBPF can restrict pointer arithmetic. But right, it's much easier to restrict loading any code into the kernel to a privileged user.
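The first of the two eBPF stages listed above, rejecting any program whose control-flow graph contains a loop, can be illustrated with a depth-first search that fails on the first back edge. This is a minimal sketch in the spirit of that check, not the Linux verifier itself: the toy instruction set (`struct insn`, `INSN_JCOND` and friends) and the 64-instruction limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction kinds (not real eBPF opcodes). */
enum insn_kind { INSN_OP, INSN_JMP, INSN_JCOND, INSN_EXIT };

struct insn {
	enum insn_kind kind;
	int off;	/* jump target is pc + 1 + off, BPF-style */
};

/* DFS marks: not yet seen, on the current DFS path, fully explored. */
enum visit { VISIT_NEW, VISIT_ACTIVE, VISIT_DONE };

#define MAX_INSNS 64

static bool explore(const struct insn *prog, size_t len, long pc,
    enum visit *mark);

/* Follow one CFG edge; targets outside the program are rejected. */
static bool follow(const struct insn *prog, size_t len, long target,
    enum visit *mark)
{
	if (target < 0 || (size_t)target >= len)
		return false;	/* jumps out of range or falls off the end */
	return explore(prog, len, target, mark);
}

static bool explore(const struct insn *prog, size_t len, long pc,
    enum visit *mark)
{
	if (mark[pc] == VISIT_ACTIVE)
		return false;	/* back edge: the CFG contains a loop */
	if (mark[pc] == VISIT_DONE)
		return true;	/* already shown loop-free from here */
	mark[pc] = VISIT_ACTIVE;

	bool ok = true;
	switch (prog[pc].kind) {
	case INSN_EXIT:
		break;
	case INSN_OP:
		ok = follow(prog, len, pc + 1, mark);
		break;
	case INSN_JMP:
		ok = follow(prog, len, pc + 1 + prog[pc].off, mark);
		break;
	case INSN_JCOND:	/* both the fall-through and the taken edge */
		ok = follow(prog, len, pc + 1, mark) &&
		    follow(prog, len, pc + 1 + prog[pc].off, mark);
		break;
	}
	mark[pc] = VISIT_DONE;
	return ok;
}

/* True if every path from instruction 0 reaches INSN_EXIT without looping. */
static bool cfg_is_loop_free(const struct insn *prog, size_t len)
{
	enum visit mark[MAX_INSNS] = { VISIT_NEW };

	assert(prog != NULL || len == 0);
	if (len == 0 || len > MAX_INSNS)
		return false;
	return explore(prog, len, 0, mark);
}
```

A jump back to an instruction that is still on the DFS path is reported as a cycle, while marking finished instructions keeps the walk linear in the number of edges.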
Re: Too many PMC implementations
David Holland wrote: > On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote: > > 1. It's not standardised and it will very likely change in future versions > > That doesn't really matter as long as you're only using one version at > a time... If bytecode is generated from a valid Lua program, it indeed takes very little effort to update to a new version. But updating handcrafted bytecode may take a bit of time. > > 2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned > >the idea of bytecode validation a few years ago. From the Lua 5.3 manual: > > > >Lua does not check the consistency of binary chunks. Maliciously > >crafted binary chunks can crash the interpreter. > > Are we talking about installing untrusted/unprivileged kernel trace > logic? Because that seems like a bad idea, or at least a very hard > thing to get right... and if not, it doesn't matter if there's a > validator. Lua bytecode is Turing-complete and not validatable, but I'm pretty sure some subset of it (e.g. no loops, no strings, etc.) can be validated. > (Also, isn't EBPF not really validatable either, or am I mixing it > up with something else?) Last I checked, the author(s) of eBPF claimed that it can be validated. -- Alex
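The claim that a restricted subset could still be validated can be made concrete the way classic BPF does it: if jumps may only go forward, every instruction runs at most once, so termination follows from a single linear pass. The sketch below assumes a hypothetical opcode set (it is not real Lua bytecode) in which string building is simply left off the allowlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes for a restricted bytecode subset (not real Lua opcodes). */
enum op { OP_LOADCONST, OP_ADD, OP_JMP, OP_JMPIF, OP_RET, OP_CONCAT };

struct ins {
	enum op op;
	int sj;		/* signed jump offset: target is pc + 1 + sj */
};

/* Allowlist for the subset; OP_CONCAT (string building) is deliberately left out. */
static bool op_allowed(enum op op)
{
	switch (op) {
	case OP_LOADCONST:
	case OP_ADD:
	case OP_JMP:
	case OP_JMPIF:
	case OP_RET:
		return true;
	default:
		return false;
	}
}

/*
 * One linear pass: every opcode is allowlisted, every jump goes strictly
 * forward and stays inside the program, and the program ends in OP_RET.
 * With forward-only jumps the PC only increases, so execution stops
 * after at most len instructions.
 */
static bool validate_subset(const struct ins *prog, size_t len)
{
	assert(prog != NULL || len == 0);
	if (len == 0 || prog[len - 1].op != OP_RET)
		return false;
	for (size_t pc = 0; pc < len; pc++) {
		const struct ins *in = &prog[pc];

		if (!op_allowed(in->op))
			return false;
		if (in->op == OP_JMP || in->op == OP_JMPIF) {
			if (in->sj < 0)
				return false;	/* backward jump: possible loop */
			if ((size_t)in->sj >= len - pc - 1)
				return false;	/* target would be past the last instruction */
		}
	}
	return true;
}
```

The trade-off is expressiveness: loops have to be unrolled by whatever compiler emits the bytecode, in exchange for a check that needs no simulation at all.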
Re: Too many PMC implementations
On Fri, Aug 10, 2018 at 11:40:20AM +0200, Maxime Villard wrote: > I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not > subscribed to this list but I'm answering here because it's related to > tprof among other things. > > I agree that it would be better to retire gprof in base, because there are > more powerful tools now, and also advanced hardware support (PMC, PEBS, > ProcessorTrace). Speaking of old ratty tools, got a plan for replacing gcov? :-) -- David A. Holland dholl...@netbsd.org
Re: Too many PMC implementations
On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote: > > There is already a Lua-powered solution for traces in Linux: ktap. It > > uses nice rules written natively in Lua... however it seems to be > > abandoned in favor of eBPF. > > I see two potential problems with using Lua bytecode: > > 1. It's not standardised and it will very likely change in future versions That doesn't really matter as long as you're only using one version at a time... > 2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned >the idea of bytecode validation a few years ago. From the Lua 5.3 manual: > >Lua does not check the consistency of binary chunks. Maliciously >crafted binary chunks can crash the interpreter. Are we talking about installing untrusted/unprivileged kernel trace logic? Because that seems like a bad idea, or at least a very hard thing to get right... and if not, it doesn't matter if there's a validator. (Also, isn't EBPF not really validatable either, or am I mixing it up with something else?) -- David A. Holland dholl...@netbsd.org
Re: Too many PMC implementations
Kamil Rytarowski wrote: > There is already a Lua-powered solution for traces in Linux: ktap. It > uses nice rules written natively in Lua... however it seems to be > abandoned in favor of eBPF. I see two potential problems with using Lua bytecode:
1. It's not standardised and it will very likely change in future versions.
2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned the idea of bytecode validation a few years ago. From the Lua 5.3 manual: "Lua does not check the consistency of binary chunks. Maliciously crafted binary chunks can crash the interpreter."
Alex
Re: Too many PMC implementations
On 25.08.2018 00:28, Rhialto wrote: > On Thu 23 Aug 2018 at 18:48:32 +0200, Kamil Rytarowski wrote: >> Probably DTrace is not the final word in BSD and not something I intend >> to defend - but it's a good solution for now - (FreeBSD already >> ports/develops a potential replacement eBPF). > > I have played a bit with EBPF on Linux, and it feels weird to use a > "packet filter" bytecode-based thing for performance monitoring. > The "e" in eBPF makes a big difference. The new bytecode model is similar to a regular general-purpose (application) CPU. > Don't we already have a bytecode interpreter in the kernel in the form > of Lua? I hardly know anything of Lua, but using that (being an existing > tool) would make somewhat more sense than a glorified packet filter > (which needs a big tool set in the form of clang to compile C to EBPF > bytecode). > There is already a Lua-powered solution for traces in Linux: ktap. It uses nice rules written natively in Lua... however it seems to be abandoned in favor of eBPF. > -Olaf. >
Re: Too many PMC implementations
On Thu 23 Aug 2018 at 18:48:32 +0200, Kamil Rytarowski wrote: > Probably DTrace is not the final word in BSD and not something I intend > to defend - but it's a good solution for now - (FreeBSD already > ports/develops a potential replacement eBPF). I have played a bit with EBPF on Linux, and it feels weird to use a "packet filter" bytecode-based thing for performance monitoring. Don't we already have a bytecode interpreter in the kernel in the form of Lua? I hardly know anything of Lua, but using that (being an existing tool) would make somewhat more sense than a glorified packet filter (which needs a big tool set in the form of clang to compile C to EBPF bytecode). -Olaf.
--
___ Olaf 'Rhialto' Seibert  -- Wayland: Those who don't understand X
\X/ rhialto/at/falu.nl      -- are condemned to reinvent it. Poorly.
Re: Too many PMC implementations
On Fri, Aug 24, 2018 at 07:33:05AM +0200, Maxime Villard wrote: > Well I guess kgprof will have to stay then, along with gprof (the initial > conversation). We can still clean up the dead code in the other ports, but > the interest is rather limited and I don't think I'll bother. Thanks! And we should push for dtrace on more architectures. Martin
Re: Too many PMC implementations
On Aug 23, 11:57am, t...@panix.com (Thor Lancelot Simon) wrote: -- Subject: Re: Too many PMC implementations | On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote: | > | > Observing that all the useful profiling is already done with DTrace, we | > can remove complexity from the kernel with negligible cost. | | I'm not sure what to make of this. I'm trying to come up with a way to | make the above statement true, and I'm having some difficulty. | | You can't possibly mean "Observing that (unproven premise), therefore | (conclusion)", so I'll discard that interpretation. | | Do you perhaps mean "*If* we were to observe that all useful profiling | were done with DTrace, *then* we could remove complexity from the | kernel with negligible cost"? | | Because Ragge and others have been pointing out that in that case, | the premise "all useful profiling is done with DTrace" does not appear | to be true. Profiling kernel code on VAX may not be useful *to you* | but that does not imply it is "not useful" simpliciter. | Until we port dtrace to at least a good representative set of architectures we should not remove the only means of profiling for the kernel. christos
Re: Too many PMC implementations
On 23/08/2018 at 17:47, Anders Magnusson wrote: On 2018-08-23 at 17:03, Maxime Villard wrote: On 23/08/2018 at 16:28, Anders Magnusson wrote: On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. It looks like there will be no replacement. Are you sure this is really kgmon? Because as far as I can tell, in many architectures GPROF is just dead code that either doesn't compile or has no effect (missing opt_gprof.h, but I did add it in February of this year in the MI parts, so it was likely even more broken before). I have used it not long ago for vax. Maybe I did have to do some tweaks, do not remember, but I really want to be able to use kernel profiling on vax. So, I really oppose removing it and leaving vax without any kernel profiling choice. Well I guess kgprof will have to stay then, along with gprof (the initial conversation). We can still clean up the dead code in the other ports, but the interest is rather limited and I don't think I'll bother.
Re: Too many PMC implementations
On Thu, Aug 23, 2018 at 10:17:29AM -0700, Jason Thorpe wrote: > > > On Aug 23, 2018, at 8:47 AM, Anders Magnusson wrote: > > > I have used it not long ago for vax. Maybe I did have to do some tweaks, > > do not remember, > > but I really want to be able to use kernel profiling on vax. > > > > So, I really oppose removing it and leaving vax without any kernel > > profiling choice. > > How hard would it be to add support for dtrace on Vax? Without FBT, probably pretty easy. But of course FBT is the only plausible replacement for a statistical profiler that DTrace offers. The basic requirement for FBT is a dynamic patcher (to really do it right), though alternatively some __predict_false branches could be inserted at the head of every function and a global flag used instead. -- Thor Lancelot Simon t...@panix.com "Whether or not there's hope for change is not the question. If you want to be a free person, you don't stand up for human rights because it will work, but because it is right." --Andrei Sakharov
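The fallback suggested above for FBT without a dynamic patcher, a `__predict_false` test of a global at the head of every function, might look roughly like the sketch below. The probe table, `probe_fire()`, `double_it()` and the `FUNC_ENTRY_PROBE` macro are hypothetical; NetBSD's `<sys/cdefs.h>` does provide `__predict_false`, and the fallback definition only exists so the sketch builds elsewhere with GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>

/* NetBSD's <sys/cdefs.h> provides __predict_false; this fallback is for other systems. */
#ifndef __predict_false
#define __predict_false(exp)	__builtin_expect((exp) != 0, 0)
#endif

#define NPROBES	8

/* Hypothetical global flipped by the tracing framework when any entry probe is armed. */
static volatile bool probes_enabled;

/* Hypothetical per-probe hit counters standing in for a real trace buffer. */
static unsigned long probe_hits[NPROBES];

static void probe_fire(unsigned int id)
{
	assert(id < NPROBES);
	probe_hits[id]++;
}

/* Placed at the head of every instrumented function: a test and a predicted-not-taken branch. */
#define FUNC_ENTRY_PROBE(id)					\
	do {							\
		if (__predict_false(probes_enabled))		\
			probe_fire(id);				\
	} while (0)

/* Example instrumented function (hypothetical). */
static int double_it(int x)
{
	FUNC_ENTRY_PROBE(3);
	return x * 2;
}
```

Compared with real FBT, the disabled cost is a load, a test and a predicted branch in every function rather than nothing at all, but no instruction patching is needed at runtime.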
Re: Too many PMC implementations
> On Aug 23, 2018, at 8:47 AM, Anders Magnusson wrote: > I have used it not long ago for vax. Maybe I did have to do some tweaks, do > not remember, > but I really want to be able to use kernel profiling on vax. > > So, I really oppose removing it and leaving vax without any kernel profiling > choice. How hard would it be to add support for dtrace on Vax? -- thorpej
Re: Too many PMC implementations
On 23.08.2018 18:35, Thor Lancelot Simon wrote: > On Thu, Aug 23, 2018 at 06:25:56PM +0200, Kamil Rytarowski wrote: >> >> By useful I mean the number of commits to the src/ tree. If nothing >> landed, probably nothing was useful. When were the most recent patches >> from gprof or similar? > > This is a plainly bogus criterion. After we integrated DTrace, there > were several periods of a year or more during which there were no > "patches from DTrace or similar". I know, and it was pretty frustrating > to me too, since I paid a considerable amount of money to have it ported > and maintained, and I had to justify that to my boss. > > Should it have, then, been ripped out? It's certainly a lot more complex > than gprof. > Compared to gprof (an idea from 1988), DTrace can actually be used more easily on modern hardware. Probably DTrace is not the final word in BSD and not something I intend to defend - but it's a good solution for now - (FreeBSD already ports/develops a potential replacement eBPF).
Re: Too many PMC implementations
On Thu, Aug 23, 2018 at 06:25:56PM +0200, Kamil Rytarowski wrote: > > By useful I mean the number of commits to the src/ tree. If nothing > landed, probably nothing was useful. When were the most recent patches > from gprof or similar? This is a plainly bogus criterion. After we integrated DTrace, there were several periods of a year or more during which there were no "patches from DTrace or similar". I know, and it was pretty frustrating to me too, since I paid a considerable amount of money to have it ported and maintained, and I had to justify that to my boss. Should it have, then, been ripped out? It's certainly a lot more complex than gprof. -- Thor Lancelot Simon t...@panix.com
Re: Too many PMC implementations
On 23.08.2018 17:57, Thor Lancelot Simon wrote: > On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote: >> >> Observing that all the useful profiling is already done with DTrace, we >> can remove complexity from the kernel with negligible cost. > > I'm not sure what to make of this. I'm trying to come up with a way to > make the above statement true, and I'm having some difficulty. > > You can't possibly mean "Observing that (unproven premise), therefore > (conclusion)", so I'll discard that interpretation. > > Do you perhaps mean "*If* we were to observe that all useful profiling > were done with DTrace, *then* we could remove complexity from the > kernel with negligible cost"? > > Because Ragge and others have been pointing out that in that case, > the premise "all useful profiling is done with DTrace" does not appear > to be true. Profiling kernel code on VAX may not be useful *to you* > but that does not imply it is "not useful" simpliciter. > By useful I mean the number of commits to the src/ tree. If nothing landed, probably nothing was useful. When were the most recent patches from gprof or similar?
Re: Too many PMC implementations
On Thu, Aug 23, 2018 at 05:09:35PM +0200, Kamil Rytarowski wrote: > > Observing that all the useful profiling is already done with DTrace, we > can remove complexity from the kernel with negligible cost. I'm not sure what to make of this. I'm trying to come up with a way to make the above statement true, and I'm having some difficulty. You can't possibly mean "Observing that (unproven premise), therefore (conclusion)", so I'll discard that interpretation. Do you perhaps mean "*If* we were to observe that all useful profiling were done with DTrace, *then* we could remove complexity from the kernel with negligible cost"? Because Ragge and others have been pointing out that in that case, the premise "all useful profiling is done with DTrace" does not appear to be true. Profiling kernel code on VAX may not be useful *to you* but that does not imply it is "not useful" simpliciter. -- Thor Lancelot Simon t...@panix.com
Re: Too many PMC implementations
On 2018-08-23 at 17:09, Kamil Rytarowski wrote: On 23.08.2018 16:59, Anders Magnusson wrote: On 2018-08-23 at 16:48, Kamil Rytarowski wrote: On 23.08.2018 16:28, Anders Magnusson wrote: On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. -- Ragge There is no DTrace support for vax, and probably there won't be any. Also, DTrace is probably not a final solution per se (DTrace is described as a step backwards by people such as Brendan Gregg)... but we are working on better toolchain support to open more possibilities such as XRay. Regarding vax, there might be bottlenecks in MD code, but DTrace is a decent tool for MI code on supported ports. Hm, so this means that we will be without kernel profiling support at all on non-DTrace architectures? I'm not too happy about that, for obvious reasons. Profiling code paths on other architectures does not work, since what takes time is very different. And yes, it is not the MD code that is the problem, it's the MI code. I may have missed something, but why remove something that works without replacing it with something new? Having profiling on only a few ports does not sound very clever to me.
-- Ragge Evaluating this situation, we have to be aware that this description could be reversed and there are ports without meaningful (or any) gprof support. Observing that all the useful profiling is already done with DTrace, we can remove complexity from the kernel with negligible cost. This is not true. Things that you would never notice as a problem on x86 may kill a vax, since there is a large speed difference between them. This was true many years ago and is still true. Bottom line: I think it is a bad idea to be without kernel profiling code on vax. -- Ragge
Re: Too many PMC implementations
On 2018-08-23 at 17:03, Maxime Villard wrote: On 23/08/2018 at 16:28, Anders Magnusson wrote: On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. It looks like there will be no replacement. Are you sure this is really kgmon? Because as far as I can tell, in many architectures GPROF is just dead code that either doesn't compile or has no effect (missing opt_gprof.h, but I did add it in February of this year in the MI parts, so it was likely even more broken before). I have used it not long ago for vax. Maybe I did have to do some tweaks, do not remember, but I really want to be able to use kernel profiling on vax. So, I really oppose removing it and leaving vax without any kernel profiling choice. -- Ragge
Re: Too many PMC implementations
On 23.08.2018 16:59, Anders Magnusson wrote: > On 2018-08-23 at 16:48, Kamil Rytarowski wrote: >> On 23.08.2018 16:28, Anders Magnusson wrote: >>> On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: > On 17.08.2018 17:13, Maxime Villard wrote: >> Note that I'm talking about the kernel gprof, and not the userland >> gprof. >> In terms of kernel profiling, it's not nonsensical to say that >> since we >> support ARM and x86 in tprof, we can cover 99% of the MI parts of >> whatever architecture. From then on, being able to profile the >> kernel on >> other architectures has very little interest. > Speaking realistically, probably all the recent software-based kernel > profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. >>> Just checking: How will it work for ports like vax? >>> When searching for bottlenecks I normally use gprof/kgmon. I don't know >>> anything about DTrace, hence the question. >>> >>> -- Ragge >> There is no DTrace support for vax, and probably there won't be any. >> Also, DTrace is probably not a final solution per se (DTrace is described >> as a step backwards by people such as Brendan Gregg)... but we are working >> on better toolchain support to open more possibilities such as XRay. >> >> Regarding vax, there might be bottlenecks in MD code, but DTrace is a >> decent tool for MI code on supported ports. > Hm, so this means that we will be without kernel profiling support at > all on non-DTrace architectures? > I'm not too happy about that, for obvious reasons. > > Profiling code paths on other architectures does not work, since what > takes time is very different. > And yes, it is not the MD code that is the problem, it's the MI code. > > I may have missed something, but why remove something that works without > replacing it with something new? > Having profiling on only a few ports does not sound very clever to me.
> > -- Ragge > > Evaluating this situation, we have to be aware that this description could be reversed and there are ports without meaningful (or any) gprof support. Observing that all the useful profiling is already done with DTrace, we can remove complexity from the kernel with negligible cost.
Re: Too many PMC implementations
On 23/08/2018 at 16:28, Anders Magnusson wrote: On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. It looks like there will be no replacement. Are you sure this is really kgmon? Because as far as I can tell, in many architectures GPROF is just dead code that either doesn't compile or has no effect (missing opt_gprof.h, but I did add it in February of this year in the MI parts, so it was likely even more broken before).
Re: Too many PMC implementations
On 2018-08-23 at 16:48, Kamil Rytarowski wrote: On 23.08.2018 16:28, Anders Magnusson wrote: On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. -- Ragge There is no DTrace support for vax, and probably there won't be any. Also, DTrace is probably not a final solution per se (DTrace is described as a step backwards by people such as Brendan Gregg)... but we are working on better toolchain support to open more possibilities such as XRay. Regarding vax, there might be bottlenecks in MD code, but DTrace is a decent tool for MI code on supported ports. Hm, so this means that we will be without kernel profiling support at all on non-DTrace architectures? I'm not too happy about that, for obvious reasons. Profiling code paths on other architectures does not work, since what takes time is very different. And yes, it is not the MD code that is the problem, it's the MI code. I may have missed something, but why remove something that works without replacing it with something new? Having profiling on only a few ports does not sound very clever to me. -- Ragge
Re: Too many PMC implementations
On 23.08.2018 16:28, Anders Magnusson wrote: > On 2018-08-23 at 15:53, Maxime Villard wrote: >> On 17/08/2018 at 17:42, Kamil Rytarowski wrote: >>> On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. >>> >>> Speaking realistically, probably all the recent software-based kernel >>> profiling was done with DTrace. >> >> Yes. So I will proceed. >> >> Note that the removal of the kernel gprof implies the removal of kgmon. > Just checking: How will it work for ports like vax? > When searching for bottlenecks I normally use gprof/kgmon. I don't know > anything about DTrace, hence the question. > > -- Ragge There is no DTrace support for vax, and probably there won't be any. Also, DTrace is probably not a final solution per se (DTrace is described as a step backwards by people such as Brendan Gregg)... but we are working on better toolchain support to open more possibilities such as XRay. Regarding vax, there might be bottlenecks in MD code, but DTrace is a decent tool for MI code on supported ports.
Re: Too many PMC implementations
On 2018-08-23 at 15:53, Maxime Villard wrote: On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon. Just checking: How will it work for ports like vax? When searching for bottlenecks I normally use gprof/kgmon. I don't know anything about DTrace, hence the question. -- Ragge
Re: Too many PMC implementations
On 17/08/2018 at 17:42, Kamil Rytarowski wrote: On 17.08.2018 17:13, Maxime Villard wrote: Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace. Yes. So I will proceed. Note that the removal of the kernel gprof implies the removal of kgmon.
Re: Too many PMC implementations
> On Aug 17, 2018, at 8:42 AM, Kamil Rytarowski wrote: > > Speaking realistically, probably all the recent software-based kernel > profiling was done with DTrace. Yah, I suppose I'm okay with killing off kernel GPROF support ... you can essentially do the same-thing-but-better with an on-cpu flame graph generated from dtrace data. If the lower-tier platforms don't support this properly, the energy should go towards fixing that. -- thorpej
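An on-CPU flame graph is usually built from sampled kernel stacks that are "folded" into one line per unique stack (frames joined root-first with `;`) followed by a sample count, which is the input format Brendan Gregg's FlameGraph scripts consume. The following is a minimal C sketch of that folding step. It assumes frames arrive leaf-first, the order in which DTrace prints `stack()`. The fixed table sizes, the linear search, and the function names are simplifications chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINE	256	/* longest folded stack we keep */
#define MAX_STACKS	64	/* distinct stacks in the table */

struct folded {
	char stack[MAX_LINE];	/* "root;...;leaf" */
	unsigned int count;	/* samples that hit exactly this stack */
};

/*
 * Join frames root-first with ';'. Input is leaf-first, the order in which
 * DTrace prints stack(). Returns 0, or -1 if the result does not fit in buf.
 */
static int fold_stack(const char *const *frames, size_t nframes,
    char *buf, size_t bufsz)
{
	size_t used = 0;

	assert(bufsz > 0);
	buf[0] = '\0';
	for (size_t i = nframes; i-- > 0;) {
		size_t flen = strlen(frames[i]);
		size_t sep = used > 0 ? 1 : 0;

		if (used + sep + flen + 1 > bufsz)
			return -1;
		if (sep)
			buf[used++] = ';';
		memcpy(buf + used, frames[i], flen);
		used += flen;
		buf[used] = '\0';
	}
	return 0;
}

/* Count one folded sample; returns the new number of table entries. */
static size_t add_sample(struct folded *tab, size_t ntab, const char *stack)
{
	size_t len = strlen(stack);

	for (size_t i = 0; i < ntab; i++) {
		if (strcmp(tab[i].stack, stack) == 0) {
			tab[i].count++;
			return ntab;
		}
	}
	if (ntab == MAX_STACKS || len >= MAX_LINE)
		return ntab;	/* table full or stack too long: sample dropped */
	memcpy(tab[ntab].stack, stack, len + 1);
	tab[ntab].count = 1;
	return ntab + 1;
}
```

Printing each entry as `printf("%s %u\n", tab[i].stack, tab[i].count)` should yield lines in the folded format that flamegraph.pl reads.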
Re: Too many PMC implementations
On 17.08.2018 17:13, Maxime Villard wrote: > Note that I'm talking about the kernel gprof, and not the userland gprof. > In terms of kernel profiling, it's not nonsensical to say that since we > support ARM and x86 in tprof, we can cover 99% of the MI parts of > whatever architecture. From then on, being able to profile the kernel on > other architectures has very little interest. > Speaking realistically, probably all the recent software-based kernel profiling was done with DTrace.
Re: Too many PMC implementations
On 17/08/2018 at 16:43, Joerg Sonnenberger wrote: On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote: So no one has any opinion on that? Because in this case I will remove it soon. (Talking about the kernel gprof.) I'm quite reluctant to remove the only sample-based profiler we have right now, especially since we don't have any infrastructure for counter-based profilers either AFAICT. We do with tprof now, no? On 17/08/2018 at 16:50, Mouse wrote: I agree that it would be better to retire gprof in base, because there are more powerful tools now, and also advanced hardware support (PMC, PEBS, ProcessorTrace). ...for ports that _have_ "advanced hardware support", maybe. (And what are the "more powerful tools"? I haven't been following the state of the art in open-source profiling tools.) Yes, basically I was talking about x86. I do know that many architectures support PMCs, but I don't know how precise the events are (etc.). The tools were mentioned before, like Linux "perf", which is pretty good. Note that I'm talking about the kernel gprof, and not the userland gprof. In terms of kernel profiling, it's not nonsensical to say that since we support ARM and x86 in tprof, we can cover 99% of the MI parts of whatever architecture. From then on, being able to profile the kernel on other architectures has very little interest. The gprof code is rather shitty and old; I dropped it from the x86 kernels, so it's not like I care a lot now, but since I saw the thread I thought I would bring this up.
Re: Too many PMC implementations
>> I agree that it would be better to retire gprof in base, because
>> there are more powerful tools now, and also advanced hardware
>> support (PMC, PEBS, ProcessorTrace).

...for ports that _have_ "advanced hardware support", maybe.

(And what are the "more powerful tools"? I haven't been following the state of the art in open-source profiling tools.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: Too many PMC implementations
On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote:
> On 10/08/2018 at 11:40, Maxime Villard wrote:
> > I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
> > subscribed to this list but I'm answering here because it's related to
> > tprof among other things.
> >
> > I agree that it would be better to retire gprof in base, because there are
> > more powerful tools now, and also advanced hardware support (PMC, PEBS,
> > ProcessorTrace).
> >
> > But in particular, it would be nice to retire the "kernel gprof". That is,
> > the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of
> > profiling is weak, and misses many aspects of execution (branch prediction,
> > cache misses, heavy instructions, etc) that are offered by tprof.
> >
> > I already dropped NENTRY() from x86, so KGPROF is officially not supported
> > there anymore. I think it has never worked on amd64.
>
> So no one has any opinion on that? Because in this case I will remove it
> soon. (Talking about the kernel gprof.)

I'm quite reluctant to remove the only sample based profiler we have right now. Esp. since we don't have any infrastructure for counter-based profilers either AFAICT.

Joerg
Re: Too many PMC implementations
On 10/08/2018 at 11:40, Maxime Villard wrote:
> I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
> subscribed to this list but I'm answering here because it's related to
> tprof among other things.
>
> I agree that it would be better to retire gprof in base, because there are
> more powerful tools now, and also advanced hardware support (PMC, PEBS,
> ProcessorTrace).
>
> But in particular, it would be nice to retire the "kernel gprof". That is,
> the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of
> profiling is weak, and misses many aspects of execution (branch prediction,
> cache misses, heavy instructions, etc) that are offered by tprof.
>
> I already dropped NENTRY() from x86, so KGPROF is officially not supported
> there anymore. I think it has never worked on amd64.

So no one has any opinion on that? Because in this case I will remove it soon. (Talking about the kernel gprof.)
Re: Too many PMC implementations
I saw the thread [Re: Sample based profiling] on tech-userlevel@. I'm not subscribed to that list, but I'm answering here because it's related to tprof among other things.

I agree that it would be better to retire gprof in base, because there are more powerful tools now, and also advanced hardware support (PMC, PEBS, ProcessorTrace).

But in particular, it would be nice to retire the "kernel gprof". That is, the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of profiling is weak, and misses many aspects of execution (branch prediction, cache misses, heavy instructions, etc) that are offered by tprof.

I already dropped NENTRY() from x86, so KGPROF is officially not supported there anymore. I think it has never worked on amd64.
Re: Too many PMC implementations
On Sun, 15 Jul 2018, Maxime Villard wrote:

> Now I want to move:
>
>     arch/x86/x86/tprof_pmi.c
>     arch/x86/x86/tprof_amdpmi.c
>
> into
>
>     dev/tprof/tprof_intel.c
>     dev/tprof/tprof_amd.c
>
> I guess people are fine? I think it is better to gather all the pieces in
> one dir.

I don't really have an opinion here, but I've just committed a new backend as dev/tprof/tprof_armv8.c. So I guess that's a vote for the latter :)

Cheers,
Jared
Re: Too many PMC implementations
On 11/07/2018 at 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
>
> * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>   There used to be an x86 code for it, but it was broken, and I removed it.
>   The implementation comes with libpmc, a library we provide. The code
>   hasn't moved these last 15 years. I don't like this implementation, it is
>   really invasive (see the numerous pmc.h files that are all empty).
>
> * X86PMC: this one is MD, and only available for x86. I wrote it myself.
>   The code is small (x86/pmc.c), and functional. The PMCs are system-wide,
>   and retrieved on a per-cpu basis. But this implementation does not
>   support tracking, that is, we get numbers (about the cache misses for
>   example), but we don't know where they happened.
>
> * TPROF: this one is MI, but only x86 support is present. TPROF provides
>   the backend needed to support tracking: via a device, that userland can
>   read from, in order to absorb the event samples produced by the kernel.
>   The backend is pretty good, but the frontend (where the user chooses
>   which PMC etc) is inexistent - the CPU/event detection is not there
>   either. The backend is MI (/dev/tprof/tprof.c), and can be used on other
>   architectures. The module already exists to dynamically modload.
>
> I think it would be good to:
>
> * Remove PMC entirely. Then remove libpmc too.
>
> * Merge X86PMC into the x86 part of TPROF. That is to say, into
>   x86/tprof_*. Then remove X86PMC.
>
> * Later, maybe, someone will want to add other architectures in TPROF, like
>   all the recent ARMs.
>
> Maxime

Now I want to move:

    arch/x86/x86/tprof_pmi.c
    arch/x86/x86/tprof_amdpmi.c

into

    dev/tprof/tprof_intel.c
    dev/tprof/tprof_amd.c

I guess people are fine? I think it is better to gather all the pieces in one dir.
Re: Too many PMC implementations
On 12.07.2018 08:48, Maxime Villard wrote:
> On 11/07/2018 at 19:49, Kamil Rytarowski wrote:
>> I'm not familiar with the internals myself, but from API point of view,
>> something usable for porting rr (https://github.com/mozilla/rr) or even
>> Linux perf-top is highly desirable. I treat personally perf-top as a
>> gold standard.
>
> Well, yes, but right now let's first try to have functional internals...

I fully understand and appreciate the option to garbage-collect redundant implementations.

From a fuzzing point of view, during GSoC we are researching honggfuzz (vs libFuzzer), and it can use aid from performance counters:

https://github.com/google/honggfuzz/blob/master/docs/FeedbackDrivenFuzzing.md
Re: Too many PMC implementations
On 11/07/2018 at 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
> [rest of the original proposal snipped]

So, I've prepared a patch. It removes "options PERFCTRS", all the pmc.h files, the kernel sys_pmc.c, the man pages, and the PMC code of ARM XSCALE.

Other ARMs have their own small PMC code, but it is used in the MI code, and not from the outside. Those are, of course, not removed.
The x86 code is reorganized so that it no longer relies on the legacy pmc.h file (which I recycled to hold the X86PMC definitions). Will commit soon...
Re: Too many PMC implementations
On 11/07/2018 at 19:49, Kamil Rytarowski wrote:
> I'm not familiar with the internals myself, but from API point of view, something usable for porting rr (https://github.com/mozilla/rr) or even Linux perf-top is highly desirable. I treat personally perf-top as a gold standard.

Well, yes, but right now let's first try to have functional internals...
Re: Too many PMC implementations
On 11.07.2018 18:22, Maxime Villard wrote:
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
> [rest of the original proposal snipped]

I'm not familiar with the internals myself, but from API point of view, something usable for porting rr (https://github.com/mozilla/rr) or even Linux perf-top is highly desirable. I treat personally perf-top as a gold standard.
Re: Too many PMC implementations
Speaking as someone who was peripherally involved in the PMC flavor below, I have no objections to this.

> On Jul 11, 2018, at 9:22 AM, Maxime Villard wrote:
>
> Right now we have three (or more?) different implementations for Performance
> Monitoring Counters:
>
> * PMC: this one is MI. It is used only on one ARM model (xscale I think).
>   There used to be an x86 code for it, but it was broken, and I removed it.
>   The implementation comes with libpmc, a library we provide. The code
>   hasn't moved these last 15 years. I don't like this implementation, it is
>   really invasive (see the numerous pmc.h files that are all empty).
>
> [rest of the original proposal snipped]

--
thorpej