Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
On Tuesday, September 5, 2017 5:40:58 AM CEST Andi Kleen wrote:
> > The cycle value gets associated with a sample via its period value,
> > which is used by `perf report` in the analysis. If I get a single
> > "broken" sample with
>
> I always thought it just used the number of samples?

No, that is not the case. It uses the cycle period as "weight" by default.
Note also the recent patch set for `perf annotate` by Taeung Song, which
adds the functionality to switch between sample and period fractions. I'm
actually not sure whether that exists for `perf report` yet.

In situations where the cycle period values associated with the samples are
correct, I have also seen how this weighting is useful: I have seen perf
data files where tons of samples got recorded around syscall entry/exit,
but most of these samples only had tiny cycle values associated with them.
If one only looks at the number of samples, the syscalls look expensive,
while in reality way more cycles are executed in userspace. This issue
was/is apparent when comparing `perf report` values with the FlameGraph
visualization created by the normal `stackcollapse-perf.pl` script: the
latter only looks at the number of samples, while the former uses the
sample period value. Hotspot also used the sample-count approach at first,
but I then adapted it to follow perf's behavior.

> > a cycle count of, say 1E11, and then a million other samples, each with
> > "sane" cycle counts of let's say 1E5, then the one broken sample will
> > hold 50% of the total amount of measured cycles. So perf report will
> > show that the function where the broken sample points to will have a
> > cost of 50%.
>
> I don't think I've seen such a situation. Did you?

I have seen this situation. This is what the current revival of this thread
is all about. Without such issues, I wouldn't claim it's such a serious
problem.

> BTW I'm not arguing against fixing it, but typically I just
> recommend avoiding frequency mode. The fast sampling at the beginning
> has caused a range of low level PMU bugs and it is hard to reason about
> because of its complex behavior. Also it has no protection against
> synchronizing with repeated patterns in the execution, which
> can cause bad shadowing effects. If you use the Intel
> event aliases they all have sensible periods set by default.

I think we both agree that the frequency mode as-is should not be the
default. But it is, and that is a serious issue in my opinion.

We will need to find a sensible default for the event period and use that
mode by default. I nowadays always add something like `-c 3000000`, which
gives me roughly 1k samples per second on a ~3GHz machine. It's just a
ballpark figure and of course gets influenced by frequency scaling, but
it's good enough for me. We could use a similar approach to find a period
based on the max CPU clock rate automatically. But of course that would
only work for cycles, and not for instructions or any of the other fancy
event counters. Since frequency mode is probably similarly broken there, it
should not be the default either. Better to ask the user for explicit
values than to do something automatically that can lead to broken results.

Cheers

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
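To make the difference between the two weightings concrete, here is a
minimal sketch that aggregates `perf script` output per process both ways.
It assumes `perf script -F comm,period` prints two whitespace-separated
columns, comm then period; adjust the field indices if your perf version
orders them differently:

  perf script -F comm,period | awk '
    {
      samples[$1]++        # weight by number of samples (stackcollapse-perf.pl)
      cycles[$1] += $2     # weight by summed periods (perf report default)
      nsamples++; ncycles += $2
    }
    END {
      printf("%-20s %10s %10s\n", "comm", "samples", "period")
      for (c in samples)
        printf("%-20s %9.2f%% %9.2f%%\n", c,
               100.0 * samples[c] / nsamples, 100.0 * cycles[c] / ncycles)
    }'

A single sample with a bogus period visibly shifts the period column, while
the samples column barely moves.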
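The mail above suggests deriving the period from the max CPU clock rate
automatically; a sketch of that idea, assuming the usual cpufreq sysfs
layout where cpuinfo_max_freq is reported in kHz, and a placeholder
workload name:

  max_khz=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
  # cycles per sample = (max_khz * 1000 cycles/sec) / (1000 samples/sec)
  period=$((max_khz * 1000 / 1000))
  perf record -e cycles:ppp -c "$period" -- ./my_workload

This caps the rate at roughly 1000 samples per second at the maximum clock;
with frequency scaling the actual rate only ever drops below that.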
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
> The cycle value gets associated with a sample via its period value, which
> is used by `perf report` in the analysis. If I get a single "broken"
> sample with

I always thought it just used the number of samples?

> a cycle count of, say 1E11, and then a million other samples, each with
> "sane" cycle counts of let's say 1E5, then the one broken sample will
> hold 50% of the total amount of measured cycles. So perf report will show
> that the function where the broken sample points to will have a cost of
> 50%.

I don't think I've seen such a situation. Did you?

BTW I'm not arguing against fixing it, but typically I just
recommend avoiding frequency mode. The fast sampling at the beginning
has caused a range of low level PMU bugs and it is hard to reason about
because of its complex behavior. Also it has no protection against
synchronizing with repeated patterns in the execution, which
can cause bad shadowing effects. If you use the Intel
event aliases they all have sensible periods set by default.

-Andi
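As a concrete illustration of the two modes discussed here (`-F` and `-c`
are the standard perf record flags; the workload name is a placeholder):

  # frequency mode (the default): the kernel keeps re-adjusting the sample
  # period to hit ~1000 samples/sec, starting from a tiny period
  perf record -F 1000 --call-graph dwarf -- ./my_workload

  # fixed period: one sample every N cycles, no adaptive ramp-up at the start
  perf record -e cycles:ppp -c 3000000 --call-graph dwarf -- ./my_workload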
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
On Friday, September 1, 2017 6:48:20 PM CEST Andi Kleen wrote:
> Milian Wolff writes:
> >
> > do you have any input on this issue? I really wonder why no one else is
> > complaining about the frequency mode being unreliable or outright
> > broken in many situations. Since it's the default mode, I think this
> > urgently needs to be investigated - it makes perf unusable for a large
> > group of users who want to use it but don't know about `-c N` as a
> > workaround...
>
> It's likely related to the frequency algorithm starting with 0. So
> at the beginning the samples are very fast (like 1 cycle) and likely
> something breaks down in perf or your frequency calculation for very
> short samples.
>
> Also for inherited events this happens on every fork. If you
> trace fork events too you'll likely see it correlated.
> If you use -a and disable inheritance (no-inherit=1) it will
> also likely be only at the beginning.
>
> However I fail to see what it would actually break. The frequency
> just needs to be roughly accurate over the whole measurement
> period to get good sampling coverage. But there's nothing
> in the profile that uses the actual frequency. It's just a means
> to get samples, not a measurement by itself.

The cycle value gets associated with a sample via its period value, which
is used by `perf report` in the analysis. If I get a single "broken" sample
with a cycle count of, say 1E11, and then a million other samples, each
with "sane" cycle counts of let's say 1E5, then the one broken sample will
hold 50% of the total amount of measured cycles. So perf report will show
that the function where the broken sample points to has a cost of 50%.

To me, this is clearly a really big issue. Don't you think so too?

Thanks

--
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
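Spelled out, the arithmetic behind that example (one sample with period
1E11 against 1E6 samples of period 1E5 each):

  \frac{10^{11}}{10^{11} + 10^{6} \cdot 10^{5}}
    = \frac{10^{11}}{2 \cdot 10^{11}}
    = 50\%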
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
Milian Wolff writes:
>
> do you have any input on this issue? I really wonder why no one else is
> complaining about the frequency mode being unreliable or outright broken
> in many situations. Since it's the default mode, I think this urgently
> needs to be investigated - it makes perf unusable for a large group of
> users who want to use it but don't know about `-c N` as a workaround...

It's likely related to the frequency algorithm starting with 0. So
at the beginning the samples are very fast (like 1 cycle) and likely
something breaks down in perf or your frequency calculation for very
short samples.

Also for inherited events this happens on every fork. If you
trace fork events too you'll likely see it correlated.
If you use -a and disable inheritance (no-inherit=1) it will
also likely be only at the beginning.

However I fail to see what it would actually break. The frequency
just needs to be roughly accurate over the whole measurement
period to get good sampling coverage. But there's nothing
in the profile that uses the actual frequency. It's just a means
to get samples, not a measurement by itself.

-Andi
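A sketch of the suggested setup; `--no-inherit` is perf record's long
option for disabling counter inheritance, and the trailing command is a
placeholder that merely bounds the recording time:

  # system-wide, inheritance disabled: the tiny start-up periods then occur
  # only once, at the very beginning, instead of on every fork
  perf record -a --no-inherit -e cycles:ppp -- sleep 10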
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
On Monday, 28 August 2017 19:28:08 CEST Andi Kleen wrote:
> Adding Peter.
>
> The original thread starts at
> https://www.spinics.net/lists/linux-perf-users/msg03486.html

Peter, Arnaldo, do you have any input on this issue? I really wonder why no
one else is complaining about the frequency mode being unreliable or
outright broken in many situations. Since it's the default mode, I think
this urgently needs to be investigated - it makes perf unusable for a large
group of users who want to use it but don't know about `-c N` as a
workaround...

> On Mon, Aug 28, 2017 at 04:40:43PM +0200, Milian Wolff wrote:
> > On Monday, August 28, 2017 4:08:47 PM CEST Milian Wolff wrote:
> > > On Sunday, June 18, 2017 9:53:05 PM CEST Milian Wolff wrote:
> > > > On Sunday, 18 June 2017 06:22:19 CEST Andi Kleen wrote:
> > > > > Milian Wolff writes:
> > > > > > But when I look at the naively calculated first derivative, to
> > > > > > visualize CPU load, i.e. CPU clock rate in Hz, then things start
> > > > > > to become somewhat confusing:
> > > > > >
> > > > > > perf script -F time,period | awk 'BEGIN {lastTime = -1;} { time
> > > > > > = $1 + 0.0; if (lastTime != -1) {printf("%.6f\t%f\n", time, $2 /
> > > > > > (time - lastTime));} lastTime = time; }' | gnuplot -p -e "plot
> > > > > > '-' with linespoints"
> > > > >
> > > > > The perf time stamps approach the maximum precision of double (12
> > > > > vs 15 digits). Likely the division loses too many digits, which
> > > > > may cause the bogus results. I've run into similar problems
> > > > > before.
> > > >
> > > > I don't think so, just look at the raw values:
> > > >
> > > > $ perf script -F time,period --ns
> > > > 71789.438122347: 1
> > > > 71789.438127160: 1
> > > > 71789.438129599: 7
> > > > 71789.438131844: 94
> > > > 71789.438134282: 1391
> > > > 71789.438139871: 19106
> > > > 71789.438156426: 123336
> > > > ...
> > > >
> > > > $ qalc '123336/(71789.438156426s - 71789.438139871s) to Hz'
> > > > 123336 / ((71789.438 * second) - (71789.438 * second)) = approx.
> > > > 7.4500755E9 Hz
> > > >
> > > > > One way around this is to normalize the time stamps first so that
> > > > > they start at 0, but this only works for shorter traces.
> > > > > Or use some bignum float library.
> > > >
> > > > I take the time delta between two samples, so a normalization of the
> > > > individual times to 0 would not affect my calculations - the delta
> > > > stays the same after all.
> > > >
> > > > Also, using bignum in my calculations wouldn't change anything
> > > > either. If perf tells me that 123336 cycles have been executed in
> > > > 16.555 us, then it will always be larger than any expected value. At
> > > > 3.2GHz it should be maximally 52976 cycles in such a short
> > > > timeframe...
> > > >
> > > > > Also at the beginning of frequency mode the periods are very
> > > > > small, and the default us resolution will give big jumps for such
> > > > > a calculation.
> > > >
> > > > OK, but who/what measures the large cycle values then? Is this a PMU
> > > > limitation? Or is this an issue with the interaction with the
> > > > kernel, when the algorithm tries to find a good frequency at the
> > > > beginning?
> > > >
> > > > > It's better to use the script --ns option then, but that makes the
> > > > > double precision problem even worse.
> > > >
> > > > See above, using `--ns` doesn't change anything. And qalc e.g.
> > > > already uses bignum internally.
> > > >
> > > > > In general you get better results by avoiding frequency mode,
> > > > > but always specifying a fixed period.
> > > >
> > > > This indeed removes the spikes at the beginning:
> > > >
> > > > perf record --switch-events --call-graph dwarf -P -c 50
> > > >
> > > > The value is chosen to give a similar sample count to frequency
> > > > mode.
> > >
> > > Hey all,
> > >
> > > I want to revive the above discussion as I'm still completely puzzled
> > > by the observation. The tl;dr for those who have not followed the
> > > previous discussion:
> > >
> > > perf record in frequency mode (i.e. "record ~1000 samples per second")
> > > sometimes reports excessively large cycle counts.
> > >
> > > In the previous mails I have outlined how to visualize this issue
> > > graphically with gnuplot, by drawing the first derivative of the
> > > cycles over time, which gives numbers nicely comparable to one's CPU
> > > clock speed.
> > >
> > > Sometimes this value goes up to 10GHz and beyond. Sometimes the values
> > > are so broken (i.e. so high) that they completely break the analysis
> > > with perf report or similar, as they completely skew the total event
> > > count and thereby drastically influence the fractional cost reported
> > > by perf. E.g. just now I ran `perf record` on another application and
> > > got this result:
> > >
> > > ~
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
Adding Peter.

The original thread starts at
https://www.spinics.net/lists/linux-perf-users/msg03486.html

On Mon, Aug 28, 2017 at 04:40:43PM +0200, Milian Wolff wrote:
> On Monday, August 28, 2017 4:08:47 PM CEST Milian Wolff wrote:
> > On Sunday, June 18, 2017 9:53:05 PM CEST Milian Wolff wrote:
> > > On Sunday, 18 June 2017 06:22:19 CEST Andi Kleen wrote:
> > > > Milian Wolff writes:
> > > > > But when I look at the naively calculated first derivative, to
> > > > > visualize CPU load, i.e. CPU clock rate in Hz, then things start
> > > > > to become somewhat confusing:
> > > > >
> > > > > perf script -F time,period | awk 'BEGIN {lastTime = -1;} { time =
> > > > > $1 + 0.0; if (lastTime != -1) {printf("%.6f\t%f\n", time, $2 /
> > > > > (time - lastTime));} lastTime = time; }' | gnuplot -p -e "plot '-'
> > > > > with linespoints"
> > > >
> > > > The perf time stamps approach the maximum precision of double (12
> > > > vs 15 digits). Likely the division loses too many digits, which may
> > > > cause the bogus results. I've run into similar problems before.
> > >
> > > I don't think so, just look at the raw values:
> > >
> > > $ perf script -F time,period --ns
> > > 71789.438122347: 1
> > > 71789.438127160: 1
> > > 71789.438129599: 7
> > > 71789.438131844: 94
> > > 71789.438134282: 1391
> > > 71789.438139871: 19106
> > > 71789.438156426: 123336
> > > ...
> > >
> > > $ qalc '123336/(71789.438156426s - 71789.438139871s) to Hz'
> > > 123336 / ((71789.438 * second) - (71789.438 * second)) = approx.
> > > 7.4500755E9 Hz
> > >
> > > > One way around this is to normalize the time stamps first so that
> > > > they start at 0, but this only works for shorter traces.
> > > > Or use some bignum float library.
> > >
> > > I take the time delta between two samples, so a normalization of the
> > > individual times to 0 would not affect my calculations - the delta
> > > stays the same after all.
> > >
> > > Also, using bignum in my calculations wouldn't change anything
> > > either. If perf tells me that 123336 cycles have been executed in
> > > 16.555 us, then it will always be larger than any expected value. At
> > > 3.2GHz it should be maximally 52976 cycles in such a short
> > > timeframe...
> > >
> > > > Also at the beginning of frequency mode the periods are very small,
> > > > and the default us resolution will give big jumps for such a
> > > > calculation.
> > >
> > > OK, but who/what measures the large cycle values then? Is this a PMU
> > > limitation? Or is this an issue with the interaction with the kernel,
> > > when the algorithm tries to find a good frequency at the beginning?
> > >
> > > > It's better to use the script --ns option then, but that makes the
> > > > double precision problem even worse.
> > >
> > > See above, using `--ns` doesn't change anything. And qalc e.g.
> > > already uses bignum internally.
> > >
> > > > In general you get better results by avoiding frequency mode,
> > > > but always specifying a fixed period.
> > >
> > > This indeed removes the spikes at the beginning:
> > >
> > > perf record --switch-events --call-graph dwarf -P -c 50
> > >
> > > The value is chosen to give a similar sample count to frequency mode.
> >
> > Hey all,
> >
> > I want to revive the above discussion as I'm still completely puzzled
> > by the observation. The tl;dr for those who have not followed the
> > previous discussion:
> >
> > perf record in frequency mode (i.e. "record ~1000 samples per second")
> > sometimes reports excessively large cycle counts.
> >
> > In the previous mails I have outlined how to visualize this issue
> > graphically with gnuplot, by drawing the first derivative of the cycles
> > over time, which gives numbers nicely comparable to one's CPU clock
> > speed.
> >
> > Sometimes this value goes up to 10GHz and beyond. Sometimes the values
> > are so broken (i.e. so high) that they completely break the analysis
> > with perf report or similar, as they completely skew the total event
> > count and thereby drastically influence the fractional cost reported by
> > perf. E.g. just now I ran `perf record` on another application and got
> > this result:
> >
> > ~
> > $ perf script | grep page_remove_rmap -C 10
> > QXcbEventReader 23866 605019.879412:     128193 cycles:ppp:
> >           bb23fbf5 __fget_light (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> > lab_mandelbrot 23865 605019.879469:          1 cycles:ppp:
> >           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> > QXcbEventReader 23866 605019.879471:    1810360 cycles:ppp:
> >           bb1cb4ec find_vma (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> > lab_mandelbrot 23865 605019.879472:          1 cycles:ppp:
> >           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> > lab_mandelbrot 23865 605019.879474:         10 cycles:ppp:
Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
On Monday, August 28, 2017 4:08:47 PM CEST Milian Wolff wrote:
> On Sunday, June 18, 2017 9:53:05 PM CEST Milian Wolff wrote:
> > On Sunday, 18 June 2017 06:22:19 CEST Andi Kleen wrote:
> > > Milian Wolff writes:
> > > > But when I look at the naively calculated first derivative, to
> > > > visualize CPU load, i.e. CPU clock rate in Hz, then things start to
> > > > become somewhat confusing:
> > > >
> > > > perf script -F time,period | awk 'BEGIN {lastTime = -1;} { time =
> > > > $1 + 0.0; if (lastTime != -1) {printf("%.6f\t%f\n", time, $2 /
> > > > (time - lastTime));} lastTime = time; }' | gnuplot -p -e "plot '-'
> > > > with linespoints"
> > >
> > > The perf time stamps approach the maximum precision of double (12 vs
> > > 15 digits). Likely the division loses too many digits, which may
> > > cause the bogus results. I've run into similar problems before.
> >
> > I don't think so, just look at the raw values:
> >
> > $ perf script -F time,period --ns
> > 71789.438122347: 1
> > 71789.438127160: 1
> > 71789.438129599: 7
> > 71789.438131844: 94
> > 71789.438134282: 1391
> > 71789.438139871: 19106
> > 71789.438156426: 123336
> > ...
> >
> > $ qalc '123336/(71789.438156426s - 71789.438139871s) to Hz'
> > 123336 / ((71789.438 * second) - (71789.438 * second)) = approx.
> > 7.4500755E9 Hz
> >
> > > One way around this is to normalize the time stamps first so that
> > > they start at 0, but this only works for shorter traces.
> > > Or use some bignum float library.
> >
> > I take the time delta between two samples, so a normalization of the
> > individual times to 0 would not affect my calculations - the delta
> > stays the same after all.
> >
> > Also, using bignum in my calculations wouldn't change anything either.
> > If perf tells me that 123336 cycles have been executed in 16.555 us,
> > then it will always be larger than any expected value. At 3.2GHz it
> > should be maximally 52976 cycles in such a short timeframe...
> >
> > > Also at the beginning of frequency mode the periods are very small,
> > > and the default us resolution will give big jumps for such a
> > > calculation.
> >
> > OK, but who/what measures the large cycle values then? Is this a PMU
> > limitation? Or is this an issue with the interaction with the kernel,
> > when the algorithm tries to find a good frequency at the beginning?
> >
> > > It's better to use the script --ns option then, but that makes the
> > > double precision problem even worse.
> >
> > See above, using `--ns` doesn't change anything. And qalc e.g. already
> > uses bignum internally.
> >
> > > In general you get better results by avoiding frequency mode,
> > > but always specifying a fixed period.
> >
> > This indeed removes the spikes at the beginning:
> >
> > perf record --switch-events --call-graph dwarf -P -c 50
> >
> > The value is chosen to give a similar sample count to frequency mode.
>
> Hey all,
>
> I want to revive the above discussion as I'm still completely puzzled by
> the observation. The tl;dr for those who have not followed the previous
> discussion:
>
> perf record in frequency mode (i.e. "record ~1000 samples per second")
> sometimes reports excessively large cycle counts.
>
> In the previous mails I have outlined how to visualize this issue
> graphically with gnuplot, by drawing the first derivative of the cycles
> over time, which gives numbers nicely comparable to one's CPU clock
> speed.
>
> Sometimes this value goes up to 10GHz and beyond. Sometimes the values
> are so broken (i.e. so high) that they completely break the analysis with
> perf report or similar, as they completely skew the total event count and
> thereby drastically influence the fractional cost reported by perf. E.g.
> just now I ran `perf record` on another application and got this result:
>
> ~
> $ perf script | grep page_remove_rmap -C 10
> QXcbEventReader 23866 605019.879412:     128193 cycles:ppp:
>           bb23fbf5 __fget_light (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865 605019.879469:          1 cycles:ppp:
>           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> QXcbEventReader 23866 605019.879471:    1810360 cycles:ppp:
>           bb1cb4ec find_vma (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865 605019.879472:          1 cycles:ppp:
>           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865 605019.879474:         10 cycles:ppp:
>           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865 605019.879475:        216 cycles:ppp:
>           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865 605019.879486:       5106 cycles:ppp:
>           bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
> lab_mandelbrot 23865
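The listing above already contains a measurable example: the two
QXcbEventReader samples are 59 us apart, yet the second one carries a
period of 1810360 cycles. Checking with the same qalc approach used earlier
in the thread:

  $ qalc '1810360/(605019.879471s - 605019.879412s) to Hz'

which works out to roughly 3.07E10 Hz, i.e. ~30 GHz - an order of magnitude
above any plausible CPU clock rate.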
broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
On Sunday, June 18, 2017 9:53:05 PM CEST Milian Wolff wrote:
> On Sunday, 18 June 2017 06:22:19 CEST Andi Kleen wrote:
> > Milian Wolff writes:
> > > But when I look at the naively calculated first derivative, to
> > > visualize CPU load, i.e. CPU clock rate in Hz, then things start to
> > > become somewhat confusing:
> > >
> > > perf script -F time,period | awk 'BEGIN {lastTime = -1;} { time = $1
> > > + 0.0; if (lastTime != -1) {printf("%.6f\t%f\n", time, $2 / (time -
> > > lastTime));} lastTime = time; }' | gnuplot -p -e "plot '-' with
> > > linespoints"
> >
> > The perf time stamps approach the maximum precision of double (12 vs
> > 15 digits). Likely the division loses too many digits, which may cause
> > the bogus results. I've run into similar problems before.
>
> I don't think so, just look at the raw values:
>
> $ perf script -F time,period --ns
> 71789.438122347: 1
> 71789.438127160: 1
> 71789.438129599: 7
> 71789.438131844: 94
> 71789.438134282: 1391
> 71789.438139871: 19106
> 71789.438156426: 123336
> ...
>
> $ qalc '123336/(71789.438156426s - 71789.438139871s) to Hz'
> 123336 / ((71789.438 * second) - (71789.438 * second)) = approx.
> 7.4500755E9 Hz
>
> > One way around this is to normalize the time stamps first so that they
> > start at 0, but this only works for shorter traces.
> > Or use some bignum float library.
>
> I take the time delta between two samples, so a normalization of the
> individual times to 0 would not affect my calculations - the delta stays
> the same after all.
>
> Also, using bignum in my calculations wouldn't change anything either.
> If perf tells me that 123336 cycles have been executed in 16.555 us,
> then it will always be larger than any expected value. At 3.2GHz it
> should be maximally 52976 cycles in such a short timeframe...
>
> > Also at the beginning of frequency mode the periods are very small,
> > and the default us resolution will give big jumps for such a
> > calculation.
>
> OK, but who/what measures the large cycle values then? Is this a PMU
> limitation? Or is this an issue with the interaction with the kernel,
> when the algorithm tries to find a good frequency at the beginning?
>
> > It's better to use the script --ns option then, but that makes the
> > double precision problem even worse.
>
> See above, using `--ns` doesn't change anything. And qalc e.g. already
> uses bignum internally.
>
> > In general you get better results by avoiding frequency mode,
> > but always specifying a fixed period.
>
> This indeed removes the spikes at the beginning:
>
> perf record --switch-events --call-graph dwarf -P -c 50
>
> The value is chosen to give a similar sample count to frequency mode.

Hey all,

I want to revive the above discussion as I'm still completely puzzled by
the observation. The tl;dr for those who have not followed the previous
discussion:

perf record in frequency mode (i.e. "record ~1000 samples per second")
sometimes reports excessively large cycle counts.

In the previous mails I have outlined how to visualize this issue
graphically with gnuplot, by drawing the first derivative of the cycles
over time, which gives numbers nicely comparable to one's CPU clock speed.

Sometimes this value goes up to 10GHz and beyond. Sometimes the values are
so broken (i.e. so high) that they completely break the analysis with perf
report or similar, as they completely skew the total event count and
thereby drastically influence the fractional cost reported by perf. E.g.
just now I ran `perf record` on another application and got this result:

~
$ perf script | grep page_remove_rmap -C 10
QXcbEventReader 23866 605019.879412:     128193 cycles:ppp:
          bb23fbf5 __fget_light (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879469:          1 cycles:ppp:
          bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
QXcbEventReader 23866 605019.879471:    1810360 cycles:ppp:
          bb1cb4ec find_vma (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879472:          1 cycles:ppp:
          bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879474:         10 cycles:ppp:
          bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879475:        216 cycles:ppp:
          bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879486:       5106 cycles:ppp:
          bb064504 native_write_msr (/lib/modules/4.12.8-2-ARCH/build/vmlinux)
lab_mandelbrot 23865 605019.879489:      19381 cycles:ppp:
          7f85beae64ce QWaitCondition::wait (/usr/lib/libQt5Core.so.5.9.1)
lab_mandelbrot 23865 605019.879495:     251346 cycles:ppp:
          7f85bf2567c1 QScreen::~QScreen (/usr/lib/libQt5Gui.so.5.9.1)
lab_mandelbrot 23865 605019.880722:
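For reference, the derivative pipeline quoted at the top of this mail,
spread out and commented (functionally identical to the one-liner):

  perf script -F time,period | awk '
    BEGIN { lastTime = -1 }
    {
      time = $1 + 0.0   # numeric coercion strips the trailing ":" from the time stamp
      if (lastTime != -1) {
        # period / elapsed time = implied event rate in Hz; for the cycles
        # event this should never exceed the CPU clock rate
        printf("%.6f\t%f\n", time, $2 / (time - lastTime))
      }
      lastTime = time
    }' | gnuplot -p -e "plot '-' with linespoints"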