Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On 29.04.2021 14:53, Roger Pau Monné wrote: > On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: >> Reading the platform timer isn't cheap, so we'd better avoid it when the >> resulting value is of no interest to anyone. >> >> The consumer of master_stime, obtained by >> time_calibration_{std,tsc}_rendezvous() and propagated through >> this_cpu(cpu_calibration), is local_time_calibration(). With >> CONSTANT_TSC the latter function uses an early exit path, which doesn't >> explicitly use the field. While this_cpu(cpu_calibration) (including the >> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that >> path, both structures' fields get consumed only by the !CONSTANT_TSC >> logic of the function. >> >> Signed-off-by: Jan Beulich > > Acked-by: Roger Pau Monné Thanks. > Albeit as said on my other email I would prefer performance related > changes like this one to be accompanied with some proof that they > actually make a difference, or else we risk making the code more > complicated for no concrete benefit. I'm not sure that's always sensible or useful. Removing an operation that may take hundreds of clocks is surely not going to make things worse performance-wise. Whether it's measurable in any way with real world workloads is hard to predict. Micro-measurement, as expected, shows an improvement. >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -52,6 +52,7 @@ unsigned long pit0_ticks; >> struct cpu_time_stamp { >> u64 local_tsc; >> s_time_t local_stime; >> +/* Next field unconditionally valid only when !CONSTANT_TSC. */ > > Could you also mention this is only true for the cpu_time_stamp that's > used in cpu_calibration? > > For ap_bringup_ref master_stime is valid regardless of CONSTANT_TSC. Well, that's precisely why I put "unconditionally" there. I'm not convinced it's helpful to point out ap_bringup_ref in particular, as the comment would then likely not get updated when yet another instance appears which sets the field in all cases. If you have a suggestion on how to word this better without mentioning particular instances of the struct, I'll be happy to consider taking that. Jan
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: > Reading the platform timer isn't cheap, so we'd better avoid it when the > resulting value is of no interest to anyone. > > The consumer of master_stime, obtained by > time_calibration_{std,tsc}_rendezvous() and propagated through > this_cpu(cpu_calibration), is local_time_calibration(). With > CONSTANT_TSC the latter function uses an early exit path, which doesn't > explicitly use the field. While this_cpu(cpu_calibration) (including the > master_stime field) gets propagated to this_cpu(cpu_time).stamp on that > path, both structures' fields get consumed only by the !CONSTANT_TSC > logic of the function. > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné Albeit as said on my other email I would prefer performance related changes like this one to be accompanied with some proof that they actually make a difference, or else we risk making the code more complicated for no concrete benefit. > --- > v4: New. > --- > I realize there's some risk associated with potential new uses of the > field down the road. What would people think about compiling time.c a > 2nd time into a dummy object file, with a conditional enabled to force > assuming CONSTANT_TSC, and with that conditional used to suppress > presence of the field as well as all audited used of it (i.e. in > particular that large part of local_time_calibration())? Unexpected new > users of the field would then cause build time errors. > > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -52,6 +52,7 @@ unsigned long pit0_ticks; > struct cpu_time_stamp { > u64 local_tsc; > s_time_t local_stime; > +/* Next field unconditionally valid only when !CONSTANT_TSC. */ Could you also mention this is only true for the cpu_time_stamp that's used in cpu_calibration? For ap_bringup_ref master_stime is valid regardless of CONSTANT_TSC. Thanks, Roger.
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On Wed, Apr 21, 2021 at 12:06:34PM +0200, Jan Beulich wrote: > On 20.04.2021 18:12, Roger Pau Monné wrote: > > On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: > >> Reading the platform timer isn't cheap, so we'd better avoid it when the > >> resulting value is of no interest to anyone. > >> > >> The consumer of master_stime, obtained by > >> time_calibration_{std,tsc}_rendezvous() and propagated through > >> this_cpu(cpu_calibration), is local_time_calibration(). With > >> CONSTANT_TSC the latter function uses an early exit path, which doesn't > >> explicitly use the field. While this_cpu(cpu_calibration) (including the > >> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that > >> path, both structures' fields get consumed only by the !CONSTANT_TSC > >> logic of the function. > >> > >> Signed-off-by: Jan Beulich > >> --- > >> v4: New. > >> --- > >> I realize there's some risk associated with potential new uses of the > >> field down the road. What would people think about compiling time.c a > >> 2nd time into a dummy object file, with a conditional enabled to force > >> assuming CONSTANT_TSC, and with that conditional used to suppress > >> presence of the field as well as all audited used of it (i.e. in > >> particular that large part of local_time_calibration())? Unexpected new > >> users of the field would then cause build time errors. > > > > Wouldn't that add quite a lot of churn to the file itself in the form > > of pre-processor conditionals? > > Possibly - I didn't try yet, simply because of fearing this might > not be liked even without presenting it in patch form. > > > Could we instead set master_stime to an invalid value that would make > > the consumers explode somehow? > > No idea whether there is any such "reliable" value. > > > I know there might be new consumers, but those should be able to > > figure whether the value is sane by looking at the existing ones. > > This could be the hope, yes. But the effort of auditing the code to > confirm the potential of optimizing this (after vaguely getting the > impression there might be room) was non-negligible (in fact I did > three runs just to be really certain). This in particular means > that I'm in no way certain that looking at existing consumers would > point out the possible pitfall. > > > Also, since this is only done on the BSP on the last iteration I > > wonder if it really makes such a difference performance-wise to > > warrant all this trouble. > > By "all this trouble", do you mean the outlined further steps or > the patch itself? Yes, either the further steps or the fact that we would have to be careful to not introduce new users of master_stime that expect it to be set when CONSTANT_TSC is true. > In the latter case, while it's only the BSP to > read the value, all other CPUs are waiting for the BSP to get its > part done. So the extra time it takes to read the platform clock > affects the overall duration of the rendezvous, and hence the time > not "usefully" spent by _all_ of the CPUs. Right, but that's only during the time rendezvous, which doesn't happen that often. And I guess that just the rendezvous of all CPUs is biggest hit in terms of performance. While I don't think I would have done the work myself, I guess there's no reason to block it. In any case I would prefer if such performance related changes come with some proof that they do indeed make a difference, or else we might just be making the code more complicated for no concrete performance benefit. Thanks, Roger.
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On 21.04.2021 12:06, Jan Beulich wrote: > On 20.04.2021 18:12, Roger Pau Monné wrote: >> On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: >>> Reading the platform timer isn't cheap, so we'd better avoid it when the >>> resulting value is of no interest to anyone. >>> >>> The consumer of master_stime, obtained by >>> time_calibration_{std,tsc}_rendezvous() and propagated through >>> this_cpu(cpu_calibration), is local_time_calibration(). With >>> CONSTANT_TSC the latter function uses an early exit path, which doesn't >>> explicitly use the field. While this_cpu(cpu_calibration) (including the >>> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that >>> path, both structures' fields get consumed only by the !CONSTANT_TSC >>> logic of the function. >>> >>> Signed-off-by: Jan Beulich >>> --- >>> v4: New. >>> --- >>> I realize there's some risk associated with potential new uses of the >>> field down the road. What would people think about compiling time.c a >>> 2nd time into a dummy object file, with a conditional enabled to force >>> assuming CONSTANT_TSC, and with that conditional used to suppress >>> presence of the field as well as all audited used of it (i.e. in >>> particular that large part of local_time_calibration())? Unexpected new >>> users of the field would then cause build time errors. >> >> Wouldn't that add quite a lot of churn to the file itself in the form >> of pre-processor conditionals? > > Possibly - I didn't try yet, simply because of fearing this might > not be liked even without presenting it in patch form. > >> Could we instead set master_stime to an invalid value that would make >> the consumers explode somehow? > > No idea whether there is any such "reliable" value. > >> I know there might be new consumers, but those should be able to >> figure whether the value is sane by looking at the existing ones. > > This could be the hope, yes. But the effort of auditing the code to > confirm the potential of optimizing this (after vaguely getting the > impression there might be room) was non-negligible (in fact I did > three runs just to be really certain). This in particular means > that I'm in no way certain that looking at existing consumers would > point out the possible pitfall. > >> Also, since this is only done on the BSP on the last iteration I >> wonder if it really makes such a difference performance-wise to >> warrant all this trouble. > > By "all this trouble", do you mean the outlined further steps or > the patch itself? In the latter case, while it's only the BSP to > read the value, all other CPUs are waiting for the BSP to get its > part done. So the extra time it takes to read the platform clock > affects the overall duration of the rendezvous, and hence the time > not "usefully" spent by _all_ of the CPUs. Ping? Your answer here has a significant effect on the disposition of this change. Jan
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On 20.04.2021 18:12, Roger Pau Monné wrote: > On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: >> Reading the platform timer isn't cheap, so we'd better avoid it when the >> resulting value is of no interest to anyone. >> >> The consumer of master_stime, obtained by >> time_calibration_{std,tsc}_rendezvous() and propagated through >> this_cpu(cpu_calibration), is local_time_calibration(). With >> CONSTANT_TSC the latter function uses an early exit path, which doesn't >> explicitly use the field. While this_cpu(cpu_calibration) (including the >> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that >> path, both structures' fields get consumed only by the !CONSTANT_TSC >> logic of the function. >> >> Signed-off-by: Jan Beulich >> --- >> v4: New. >> --- >> I realize there's some risk associated with potential new uses of the >> field down the road. What would people think about compiling time.c a >> 2nd time into a dummy object file, with a conditional enabled to force >> assuming CONSTANT_TSC, and with that conditional used to suppress >> presence of the field as well as all audited used of it (i.e. in >> particular that large part of local_time_calibration())? Unexpected new >> users of the field would then cause build time errors. > > Wouldn't that add quite a lot of churn to the file itself in the form > of pre-processor conditionals? Possibly - I didn't try yet, simply because of fearing this might not be liked even without presenting it in patch form. > Could we instead set master_stime to an invalid value that would make > the consumers explode somehow? No idea whether there is any such "reliable" value. > I know there might be new consumers, but those should be able to > figure whether the value is sane by looking at the existing ones. This could be the hope, yes. But the effort of auditing the code to confirm the potential of optimizing this (after vaguely getting the impression there might be room) was non-negligible (in fact I did three runs just to be really certain). This in particular means that I'm in no way certain that looking at existing consumers would point out the possible pitfall. > Also, since this is only done on the BSP on the last iteration I > wonder if it really makes such a difference performance-wise to > warrant all this trouble. By "all this trouble", do you mean the outlined further steps or the patch itself? In the latter case, while it's only the BSP to read the value, all other CPUs are waiting for the BSP to get its part done. So the extra time it takes to read the platform clock affects the overall duration of the rendezvous, and hence the time not "usefully" spent by _all_ of the CPUs. Jan
Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote: > Reading the platform timer isn't cheap, so we'd better avoid it when the > resulting value is of no interest to anyone. > > The consumer of master_stime, obtained by > time_calibration_{std,tsc}_rendezvous() and propagated through > this_cpu(cpu_calibration), is local_time_calibration(). With > CONSTANT_TSC the latter function uses an early exit path, which doesn't > explicitly use the field. While this_cpu(cpu_calibration) (including the > master_stime field) gets propagated to this_cpu(cpu_time).stamp on that > path, both structures' fields get consumed only by the !CONSTANT_TSC > logic of the function. > > Signed-off-by: Jan Beulich > --- > v4: New. > --- > I realize there's some risk associated with potential new uses of the > field down the road. What would people think about compiling time.c a > 2nd time into a dummy object file, with a conditional enabled to force > assuming CONSTANT_TSC, and with that conditional used to suppress > presence of the field as well as all audited used of it (i.e. in > particular that large part of local_time_calibration())? Unexpected new > users of the field would then cause build time errors. Wouldn't that add quite a lot of churn to the file itself in the form of pre-processor conditionals? Could we instead set master_stime to an invalid value that would make the consumers explode somehow? I know there might be new consumers, but those should be able to figure whether the value is sane by looking at the existing ones. Also, since this is only done on the BSP on the last iteration I wonder if it really makes such a difference performance-wise to warrant all this trouble. Thanks, Roger.
[PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
Reading the platform timer isn't cheap, so we'd better avoid it when the resulting value is of no interest to anyone. The consumer of master_stime, obtained by time_calibration_{std,tsc}_rendezvous() and propagated through this_cpu(cpu_calibration), is local_time_calibration(). With CONSTANT_TSC the latter function uses an early exit path, which doesn't explicitly use the field. While this_cpu(cpu_calibration) (including the master_stime field) gets propagated to this_cpu(cpu_time).stamp on that path, both structures' fields get consumed only by the !CONSTANT_TSC logic of the function. Signed-off-by: Jan Beulich --- v4: New. --- I realize there's some risk associated with potential new uses of the field down the road. What would people think about compiling time.c a 2nd time into a dummy object file, with a conditional enabled to force assuming CONSTANT_TSC, and with that conditional used to suppress presence of the field as well as all audited used of it (i.e. in particular that large part of local_time_calibration())? Unexpected new users of the field would then cause build time errors. --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -52,6 +52,7 @@ unsigned long pit0_ticks; struct cpu_time_stamp { u64 local_tsc; s_time_t local_stime; +/* Next field unconditionally valid only when !CONSTANT_TSC. */ s_time_t master_stime; }; @@ -1702,7 +1703,7 @@ static void time_calibration_tsc_rendezv * iteration. */ r->master_tsc_stamp = r->max_tsc_stamp; -else if ( i == 0 ) +else if ( !boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && i == 0 ) r->master_stime = read_platform_stime(NULL); atomic_inc(>semaphore); @@ -1776,8 +1777,11 @@ static void time_calibration_std_rendezv { while ( atomic_read(>semaphore) != (total_cpus - 1) ) cpu_relax(); -r->master_stime = read_platform_stime(NULL); -smp_wmb(); /* write r->master_stime /then/ signal */ +if ( !boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) +{ +r->master_stime = read_platform_stime(NULL); +smp_wmb(); /* write r->master_stime /then/ signal */ +} atomic_inc(>semaphore); } else