Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
On 2023-04-12 12:34:13, Kautuk Consul wrote:
> Hi,
>
> On 2023-04-11 16:35:10, Michael Ellerman wrote:
> > Kautuk Consul writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry, my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable
> > >> optimization. As above, it does often improve code generation for the
> > >> happy path, but that doesn't always equate to improved performance,
> > >> e.g. if the CPU can easily predict the branch and/or there is no
> > >> impact on the cache footprint.
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likeliness of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > disassembly then I'd be interested.
> Just FYI, the last email by kautuk.consul...@gmail.com was by me.
> That last email contains a diff file attachment which compares 2 files:
> before my changes and after my changes.
> This diff file shows a lot of changes in code generation. I'm assuming
> all those changes are made by the compiler towards optimizing all return
> paths to k.alloc calls.
> Kindly review and comment.

Any comments on the numerous code generation changes as shown by the files
I attached to this mail chain?

Sorry, I don't have concrete figures of any type to prove that this leads
to any measurable performance improvements. I am just assuming that the
compiler's modified code generation (due to the use of the unlikely
macro) would be optimal.

Thanks.
> >
> > cheers
Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
Hi,

On 2023-04-11 16:35:10, Michael Ellerman wrote:
> Kautuk Consul writes:
> > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> >> > > I used the unlikely() macro on the return values of the k.alloc
> >> > > calls and found that it changes the code generation a bit.
> >> > > Optimize all return paths of k.alloc calls by improving
> >> > > branch prediction on return value of k.alloc.
> >>
> >> Nit, this is improving code generation, not branch prediction.
> > Sorry, my mistake.
> >>
> >> > What about below?
> >> >
> >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> >> > unlikely() macro to optimize their return paths."
> >>
> >> Another nit, using unlikely() doesn't necessarily provide a measurable
> >> optimization. As above, it does often improve code generation for the
> >> happy path, but that doesn't always equate to improved performance,
> >> e.g. if the CPU can easily predict the branch and/or there is no
> >> impact on the cache footprint.
> > I see. I will submit a v2 of the patch with a better and more accurate
> > description. Does anyone else have any comments before I do so?
>
> In general I think unlikely should be saved for cases where either the
> compiler is generating terrible code, or the likeliness of the condition
> might be surprising to a human reader.
>
> eg. if you had some code that does a NULL check and it's *expected* that
> the value is NULL, then wrapping that check in likely() actually adds
> information for a human reader.
>
> Also please don't use unlikely in init paths or other cold paths, it
> clutters the code (only slightly but a little) and that's not worth the
> possible tiny benefit for code that only runs once or infrequently.
>
> I would expect the compilers to do the right thing in all
> these cases without the unlikely. But if you can demonstrate that they
> meaningfully improve the code generation with a before/after
> disassembly then I'd be interested.

Just FYI, the last email by kautuk.consul...@gmail.com was by me.
That last email contains a diff file attachment which compares 2 files:
before my changes and after my changes.
This diff file shows a lot of changes in code generation. I'm assuming
all those changes are made by the compiler towards optimizing all return
paths to k.alloc calls.
Kindly review and comment.

>
> cheers
Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
Kautuk Consul writes:
> On 2023-04-07 09:01:29, Sean Christopherson wrote:
>> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
>> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
>> > > I used the unlikely() macro on the return values of the k.alloc
>> > > calls and found that it changes the code generation a bit.
>> > > Optimize all return paths of k.alloc calls by improving
>> > > branch prediction on return value of k.alloc.
>>
>> Nit, this is improving code generation, not branch prediction.
> Sorry, my mistake.
>>
>> > What about below?
>> >
>> > "Improve branch prediction on kmalloc() and kzalloc() call by using
>> > unlikely() macro to optimize their return paths."
>>
>> Another nit, using unlikely() doesn't necessarily provide a measurable
>> optimization. As above, it does often improve code generation for the
>> happy path, but that doesn't always equate to improved performance,
>> e.g. if the CPU can easily predict the branch and/or there is no
>> impact on the cache footprint.
> I see. I will submit a v2 of the patch with a better and more accurate
> description. Does anyone else have any comments before I do so?

In general I think unlikely should be saved for cases where either the
compiler is generating terrible code, or the likeliness of the condition
might be surprising to a human reader.

eg. if you had some code that does a NULL check and it's *expected* that
the value is NULL, then wrapping that check in likely() actually adds
information for a human reader.

Also please don't use unlikely in init paths or other cold paths, it
clutters the code (only slightly but a little) and that's not worth the
possible tiny benefit for code that only runs once or infrequently.

I would expect the compilers to do the right thing in all
these cases without the unlikely. But if you can demonstrate that they
meaningfully improve the code generation with a before/after
disassembly then I'd be interested.

cheers
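The point above about an *expected* NULL can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: the `likely()` macro is a stand-in modelled on the kernel's hint and needs GCC or Clang for `__builtin_expect`, and `struct request` and `pick_name()` are hypothetical names that do not appear in the patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's likely() hint (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical type: override_name is assumed to be NULL almost always. */
struct request {
	const char *override_name;
	const char *fallback_name;
};

/* Hypothetical helper: return the override if set, else the fallback. */
static const char *pick_name(const struct request *req)
{
	/*
	 * A reader would normally assume a NULL pointer is the rare case.
	 * Here NULL is the common case, so likely() documents that fact.
	 */
	if (likely(!req->override_name))
		return req->fallback_name;

	return req->override_name;
}
```

The hint does not change what the function returns; it only records which branch is expected, for the compiler and for the reader.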
Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
On 2023-04-07 09:01:29, Sean Christopherson wrote:
> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > > I used the unlikely() macro on the return values of the k.alloc
> > > calls and found that it changes the code generation a bit.
> > > Optimize all return paths of k.alloc calls by improving
> > > branch prediction on return value of k.alloc.
>
> Nit, this is improving code generation, not branch prediction.

Sorry, my mistake.

> > What about below?
> >
> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > unlikely() macro to optimize their return paths."
>
> Another nit, using unlikely() doesn't necessarily provide a measurable
> optimization. As above, it does often improve code generation for the
> happy path, but that doesn't always equate to improved performance,
> e.g. if the CPU can easily predict the branch and/or there is no
> impact on the cache footprint.

I see. I will submit a v2 of the patch with a better and more accurate
description. Does anyone else have any comments before I do so?
Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > I used the unlikely() macro on the return values of the k.alloc
> > calls and found that it changes the code generation a bit.
> > Optimize all return paths of k.alloc calls by improving
> > branch prediction on return value of k.alloc.

Nit, this is improving code generation, not branch prediction.

> What about below?
>
> "Improve branch prediction on kmalloc() and kzalloc() call by using
> unlikely() macro to optimize their return paths."

Another nit, using unlikely() doesn't necessarily provide a measurable
optimization. As above, it does often improve code generation for the
happy path, but that doesn't always equate to improved performance,
e.g. if the CPU can easily predict the branch and/or there is no
impact on the cache footprint.
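For readers unfamiliar with the macro under discussion, here is a minimal userspace sketch of the pattern the patch applies. It rests on stated assumptions: the `likely()`/`unlikely()` definitions are stand-ins modelled on the kernel's hints and require GCC or Clang for `__builtin_expect`, `calloc()` stands in for `kzalloc()`, and `fill_buffer()` is a hypothetical helper that is not part of the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's hints (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: allocate a zeroed buffer of n bytes, set every
 * byte to value, and hand the buffer back through *out.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int fill_buffer(size_t n, unsigned char value, unsigned char **out)
{
	/* calloc() plays the role of kzalloc() in this sketch. */
	unsigned char *buf = calloc(n ? n : 1, 1);

	/*
	 * The hint lets the compiler lay out the failure return away from
	 * the fall-through (happy) path. It affects code layout only; the
	 * CPU's dynamic branch predictor is not controlled by it.
	 */
	if (unlikely(!buf))
		return -1;

	for (size_t i = 0; i < n; i++)
		buf[i] = value;

	*out = buf;
	return 0;
}
```

Whether the different layout is measurable depends on the workload, which is the caveat raised above. Comparing the compiler's assembly output with and without the hint shows what actually changed.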
Re: [PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> I used the unlikely() macro on the return values of the k.alloc
> calls and found that it changes the code generation a bit.
> Optimize all return paths of k.alloc calls by improving
> branch prediction on return value of k.alloc.

What about below?

"Improve branch prediction on kmalloc() and kzalloc() call by using
unlikely() macro to optimize their return paths."

That is, try to avoid first-person construct (I).

Thanks.

-- 
An old man doll... just what I always wanted! - Clara
[PATCH] KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
I used the unlikely() macro on the return values of the k.alloc
calls and found that it changes the code generation a bit.
Optimize all return paths of k.alloc calls by improving
branch prediction on return value of k.alloc.

Signed-off-by: Kautuk Consul
---
 arch/powerpc/kvm/book3s_hv_nested.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 5a64a1341e6f..dbf2dd073e1f 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -446,7 +446,7 @@ long kvmhv_nested_init(void)
 	ptb_order = 12;
 	pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
 				       GFP_KERNEL);
-	if (!pseries_partition_tb) {
+	if (unlikely(!pseries_partition_tb)) {
 		pr_err("kvm-hv: failed to allocated nested partition table\n");
 		return -ENOMEM;
 	}
@@ -575,7 +575,7 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 		return H_PARAMETER;
 	buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf)
+	if (unlikely(!buf))
 		return H_NO_MEM;
 	gp = kvmhv_get_nested(vcpu->kvm, l1_lpid, false);
@@ -689,7 +689,7 @@ static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int
 	long shadow_lpid;
 	gp = kzalloc(sizeof(*gp), GFP_KERNEL);
-	if (!gp)
+	if (unlikely(!gp))
 		return NULL;
 	gp->l1_host = kvm;
 	gp->l1_lpid = lpid;
@@ -1633,7 +1633,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	/* 4. Insert the pte into our shadow_pgtable */
 	n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
-	if (!n_rmap)
+	if (unlikely(!n_rmap))
 		return RESUME_GUEST; /* Let the guest try again */
 	n_rmap->rmap = (n_gpa & RMAP_NESTED_GPA_MASK) |
 		(((unsigned long) gp->l1_lpid) << RMAP_NESTED_LPID_SHIFT);
-- 
2.39.2