On 26/08/2021 11:14, Jan Beulich wrote: > The contents of the output arrays are undefined in both cases anyway > when the operation itself gets marked as failed. There's no value in > trying to continue after a guest memory access failure. > > Signed-off-by: Jan Beulich <jbeul...@suse.com>
Not really Acked-by: Andrew Cooper <andrew.coop...@citrix.com> This is an example of a bad loop adjustment. Taking just one example to demonstrate with, > --- a/xen/common/grant_table.c > +++ b/xen/common/grant_table.c > @@ -3289,17 +3292,15 @@ gnttab_get_status_frames(XEN_GUEST_HANDL > "status frames, but has only %u\n", > d->domain_id, op.nr_frames, nr_status_frames(gt)); > op.status = GNTST_general_error; > - goto unlock; > } > > - for ( i = 0; i < op.nr_frames; i++ ) > + for ( i = 0; op.status == GNTST_okay && i < op.nr_frames; i++ ) > { > gmfn = gfn_x(gnttab_status_gfn(d, gt, i)); > if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) ) > op.status = GNTST_bad_virt_addr; > } > > - unlock: > grant_read_unlock(gt); > out2: > rcu_unlock_domain(d); > If instead, this were written for ( i = 0; i < op.nr_frames; i++ ) { gmfn = gfn_x(gnttab_status_gfn(d, gt, i)); if ( __copy_to_guest_offset(op.frame_list, i, &gmfn, 1) ) { op.status = GNTST_bad_virt_addr; goto unlock; } } then the delta vs your version is -36 bytes, and faster to run because the loop doesn't need a memory read and compare on every iteration when you can exit based purely on existing control flow. Furthermore, the version with a goto is clearer to follow, because the exit condition is much more obvious. The compat change can do the same with breaks rather than gotos, for a slightly more modest -11 saving. A form with the op.status changes adjustments *not* added to the loop condition, Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> In reference to the hypercall ABI adjustments, it occurs to me that loops like this (which we have loads of, even in hypercall hotpaths) are horrifying for performance. For HVM, we're redoing the nested pagewalk for every uint64_t element of an array. A "copy array to guest" primitive would more efficient still than a plain virt->phys translation, because we'd be able to drop the p2m walks too. Obviously, we don't want every instance like this to be doing its own manual bounce buffering, so perhaps we should invest in some buffered copy helpers as part of trying to improve hypercall performance. ~Andrew