On Sun, Jan 27, 2019 at 10:07:12AM -0800, Richard Henderson wrote:
> On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
> >> I would expect the i < n/2 loop to be faster, because the assignments are
> >> unconditional. FWIW.
> >
> > Do you have any idea as to how much faster? Is it something that would show
> > up as significant within the context of QEMU?
>
> I don't have any numbers on that, no.
>
> > As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
> > version much easier to read, so I would prefer to keep it if possible.
> > What about unrolling the loop into 2 separate ones...
>
> I doubt that would be helpful.
>
> I would think that
>
> #define VMRG_DO(name, access, ofs)
> ...
>     int i, half = ARRAY_SIZE(r->access(0)) / 2;
> ...
>     for (i = 0; i < half; i++) {
>         result.access(2 * i + 0) = a->access(i + ofs);
>         result.access(2 * i + 1) = b->access(i + ofs);
>     }
>
> where OFS = 0 for HI and half for LO is best. I find it quite readable, and it
> avoids duplicating code between LO and HI as you're currently doing.
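
For reference, the loop sketched above can be tried outside QEMU with something like the following. This is only a minimal standalone sketch, not the code from the patch: the `vec128` type, the `merge_u8` function and its plain `u8[]` indexing are hypothetical stand-ins for the real register type and the `access` macros, and it ignores the host-endian element ordering those macros handle.

```c
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for a 128-bit vector register viewed as 16 bytes. */
typedef struct {
    uint8_t u8[16];
} vec128;

/*
 * Interleave one half of a and b into r.
 * ofs = 0 takes elements 0..half-1 (the "HI" case in the thread),
 * ofs = half takes elements half..n-1 (the "LO" case).
 */
static void merge_u8(vec128 *r, const vec128 *a, const vec128 *b, int ofs)
{
    vec128 result;
    int i, half = ARRAY_SIZE(r->u8) / 2;

    /* Both stores are unconditional, so the loop body has no branches. */
    for (i = 0; i < half; i++) {
        result.u8[2 * i + 0] = a->u8[i + ofs];
        result.u8[2 * i + 1] = b->u8[i + ofs];
    }

    /* Write back through a temporary so r may alias a or b. */
    *r = result;
}
```

Calling `merge_u8(&r, &a, &b, 0)` interleaves the first eight bytes of each input, and `merge_u8(&r, &a, &b, 8)` the last eight, so a single loop body serves both variants.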

Mark, Richard, where are we at with this?
Should I wait on a revised version of this patch before applying the
series?

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
