On 29/01/2019 02:28, David Gibson wrote:

> On Sun, Jan 27, 2019 at 10:07:12AM -0800, Richard Henderson wrote:
>> On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>>>> I would expect the i < n/2 loop to be faster, because the assignments are
>>>> unconditional.  FWIW.
>>>
>>> Do you have any idea as to how much faster? Is it something that would show
>>> up as significant within the context of QEMU?
>>
>> I don't have any numbers on that, no.
>>
>>> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
>>> version much easier to read, so I would prefer to keep it if possible.
>>> What about unrolling the loop into 2 separate ones...
>>
>> I doubt that would be helpful.
>>
>> I would think that
>>
>> #define VMRG_DO(name, access, ofs)
>> ...
>>     int i, half = ARRAY_SIZE(r->access(0)) / 2;
>> ...
>>     for (i = 0; i < half; i++) {
>>         result.access(2 * i + 0) = a->access(i + ofs);
>>         result.access(2 * i + 1) = b->access(i + ofs);
>>     }
>>
>> where OFS = 0 for HI and half for LO is best.  I find it quite readable, and
>> it avoids duplicating code between LO and HI as you're currently doing.
>
> Marc, Richard, where are we at with this?
>
> Should I wait on a revised version of this patch before applying the
> series?
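For reference, here is a minimal standalone sketch of the loop Richard suggests above, written against plain byte arrays rather than QEMU's vector register type. The names merge_bytes and NELEM are hypothetical and exist only for this illustration. The sketch also assumes that the array index is the logical element number, so it ignores the host-endian handling that the real access() macros would provide.

```c
#include <stdint.h>
#include <string.h>

#define NELEM 16  /* assumption: 16 byte-sized elements per 128-bit vector */

/*
 * Hypothetical helper, not QEMU code: interleave half of the elements of
 * a and b into r, starting at element ofs in each source.
 * ofs = 0 gives the "high" merge, ofs = NELEM / 2 gives the "low" merge.
 */
static void merge_bytes(uint8_t *r, const uint8_t *a, const uint8_t *b,
                        int ofs)
{
    uint8_t result[NELEM];  /* temporary, so r may alias a or b */
    int i, half = NELEM / 2;

    for (i = 0; i < half; i++) {
        /* Both stores are unconditional; only the source offset varies. */
        result[2 * i + 0] = a[i + ofs];
        result[2 * i + 1] = b[i + ofs];
    }
    memcpy(r, result, sizeof(result));
}
```

Calling merge_bytes(r, a, b, 0) and merge_bytes(r, a, b, NELEM / 2) would then cover the HI and LO cases with a single loop body, which is the code sharing Richard is pointing at.
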
Certainly the v3 as posted is correct (I've tested this particular patch on both big and little endian machines), so I believe the only question is whether this introduces any noticeable performance penalty.

Let me try and run a few simple tests and report back.

BTW are you able to take my qemu-macppc queue posted yesterday at https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg07263.html? There's no functional change except for PPC MacOS users who explicitly enable the new QEMU EDID support on the command line.


ATB,

Mark.