On 29/01/2019 02:28, David Gibson wrote:
> On Sun, Jan 27, 2019 at 10:07:12AM -0800, Richard Henderson wrote:
>> On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>>>> I would expect the i < n/2 loop to be faster, because the assignments are
>>>> unconditional. FWIW.
>>>
>>> Do you have any idea as to how much faster? Is it something that would show
>>> up as significant within the context of QEMU?
>>
>> I don't have any numbers on that, no.
>>
>>> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
>>> version much easier to read, so I would prefer to keep it if possible.
>>> What about unrolling the loop into 2 separate ones...
>>
>> I doubt that would be helpful.
>>
>> I would think that
>>
>> #define VMRG_DO(name, access, ofs)
>> ...
>> int i, half = ARRAY_SIZE(r->access(0)) / 2;
>> ...
>> for (i = 0; i < half; i++) {
>> result.access(2 * i + 0) = a->access(i + ofs);
>> result.access(2 * i + 1) = b->access(i + ofs);
>> }
>>
>> where OFS = 0 for HI and half for LO is best. I find it quite readable, and it
>> avoids duplicating code between LO and HI as you're currently doing.
>
> Marc, Richard, where are we at with this?
>
> Should I wait on a revised version of this patch before applying the
> series?

Certainly the v3 as posted is correct (I've tested this particular patch on both
big- and little-endian machines), so I believe the only question is whether this
introduces any noticeable performance penalty.

Let me try and run a few simple tests and report back.
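
For reference, here is a minimal standalone sketch of the offset-based loop
suggested above. It assumes plain uint8_t arrays rather than the real vector
register accessors, and merge_bytes() and N are hypothetical names used only
for illustration, not anything from the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define N 16 /* hypothetical: number of byte elements per vector */

/*
 * Interleave one half of a and b into result, starting at element ofs.
 * ofs = 0 selects the first half of each source, ofs = N / 2 the second.
 * result must not alias a or b in this sketch.
 */
static void merge_bytes(uint8_t *result, const uint8_t *a,
                        const uint8_t *b, size_t ofs)
{
    size_t half = N / 2;

    for (size_t i = 0; i < half; i++) {
        result[2 * i + 0] = a[i + ofs]; /* unconditional store from a */
        result[2 * i + 1] = b[i + ofs]; /* unconditional store from b */
    }
}
```

Both stores in the loop body are unconditional, which is the property expected
to make the i < n/2 form faster.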

BTW are you able to take my qemu-macppc queue posted yesterday at
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg07263.html? There's no
functional change except for PPC MacOS users who explicitly enable the new QEMU
EDID support on the command line.

ATB,
Mark.