On 29/01/2019 02:28, David Gibson wrote:

> On Sun, Jan 27, 2019 at 10:07:12AM -0800, Richard Henderson wrote:
>> On 1/27/19 9:45 AM, Mark Cave-Ayland wrote:
>>>> I would expect the i < n/2 loop to be faster, because the assignments are
>>>> unconditional.  FWIW.
>>>
>>> Do you have any idea as to how much faster? Is it something that would show
>>> up as significant within the context of QEMU?
>>
>> I don't have any numbers on that, no.
>>
>>> As well as eliminating the HI_IDX/LO_IDX constants I do find the updated
>>> version much easier to read, so I would prefer to keep it if possible.
>>> What about unrolling the loop into 2 separate ones...
>>
>> I doubt that would be helpful.
>>
>> I would think that
>>
>> #define VMRG_DO(name, access, ofs)
>> ...
>>     int i, half = ARRAY_SIZE(r->access(0)) / 2;
>> ...
>>     for (i = 0; i < half; i++) {
>>         result.access(2 * i + 0) = a->access(i + ofs);
>>         result.access(2 * i + 1) = b->access(i + ofs);
>>     }
>>
>> where OFS = 0 for HI and half for LO is best.  I find it quite readable, and it
>> avoids duplicating code between LO and HI as you're currently doing.
> 
> Mark, Richard, where are we at with this?
> 
> Should I wait on a revised version of this patch before applying the
> series?

Certainly the v3 as posted is correct (I've tested this particular patch on both
big and little endian machines), so I believe the only question is whether this
introduces any noticeable performance penalty.

Let me try and run a few simple tests and report back.
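
For reference, Richard's i < n/2 loop can be sketched as a standalone function.
This is only an illustration under stated assumptions: vec_t, NELEM and
vmrg_sketch are hypothetical names, a plain 16-byte array stands in for the
vector register, and plain array indexing stands in for the element accessor
macros, so host endianness handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELEM 16  /* assumption: 16 byte-sized elements in one vector */

/* Hypothetical stand-in for a vector register; not QEMU's own type. */
typedef struct {
    uint8_t e[NELEM];
} vec_t;

/*
 * Interleave one half of a with the same half of b, using the
 * unconditional i < n/2 loop.  ofs selects the half that is read:
 * 0 for the first half, NELEM / 2 for the second.
 */
static void vmrg_sketch(vec_t *r, const vec_t *a, const vec_t *b, size_t ofs)
{
    vec_t result;
    size_t i, half = NELEM / 2;

    assert(ofs == 0 || ofs == half);
    for (i = 0; i < half; i++) {
        result.e[2 * i + 0] = a->e[i + ofs];
        result.e[2 * i + 1] = b->e[i + ofs];
    }
    *r = result;  /* copy out last, so r may alias a or b */
}
```

Calling vmrg_sketch(&r, &a, &b, 0) interleaves the first halves of a and b;
passing NELEM / 2 interleaves the second halves. Building the result in a local
and copying it out at the end keeps the function safe when r aliases an input.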

BTW are you able to take my qemu-macppc queue posted yesterday at
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg07263.html? There's no
functional change except for PPC MacOS users who explicitly enable the new QEMU
EDID support on the command line.


ATB,

Mark.
