On 02/05/2025 9:04 am, Jan Beulich wrote:
> On 02.05.2025 01:30, Andrew Cooper wrote:
>>  * For vpic_get_priority(), introduce a common ror8() helper in plain C.  One
>>    thing that I can't persuade the compiler to realise is that a non-zero
>>    value rotated is still non-zero, so use __builtin_clz() to help the
>>    optimiser out.
>>
>>  * vpic_ioport_write() can be simplified to just for_each_set_bit(), which
>>    avoids spilling pending to the stack each loop iteration.  Changing 
>> pending
>>    from unsigned int to uint8_t isn't even strictly necessary given the
>>    underlying types of vpic->isr and vpic->irr, but done so clarity.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> Reviewed-by: Jan Beulich <jbeul...@suse.com>

Thanks.

>
>> One thing I can't seem to avoid is GCC zero extending the result of ror8()
>> before passing it to BSF/TZCNT.  Then again, that's very specific to uint8_t,
>> and x86's preserving behaviour on byte writes.
> It can't know that the upper bits are "don't care" here, related to the aspect
> above (it not inferring non-zero of the result of the rotate when the input is
> non-zero).

Hmm, I suppose so.

> And I expect it would be too much of a special case to warrant
> getting all of this accounted for, just to remove the zero-ext.
>
> For this specific case we might be able to cheat, but I'm unsure that's worth
> it either (and I also haven't tried whether it actually has the intended
> effect):
>
>     val8 = ror8(mask, vpic->priority_add);
>     asm ( "" : "=r" (val32) : "0" (val8) );
>     return __builtin_ctz(val32);

It's one zero-extend.  I was only curious because I was looking in
detail at the code generation, but it's not something I'm losing sleep over.

~Andrew

Reply via email to