On 02.05.2025 01:30, Andrew Cooper wrote:
>  * For vpic_get_priority(), introduce a common ror8() helper in plain C.  One
>    thing that I can't persuade the compiler to realise is that a non-zero
>    value rotated is still non-zero, so use __builtin_clz() to help the
>    optimiser out.
> 
>  * vpic_ioport_write() can be simplified to just for_each_set_bit(), which
>    avoids spilling pending to the stack each loop iteration.  Changing pending
>    from unsigned int to uint8_t isn't even strictly necessary given the
>    underlying types of vpic->isr and vpic->irr, but done so clarity.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Jan Beulich <jbeul...@suse.com>

> One thing I can't seem to avoid is GCC zero extending the result of ror8()
> before passing it to BSF/TZCNT.  Then again, that's very specific to uint8_t,
> and x86's preserving behaviour on byte writes.

It can't know that the upper bits are "don't care" here, related to the aspect
above (it not inferring non-zero of the result of the rotate when the input is
non-zero). And I expect it would be too much of a special case to warrant
getting all of this accounted for, just to remove the zero-ext.

For this specific case we might be able to cheat, but I'm unsure that's worth
it either (and I also haven't tried whether it actually has the intended
effect):

    val8 = ror8(mask, vpic->priority_add);
    asm ( "" : "=r" (val32) : "0" (val8) );
    return __builtin_ctz(val32);

Jan

Reply via email to