On 02.05.2025 01:30, Andrew Cooper wrote: > * For vpic_get_priority(), introduce a common ror8() helper in plain C. One > thing that I can't persuade the compiler to realise is that a non-zero > value rotated is still non-zero, so use __builtin_clz() to help the > optimiser out. > > * vpic_ioport_write() can be simplified to just for_each_set_bit(), which > avoids spilling pending to the stack each loop iteration. Changing pending > from unsigned int to uint8_t isn't even strictly necessary given the > underlying types of vpic->isr and vpic->irr, but done so clarity. > > No functional change. > > Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com> > One thing I can't seem to avoid is GCC zero extending the result of ror8() > before passing it to BSF/TZCNT. Then again, that's very specific to uint8_t, > and x86's preserving behaviour on byte writes. It can't know that the upper bits are "don't care" here, related to the aspect above (it not inferring non-zero of the result of the rotate when the input is non-zero). And I expect it would be too much of a special case to warrant getting all of this accounted for, just to remove the zero-ext. For this specific case we might be able to cheat, but I'm unsure that's worth it either (and I also haven't tried whether it actually has the intended effect): val8 = ror8(mask, vpic->priority_add); asm ( "" : "=r" (val32) : "0" (val8) ); return __builtin_ctz(val32); Jan