On 02/05/2025 9:04 am, Jan Beulich wrote: > On 02.05.2025 01:30, Andrew Cooper wrote: >> * For vpic_get_priority(), introduce a common ror8() helper in plain C. One >> thing that I can't persuade the compiler to realise is that a non-zero >> value rotated is still non-zero, so use __builtin_clz() to help the >> optimiser out. >> >> * vpic_ioport_write() can be simplified to just for_each_set_bit(), which >> avoids spilling pending to the stack each loop iteration. Changing >> pending >> from unsigned int to uint8_t isn't even strictly necessary given the >> underlying types of vpic->isr and vpic->irr, but done so clarity. >> >> No functional change. >> >> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
Thanks. > >> One thing I can't seem to avoid is GCC zero extending the result of ror8() >> before passing it to BSF/TZCNT. Then again, that's very specific to uint8_t, >> and x86's preserving behaviour on byte writes. > It can't know that the upper bits are "don't care" here, related to the aspect > above (it not inferring non-zero of the result of the rotate when the input is > non-zero). Hmm, I suppose so. > And I expect it would be too much of a special case to warrant > getting all of this accounted for, just to remove the zero-ext. > > For this specific case we might be able to cheat, but I'm unsure that's worth > it either (and I also haven't tried whether it actually has the intended > effect): > > val8 = ror8(mask, vpic->priority_add); > asm ( "" : "=r" (val32) : "0" (val8) ); > return __builtin_ctz(val32); It's one zero-extend. I was only curious because I was looking in detail at the code generation, but it's not something I'm losing sleep over. ~Andrew