Re: linux-next: x86-latest/powerpc-next merge conflict
* Stephen Rothwell [EMAIL PROTECTED] wrote:

> Hi all,
>
> Today's linux-next merge of the x86-latest tree got a conflict in
> include/asm-powerpc/bitops.h between commit
> cd008c0f03f3d451e5fbd108b8e74079d402be64 (generic: implement __fls on
> all 64-bit archs) from the x86-latest tree and commit
> 9f264be6101c42cb9e471c58322fb83a5cde1461 ([POWERPC] Optimize fls64() on
> 64-bit processors) from the powerpc-next tree.
>
> The fixup was not quite trivial and is worth a look to see if I got it
> right.

Paul, do you agree with those generic bitops changes?

Just in case it's not obvious from previous discussions: we'll push them
upstream via a separate pull request, not via usual x86.git changes.
They originated from x86.git but grew into a more generic improvement
for all. They sit in x86.git for tester convenience but are of course
not pure x86 changes anymore.

    Ingo
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
linux-next: x86-latest/powerpc-next merge conflict
Hi all,

Today's linux-next merge of the x86-latest tree got a conflict in
include/asm-powerpc/bitops.h between commit
cd008c0f03f3d451e5fbd108b8e74079d402be64 (generic: implement __fls on
all 64-bit archs) from the x86-latest tree and commit
9f264be6101c42cb9e471c58322fb83a5cde1461 ([POWERPC] Optimize fls64() on
64-bit processors) from the powerpc-next tree.

The fixup was not quite trivial and is worth a look to see if I got it
right.

-- 
Cheers,
Stephen Rothwell [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: linux-next: x86-latest/powerpc-next merge conflict
Alexander van Heukelum writes:

> Powerpc would pick up an optimized version via this chain: generic fls64
> -> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).

Why wouldn't powerpc continue to use the fls64 that I have in there now?

> However, the generic version of fls64 first tests the argument for zero.
> From your code I derive that the count-leading-zeroes instruction for
> argument zero is defined as cntlzl(0) == BITS_PER_LONG.

That is correct. If the argument is 0 then all of the zero bits are
leading zeroes. :)

Regards,
Paul.
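The point about cntlzl(0) == BITS_PER_LONG can be illustrated with a small sketch. This is not the kernel's or Paul's code: the sketch_* names are hypothetical, the GCC/Clang builtin __builtin_clzl() stands in for the cntlzl instruction, and a 64-bit unsigned long is assumed for the fls64 naming. It only shows why a version built directly on such an instruction needs no test for zero, while the chain through __fls does.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for cntlzl: returns the word size for 0.
 * __builtin_clzl() is undefined for 0, so 0 is handled explicitly. */
static inline int sketch_cntlzl(unsigned long x)
{
    return x ? __builtin_clzl(x) : SKETCH_BITS_PER_LONG;
}

/* __fls-style helper: 0-based index of the top set bit, x must be nonzero. */
static inline unsigned long sketch_fls_0based(unsigned long x)
{
    assert(x != 0);
    return (unsigned long)(SKETCH_BITS_PER_LONG - 1 - sketch_cntlzl(x));
}

/* Generic style: test for zero first, then go through the __fls helper. */
static inline int sketch_fls64_generic(unsigned long x)
{
    if (x == 0)
        return 0;
    return (int)sketch_fls_0based(x) + 1;
}

/* Direct style: no zero test, because cntlzl(0) == BITS_PER_LONG
 * makes the subtraction come out as 0. */
static inline int sketch_fls64_direct(unsigned long x)
{
    return SKETCH_BITS_PER_LONG - sketch_cntlzl(x);
}
```

Both variants should agree for every input; the direct one simply drops the comparison that the generic one needs to protect the __fls-style helper.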
Re: linux-next: x86-latest/powerpc-next merge conflict
Ingo Molnar writes:

> Paul, do you agree with those generic bitops changes? Just in case it's

Well, it looks OK, but I'm sure people are going to get confused with
fls vs. fls64 vs. __fls all being subtly different. I'd say it's worth
putting a little file in the Documentation directory to explain it all.

> not obvious from previous discussions: we'll push them upstream via a
> separate pull request, not via usual x86.git changes. They originated
> from x86.git but grew into a more generic improvement for all. They sit
> in x86.git for tester convenience but are of course not pure x86 changes
> anymore.

I'm not sure why the "add __fls to all 64-bit architectures" change has
to be done as a single patch rather than a patch per architecture going
through the architecture maintainers. I suppose that avoids any problem
with some maintainers not sending it upstream quickly.

I would expect that if it is a single cross-architecture patch that it
would go through Andrew Morton, though. But if Andrew wants you to
handle it then I'm happy to give you an Acked-by for it.

Regards,
Paul.
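For readers following along, the subtle differences between the three helpers can be summarised with loop-based reference versions. This is a sketch for illustration only: the sketch_* names are hypothetical and the loops are not how the kernel implements them. The conventions shown are the usual kernel ones: fls() and fls64() are 1-based and return 0 for a zero argument, while __fls() is 0-based, takes an unsigned long, and is undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* fls(): 32-bit argument, 1-based result, fls(0) == 0. */
static inline int sketch_fls(uint32_t x)
{
    int r = 0;
    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* fls64(): same convention as fls(), but for a 64-bit argument. */
static inline int sketch_fls64(uint64_t x)
{
    int r = 0;
    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* __fls(): unsigned long argument, 0-based result, undefined for 0.
 * The assert only documents the precondition in this sketch. */
static inline unsigned long sketch_fls_0based(unsigned long word)
{
    unsigned long r = 0;
    assert(word != 0);
    while (word >>= 1)
        r++;
    return r;
}
```

So for a nonzero word, the 1-based and 0-based results differ by exactly one, and only the 1-based helpers have a defined answer for zero.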
Re: linux-next: x86-latest/powerpc-next merge conflict
On Mon, 21 Apr 2008 15:36:06 +0200, Gabriel Paubert [EMAIL PROTECTED] said:
> On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> > On Mon, 21 Apr 2008 22:13:06 +1000, Paul Mackerras [EMAIL PROTECTED] said:
> > > Alexander van Heukelum writes:
> > > > Powerpc would pick up an optimized version via this chain: generic fls64
> > > > -> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
> > >
> > > Why wouldn't powerpc continue to use the fls64 that I have in there now?
> >
> > In Linus' tree that would be the generic one that uses (the 32-bit) fls():
> >
> > static inline int fls64(__u64 x)
> > {
> >         __u32 h = x >> 32;
> >         if (h)
> >                 return fls(h) + 32;
> >         return fls(x);
> > }
> >
> > > > However, the generic version of fls64 first tests the argument for zero.
> > > > From your code I derive that the count-leading-zeroes instruction for
> > > > argument zero is defined as cntlzl(0) == BITS_PER_LONG.
> > >
> > > That is correct. If the argument is 0 then all of the zero bits are
> > > leading zeroes. :)
> >
> > So... for 64-bit powerpc it makes sense to have its own implementation
> > and ignore the (improved) generic one and for 32-bit powerpc the
> > generic implementation of fls64 is fine. The current situation in
> > linux-next seems optimal to me.
>
> Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:
>
>         cntlzw ch,h        ; ch = fls32(h) where h = x>>32
>         cntlzw cl,l        ; cl = fls32(l) where l = (__u32)x
>         srwi t1,ch,5
>         neg t1,t1          ; t1 = (h==0) ? -1 : 0
>         and cl,t1,cl       ; cl = (h==0) ? cl : 0
>         add result,ch,cl
>
> That's only 6 instructions without any branch, although the dependency
> chain is 5 instructions long. Good luck getting the compiler to generate
> something as compact as this.

I should not have said the magic word optimal, I guess ;). The code you
show would fit nicely as an arch-specific optimized version of fls64 for
32-bit powerpc in include/asm-powerpc/bitops.h.

Greetings,
    Alexander
(who is not going to write and test a patch with powerpc inline
assembly soon. srwi?)

> Don't worry about the number of cntlzw, it's one clock on all 32 bit
> PPC processors I know, some may even be able to perform 2 or 3 cntlzw
> per clock.
>
>         Regards,
>         Gabriel
-- 
  Alexander van Heukelum
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Same, same, but different...
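A rough C rendering of the six-instruction sequence quoted above may help when checking it. This is a sketch under assumptions, not a proposed patch: the sketch_* names are hypothetical, the GCC/Clang builtin __builtin_clz() (with a 32-bit unsigned int) stands in for cntlzw, and the stand-in uses a conditional that the real instruction would not need. Read literally, the sequence sums leading-zero counts, so this sketch treats its result as the 64-bit leading-zero count and derives fls64 as 64 minus that value; that final subtraction is an interpretation, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cntlzw: leading zeros of a 32-bit word,
 * returning 32 for 0 (__builtin_clz() itself is undefined for 0). */
static inline uint32_t sketch_cntlzw(uint32_t x)
{
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}

/* Mirrors the quoted sequence one instruction per statement. */
static inline uint32_t sketch_clz64_branchless(uint64_t x)
{
    uint32_t h = (uint32_t)(x >> 32);
    uint32_t l = (uint32_t)x;
    uint32_t ch = sketch_cntlzw(h);   /* cntlzw ch,h */
    uint32_t cl = sketch_cntlzw(l);   /* cntlzw cl,l */
    uint32_t t1 = ch >> 5;            /* srwi t1,ch,5: 1 only when h == 0 */
    t1 = 0u - t1;                     /* neg t1,t1: all ones only when h == 0 */
    cl &= t1;                         /* and cl,t1,cl: keep cl only when h == 0 */
    return ch + cl;                   /* add result,ch,cl */
}

/* Assumption of this sketch: fls64 is 64 minus the leading-zero count. */
static inline int sketch_fls64_from_clz(uint64_t x)
{
    return 64 - (int)sketch_clz64_branchless(x);
}
```

The shift by 5 works because ch can only reach 32 when the high word is zero, so bit 5 of ch doubles as the "h == 0" flag without a comparison.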