[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-19 Thread Kazu Hirata via Phabricator via cfe-commits
kazu added a comment.

In D141798#4065025 , @RKSimon wrote:

> @kazu Thanks for dealing with this!
>
> I'd like to build on this and create llvm variants of the C++20  
> countl_zero/countr_zero/countl_one/countr_one template functions similar to 
> what I did for popcount in D132407  (and 
> have MathExtras.h reuse it) - I just wanted to ensure you hadn't already 
> started anything similar?

I just uploaded:

https://reviews.llvm.org/D142078


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-19 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added a comment.

@kazu Thanks for dealing with this!

I'd like to build on this and create llvm variants of the C++20  
countl_zero/countr_zero/countl_one/countr_one template functions similar to 
what I did for popcount in D132407  (and have 
MathExtras.h reuse it) - I just wanted to ensure you hadn't already started 
anything similar?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Kazu Hirata via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
kazu marked 2 inline comments as done.
Closed by commit rG83d56fb17a4d: Drop the ZeroBehavior parameter from 
countLeadingZeros and the like (NFC) (authored by kazu).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

Files:
  llvm/include/llvm/Support/MathExtras.h
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
  llvm/lib/Transforms/IPO/LowerTypeTests.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  mlir/lib/Bytecode/Reader/BytecodeReader.cpp

Index: mlir/lib/Bytecode/Reader/BytecodeReader.cpp
===
--- mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -281,8 +281,7 @@
 // here because we only care about the first byte, and so that be actually
 // get ctz intrinsic calls when possible (the `uint8_t` overload uses a loop
 // implementation).
-uint32_t numBytes =
-llvm::countTrailingZeros(result, llvm::ZB_Undefined);
+uint32_t numBytes = llvm::countTrailingZeros(result);
 assert(numBytes > 0 && numBytes <= 7 &&
"unexpected number of trailing zeros in varint encoding");
 
Index: llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
===
--- llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -259,8 +259,7 @@
 if (I < B.size())
   BitsUsed |= B[I];
   if (BitsUsed != 0xff)
-return (MinByte + I) * 8 +
-   countTrailingZeros(uint8_t(~BitsUsed), ZB_Undefined);
+return (MinByte + I) * 8 + countTrailingZeros(uint8_t(~BitsUsed));
 }
   } else {
 // Find a free (Size/8) byte region in each member of Used.
Index: llvm/lib/Transforms/IPO/LowerTypeTests.cpp
===
--- llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -172,7 +172,7 @@
 
   BSI.AlignLog2 = 0;
   if (Mask != 0)
-BSI.AlignLog2 = countTrailingZeros(Mask, ZB_Undefined);
+BSI.AlignLog2 = countTrailingZeros(Mask);
 
   // Build the compressed bitset while normalizing the offsets against the
   // computed alignment.
Index: llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
===
--- llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
+++ llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
@@ -180,7 +180,7 @@
   // or ADDIW. If there are trailing zeros, try generating a sign extended
   // constant with no trailing zeros and use a final SLLI to restore them.
   if ((Val & 0xfff) != 0 && (Val & 1) == 0 && Res.size() >= 2) {
-unsigned TrailingZeros = countTrailingZeros((uint64_t)Val, ZB_Undefined);
+unsigned TrailingZeros = countTrailingZeros((uint64_t)Val);
 int64_t ShiftedVal = Val >> TrailingZeros;
 // If we can use C.LI+C.SLLI instead of LUI+ADDI(W) prefer that since
 // its more compressible. But only if LUI+ADDI(W) isn't fusable.
@@ -202,7 +202,7 @@
   if (Val > 0 && Res.size() > 2) {
 assert(ActiveFeatures[RISCV::Feature64Bit] &&
"Expected RV32 to only need 2 instructions");
-unsigned LeadingZeros = countLeadingZeros((uint64_t)Val, ZB_Undefined);
+unsigned LeadingZeros = countLeadingZeros((uint64_t)Val);
 uint64_t ShiftedVal = (uint64_t)Val << LeadingZeros;
 // Fill in the bits that will be shifted out with 1s. An example where this
 // helps is trailing one masks with 32 or more ones. This will generate
Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2451,8 +2451,7 @@
   unsigned PsInputBits = Info->getPSInputAddr() & Info->getPSInputEnable();
   if ((PsInputBits & 0x7F) == 0 ||
   ((PsInputBits & 0xF) == 0 && (PsInputBits >> 11 & 1)))
-Info->markPSInputEnabled(
-countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined));
+Info->markPSInputEnabled(countTrailingZeros(Info->getPSInputAddr()));
 }
   } else if (IsKernel) {
 assert(Info->hasWorkGroupIDX() && Info->hasWorkItemIDX());
Index: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
===
--- llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -701,8 +701,7 @@
   if ((PsInputBits & 0x7F) == 0 ||
   ((PsInputBits & 0xF) == 0 &&
(PsInputBits >> 11 & 1)))
-Info->markPSInputEnabled(
-  countTrailingZeros(In

[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Kazu Hirata via Phabricator via cfe-commits
kazu marked an inline comment as done.
kazu added inline comments.



Comment at: llvm/include/llvm/Support/MathExtras.h:225
 ///
 /// \param ZB the behavior on an input of 0. Only ZB_Max and ZB_Undefined are
 ///   valid arguments.

jrtc27 wrote:
> A bunch of functions still have this documentation, but these are now the 
> only possible values, so this seems redundant
I've adjusted the redundant comments on two functions - `findFirstSet` and 
`findLastSet`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Kazu Hirata via Phabricator via cfe-commits
kazu updated this revision to Diff 490363.
kazu added a comment.

I've adjusted comments for findFirstSet and findLastSet to reflect the
fact that ZeroBehavior now takes only two values -- ZB_Undefined and
ZB_Max.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

Files:
  llvm/include/llvm/Support/MathExtras.h
  llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
  llvm/lib/Transforms/IPO/LowerTypeTests.cpp
  llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
  mlir/lib/Bytecode/Reader/BytecodeReader.cpp

Index: mlir/lib/Bytecode/Reader/BytecodeReader.cpp
===
--- mlir/lib/Bytecode/Reader/BytecodeReader.cpp
+++ mlir/lib/Bytecode/Reader/BytecodeReader.cpp
@@ -281,8 +281,7 @@
 // here because we only care about the first byte, and so that be actually
 // get ctz intrinsic calls when possible (the `uint8_t` overload uses a loop
 // implementation).
-uint32_t numBytes =
-llvm::countTrailingZeros(result, llvm::ZB_Undefined);
+uint32_t numBytes = llvm::countTrailingZeros(result);
 assert(numBytes > 0 && numBytes <= 7 &&
"unexpected number of trailing zeros in varint encoding");
 
Index: llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
===
--- llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -259,8 +259,7 @@
 if (I < B.size())
   BitsUsed |= B[I];
   if (BitsUsed != 0xff)
-return (MinByte + I) * 8 +
-   countTrailingZeros(uint8_t(~BitsUsed), ZB_Undefined);
+return (MinByte + I) * 8 + countTrailingZeros(uint8_t(~BitsUsed));
 }
   } else {
 // Find a free (Size/8) byte region in each member of Used.
Index: llvm/lib/Transforms/IPO/LowerTypeTests.cpp
===
--- llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -172,7 +172,7 @@
 
   BSI.AlignLog2 = 0;
   if (Mask != 0)
-BSI.AlignLog2 = countTrailingZeros(Mask, ZB_Undefined);
+BSI.AlignLog2 = countTrailingZeros(Mask);
 
   // Build the compressed bitset while normalizing the offsets against the
   // computed alignment.
Index: llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
===
--- llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
+++ llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
@@ -180,7 +180,7 @@
   // or ADDIW. If there are trailing zeros, try generating a sign extended
   // constant with no trailing zeros and use a final SLLI to restore them.
   if ((Val & 0xfff) != 0 && (Val & 1) == 0 && Res.size() >= 2) {
-unsigned TrailingZeros = countTrailingZeros((uint64_t)Val, ZB_Undefined);
+unsigned TrailingZeros = countTrailingZeros((uint64_t)Val);
 int64_t ShiftedVal = Val >> TrailingZeros;
 // If we can use C.LI+C.SLLI instead of LUI+ADDI(W) prefer that since
 // its more compressible. But only if LUI+ADDI(W) isn't fusable.
@@ -202,7 +202,7 @@
   if (Val > 0 && Res.size() > 2) {
 assert(ActiveFeatures[RISCV::Feature64Bit] &&
"Expected RV32 to only need 2 instructions");
-unsigned LeadingZeros = countLeadingZeros((uint64_t)Val, ZB_Undefined);
+unsigned LeadingZeros = countLeadingZeros((uint64_t)Val);
 uint64_t ShiftedVal = (uint64_t)Val << LeadingZeros;
 // Fill in the bits that will be shifted out with 1s. An example where this
 // helps is trailing one masks with 32 or more ones. This will generate
Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2451,8 +2451,7 @@
   unsigned PsInputBits = Info->getPSInputAddr() & Info->getPSInputEnable();
   if ((PsInputBits & 0x7F) == 0 ||
   ((PsInputBits & 0xF) == 0 && (PsInputBits >> 11 & 1)))
-Info->markPSInputEnabled(
-countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined));
+Info->markPSInputEnabled(countTrailingZeros(Info->getPSInputAddr()));
 }
   } else if (IsKernel) {
 assert(Info->hasWorkGroupIDX() && Info->hasWorkItemIDX());
Index: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
===
--- llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -701,8 +701,7 @@
   if ((PsInputBits & 0x7F) == 0 ||
   ((PsInputBits & 0xF) == 0 &&
(PsInputBits >> 11 & 1)))
-Info->markPSInputEnabled(
-  countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined));
+Info->markPSInputEnabled(countTrail

[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D141798#4064142 , @MaskRay wrote:

> `ZB_Max` is the strange mode that should be dropped, perhaps also 
> `ZB_Undefined`.
>
> In D141798#4064114 , @arsenm wrote:
>
>>> If you care about compilation speed, you should build LLVM with an 
>>> appropriate -march= to take advantage of lzcnt and tzcnt.
>>
>> I think this is bad reasoning, nobody really uses -march
>
> I agree. The reason should be clarified that the lzcnt performance here 
> really doesn't matter.
> Note: 
> https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter
>  lzcnt/tzcnt have false dependencies on older (pre-Skylake) Intel processors. 
> But this doesn't really matter for LLVM, at least the minor issue does not 
> justify keeping the weird mode `ZB_Undefined`.

I'm not sure the false dependency issue is relevant here. Without -march, 
compilers will emit BSR/BSF and possibly a cmov to handle the zero behavior. 
Both BSR/BSF always have a false dependency because the behavior for 0 input is 
to keep the old value of the output register. Knowing we can use lzcnt/tzcnt is 
always better than BSR/BSF.

If we know zero doesn't matter, without march, gcc and recent clang will always 
emit BSF using the encoding for TZCNT. On CPUs without TZCNT this encoding is 
treated as BSF. This allows us to use TZCNT at runtime if the CPU is modern 
even without -march being passed. We can't do the same for BSF/LZCNT because 
their outputs are inverted from each other.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

`ZB_Max` is the strange mode that should be dropped, perhaps also 
`ZB_Undefined`.

In D141798#4064114 , @arsenm wrote:

>> If you care about compilation speed, you should build LLVM with an 
>> appropriate -march= to take advantage of lzcnt and tzcnt.
>
> I think this is bad reasoning, nobody really uses -march

I agree. The reason should be clarified that the lzcnt performance here really 
doesn't matter.
Note: 
https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter
 lzcnt/tzcnt have false dependencies on older (pre-Skylake) Intel processors. 
But this doesn't really matter for LLVM, at least the minor issue does not 
justify keeping the weird mode `ZB_Undefined`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Jessica Clarke via Phabricator via cfe-commits
jrtc27 added inline comments.



Comment at: llvm/include/llvm/Support/MathExtras.h:225
 ///
 /// \param ZB the behavior on an input of 0. Only ZB_Max and ZB_Undefined are
 ///   valid arguments.

A bunch of functions still have this documentation, but these are now the only 
possible values, so this seems redundant


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment.

> If you care about compilation speed, you should build LLVM

with an appropriate -march= to take advantage of lzcnt and tzcnt.

I think this is bad reasoning, nobody really uses -march


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Sergei Barannikov via Phabricator via cfe-commits
barannikov88 accepted this revision.
barannikov88 added a comment.
This revision is now accepted and ready to land.

LGTM




Comment at: llvm/include/llvm/Support/MathExtras.h:212
 /// Only unsigned integral types are allowed.
-///
-/// \param ZB the behavior on an input of 0. Only ZB_Max and ZB_Undefined are
-///   valid arguments.
-template  T findFirstSet(T Val, ZeroBehavior ZB = ZB_Max) {
-  if (ZB == ZB_Max && Val == 0)
+template  T findFirstSet(T Val) {
+  if (Val == 0)

kazu wrote:
> craig.topper wrote:
> > Note, x86 does not have an efficient instruction for find first set with 
> > zero returning -1. It will require a cmov to handle zero.
> The new iteration of the patch leaves findFirstSet and findLastSet untouched.
(optional) ZB could be made a template argument and constexpr-ifed.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141798: Drop the ZeroBehavior parameter from countLeadingZeros and the like (NFC)

2023-01-18 Thread Kazu Hirata via Phabricator via cfe-commits
kazu marked an inline comment as done.
kazu added a comment.

Please take a look.  Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141798/new/

https://reviews.llvm.org/D141798

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits