[PATCH] D138702: support for HIP non hostcall printf

2023-02-10 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH abandoned this revision.
vikramRH added a comment.

A new implentation is under discussion


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138702/new/

https://reviews.llvm.org/D138702

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138702: support for HIP non hostcall printf

2022-12-01 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added a comment.

There were some issues reported where it was found that pcie atomics are not 
available in some environments, as a result hostcall and HIP-Printf were also 
not available. This Patch is intended to provide an alternative in such 
scenarios while hostcall based implementation remains the primary option.

After discussions with Sameer, I am convinced that it is better to do this 
transformation at clang CodeGen phase rather than a opt pass (similar to 
current hostcall based implementation),
Hence this patch needs rework and I shall include the review comments in the 
updated patch. (I agree on the option too, I shall think of a option name with 
a named argument)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138702/new/

https://reviews.llvm.org/D138702

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D138702: support for HIP non hostcall printf

2022-11-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision.
arsenm added a comment.
Herald added a subscriber: wdng.

I have a few questions. First, why surface this to users? If we really need to, 
I don't think this is the right flag name/design. A named argument to some kind 
of printf lowering flag would be better. Second, I thought we were moving 
towards hostcall printf, not away from it.




Comment at: clang/include/clang/Basic/LangOptions.def:275
 LANGOPT(OffloadingNewDriver, 1, 0, "use the new driver for generating 
offloading code.")
+LANGOPT(DelayedPrintf, 1, 0, "version onf printf function to be used, hostcall 
or buffer based")
 

Typo 'onf'



Comment at: clang/include/clang/Driver/Options.td:985
+def fdelayed_printf : Flag<["-"], "fdelayed-printf">,
+  HelpText<"Specifies which version of printf is to be used while CodeGen">,
+  Flags<[CC1Option]>,

Help text reads weird



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:4655-4667
+
+if (JA.isDeviceOffloading(Action::OFK_HIP)) {
+  // Device side compilation printf
+  if (Args.getLastArg(options::OPT_fdelayed_printf))
+CmdArgs.push_back("-fdelayed-printf");
+}
   }

Missing clang side tests



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:135-136
 
-bool AMDGPUPrintfRuntimeBindingImpl::shouldPrintAsStr(char Specifier,
-  Type *OpType) const {
-  if (Specifier != 's')
-return false;
-  const PointerType *PT = dyn_cast(OpType);
-  if (!PT || PT->getAddressSpace() != AMDGPUAS::CONSTANT_ADDRESS)
-return false;
-  Type *ElemType = PT->getContainedType(0);
-  if (ElemType->getTypeID() != Type::IntegerTyID)
-return false;
-  IntegerType *ElemIType = cast(ElemType);
-  return ElemIType->getBitWidth() == 8;
+// This function is essentially a copy from the file
+// Transforms/Utils/AMDGPUEmitPrintf.cpp
+static Value *getStrlenWithNull(IRBuilder<> , Value *Str) {

Why not share these? They should be in the same place. This should also be a 
separate change from any flag changes



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:141
+
+  auto CharZero = Builder.getInt8(0);
+  auto One = Builder.getInt64(1);

auto *



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:150
+  BasicBlock *Join = nullptr;
+  if (Prev->getTerminator()) {
+Join = Prev->splitBasicBlock(Builder.GetInsertPoint(), "strlen.join");

Why would the block be missing a terminator here?



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:183-185
+  auto Begin = Builder.CreatePtrToInt(Str, Int64Ty);
+  auto End = Builder.CreatePtrToInt(PtrPhi, Int64Ty);
+  auto Len = Builder.CreateSub(End, Begin);

Really should try hard to avoid introducing ptrtoint. Do you really need to do 
a pointer subtract?



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:215
+  StringRef Str;
+  llvm::Value *RealSize;
+  llvm::Value *AlignedSize;

Don't need llvm::



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:439
 
-  Type *idPointer = PointerType::get(I32Ty, AMDGPUAS::GLOBAL_ADDRESS);
+  Type *idPointer = PointerType::get(Int32Ty, AMDGPUAS::GLOBAL_ADDRESS);
   Value *id_gep_cast =

Broken for opaque pointers?



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:443
 
-  new StoreInst(ConstantInt::get(I32Ty, UniqID), id_gep_cast, Brnch);
+  new StoreInst(ConstantInt::get(Int32Ty, UniqID), id_gep_cast, Brnch);
 

Why isn't this using the IRBuilder?



Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:491-492
+  if (S[0]) {
+char *MyNewStr = new char[NSizeStr]();
+strcpy(MyNewStr, S.str().c_str());
+int NumInts = NSizeStr / 4;

Why do we have raw new and strcpy here? We have better string classes



Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:1
+; RUN: opt -mtriple=amdgcn--amdhsa -passes=amdgpu-printf-runtime-binding -S < 
%s | FileCheck --check-prefix=FUNC --check-prefix=GCN --check-prefix=METADATA %s
+

Don't need these separate prefixes



Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:3
+
+; FUNC-LABEL: @test_kernel(
+; GCN-LABEL: entry

Should use generated checks



Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:39
+  store i32 25, ptr %p.ascast, align 4
+  %0 = load i32, ptr %p.ascast, align 4
+  %cmp = icmp sgt i32 %0, 30

Use named values in tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  

[PATCH] D138702: support for HIP non hostcall printf

2022-11-25 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH created this revision.
vikramRH added reviewers: sameerds, b-sumner, yaxunl.
vikramRH added a project: LLVM.
Herald added subscribers: kosarev, foad, kerbowa, hiraditya, Anastasia, 
jvesely, arsenm.
Herald added a project: All.
vikramRH requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, MaskRay.
Herald added a project: clang.

The patch essentially ports the existing non-hostcall printf support in OpenCL 
to HIP (to be controlled via new option "-fdelayed-printf"), with following 
changes

1. Code refactoring -> we now use API's "getConstantStringInfo()" to extract 
constant string contents at compile time.
2. Support to print non-const null terminated strings, required in HIP context. 
This is achieved as follows
  - calculate string size using a function "getStrlenWithNull()" and reserve 
the space in printf buffer using __printf_alloc() (as was the case in OpenCL, 
but number of bytes allocated could be dynamic now)
  - copy the string contents to buffer using previously calculated size and the 
pointer to string (a memcpy intrinsic is generated here)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138702

Files:
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/Builtins.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
  llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
  llvm/test/CodeGen/AMDGPU/opencl-printf.ll

Index: llvm/test/CodeGen/AMDGPU/opencl-printf.ll
===
--- llvm/test/CodeGen/AMDGPU/opencl-printf.ll
+++ llvm/test/CodeGen/AMDGPU/opencl-printf.ll
@@ -9,18 +9,17 @@
 ; R600: call i32 (i8 addrspace(2)*, ...) @printf(i8 addrspace(2)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(2)* @.str, i32 0, i32 0), i8* %arraydecay, i32 %n)
 ; GCN-LABEL: entry
 ; GCN: call i8 addrspace(1)* @__printf_alloc
-; GCN-LABEL: entry.split
+; GCN-LABEL: strlen.join.split
 ; GCN: icmp ne i8 addrspace(1)* %printf_alloc_fn, null
+; GCN: br i1 %14, label %15, label %16
 ; GCN: %PrintBuffID = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 0
 ; GCN: %PrintBuffIdCast = bitcast i8 addrspace(1)* %PrintBuffID to i32 addrspace(1)*
-; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast
+; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast, align 4
 ; GCN: %PrintBuffGep = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 4
-; GCN: %PrintArgPtr = ptrtoint i8* %arraydecay to i64
-; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffGep to i64 addrspace(1)*
-; GCN: store i64 %PrintArgPtr, i64 addrspace(1)* %PrintBuffPtrCast
-; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i32 8
-; GCN: %PrintBuffPtrCast1 = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)*
-; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast1
+; GCN: call void @llvm.memcpy.p1i8.p0i8.i64(i8 addrspace(1)* %PrintBuffGep, i8* %arraydecay, i64 %9, i1 false)
+; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i64 %11
+; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)*
+; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast, align 4
 
 @.str = private unnamed_addr addrspace(2) constant [6 x i8] c"%s:%d\00", align 1
 
Index: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
@@ -0,0 +1,57 @@
+; RUN: opt -mtriple=amdgcn--amdhsa -passes=amdgpu-printf-runtime-binding -S < %s | FileCheck --check-prefix=FUNC --check-prefix=GCN --check-prefix=METADATA %s
+
+; FUNC-LABEL: @test_kernel(
+; GCN-LABEL: entry
+; GCN-LABEL: strlen.while
+; GCN: br i1 %6, label %strlen.while.done, label %strlen.while
+; GCN-LABEL: strlen.join
+; GCN: %12 = add i64 %11, 3
+; GCN: %13 = and i64 %12, 4294967292
+; GCN: %14 = add i64 %13, 4
+; GCN: %15 = trunc i64 %14 to i32
+; GCN: %printf_alloc_fn = call ptr addrspace(1) @__printf_alloc(i32 %15)
+; GCN-LABEL: strlen.join.split
+; GCN: %16 = icmp ne ptr addrspace(1) %printf_alloc_fn, null
+; GCN: br i1 %16, label %17, label %18
+; GCN: %PrintBuffID = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 0
+; GCN: %PrintBuffIdCast = bitcast ptr addrspace(1) %PrintBuffID to ptr addrspace(1)
+; GCN: store i32 1, ptr addrspace(1) %PrintBuffIdCast, align 4
+; GCN: %PrintBuffGep = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 4
+; GCN: call void @llvm.memcpy.p1.p0.i64(ptr addrspace(1) %PrintBuffGep, ptr %1, i64 %11, i1 false)
+; GCN: %PrintBuffNextPtr = getelementptr i8, ptr addrspace(1) %PrintBuffGep, i64 %13
+; GCN: br label %18
+
+; METADATA: !llvm.printf.fmts = !{!0}
+; METADATA: !0 = !{!"1:1:8:%s"}
+
+@.str = private unnamed_addr addrspace(4) constant [3 x i8] c"%s\00", align 1
+@.str.1 = private unnamed_addr addrspace(4) constant [6 x i8]