[PATCH] D138702: support for HIP non hostcall printf
vikramRH abandoned this revision. vikramRH added a comment. A new implentation is under discussion Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D138702/new/ https://reviews.llvm.org/D138702 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D138702: support for HIP non hostcall printf
vikramRH added a comment. There were some issues reported where it was found that pcie atomics are not available in some environments, as a result hostcall and HIP-Printf were also not available. This Patch is intended to provide an alternative in such scenarios while hostcall based implementation remains the primary option. After discussions with Sameer, I am convinced that it is better to do this transformation at clang CodeGen phase rather than a opt pass (similar to current hostcall based implementation), Hence this patch needs rework and I shall include the review comments in the updated patch. (I agree on the option too, I shall think of a option name with a named argument) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D138702/new/ https://reviews.llvm.org/D138702 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D138702: support for HIP non hostcall printf
arsenm requested changes to this revision. arsenm added a comment. Herald added a subscriber: wdng. I have a few questions. First, why surface this to users? If we really need to, I don't think this is the right flag name/design. A named argument to some kind of printf lowering flag would be better. Second, I thought we were moving towards hostcall printf, not away from it. Comment at: clang/include/clang/Basic/LangOptions.def:275 LANGOPT(OffloadingNewDriver, 1, 0, "use the new driver for generating offloading code.") +LANGOPT(DelayedPrintf, 1, 0, "version onf printf function to be used, hostcall or buffer based") Typo 'onf' Comment at: clang/include/clang/Driver/Options.td:985 +def fdelayed_printf : Flag<["-"], "fdelayed-printf">, + HelpText<"Specifies which version of printf is to be used while CodeGen">, + Flags<[CC1Option]>, Help text reads weird Comment at: clang/lib/Driver/ToolChains/Clang.cpp:4655-4667 + +if (JA.isDeviceOffloading(Action::OFK_HIP)) { + // Device side compilation printf + if (Args.getLastArg(options::OPT_fdelayed_printf)) +CmdArgs.push_back("-fdelayed-printf"); +} } Missing clang side tests Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:135-136 -bool AMDGPUPrintfRuntimeBindingImpl::shouldPrintAsStr(char Specifier, - Type *OpType) const { - if (Specifier != 's') -return false; - const PointerType *PT = dyn_cast(OpType); - if (!PT || PT->getAddressSpace() != AMDGPUAS::CONSTANT_ADDRESS) -return false; - Type *ElemType = PT->getContainedType(0); - if (ElemType->getTypeID() != Type::IntegerTyID) -return false; - IntegerType *ElemIType = cast(ElemType); - return ElemIType->getBitWidth() == 8; +// This function is essentially a copy from the file +// Transforms/Utils/AMDGPUEmitPrintf.cpp +static Value *getStrlenWithNull(IRBuilder<> , Value *Str) { Why not share these? They should be in the same place. This should also be a separate change from any flag changes Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:141 + + auto CharZero = Builder.getInt8(0); + auto One = Builder.getInt64(1); auto * Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:150 + BasicBlock *Join = nullptr; + if (Prev->getTerminator()) { +Join = Prev->splitBasicBlock(Builder.GetInsertPoint(), "strlen.join"); Why would the block be missing a terminator here? Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:183-185 + auto Begin = Builder.CreatePtrToInt(Str, Int64Ty); + auto End = Builder.CreatePtrToInt(PtrPhi, Int64Ty); + auto Len = Builder.CreateSub(End, Begin); Really should try hard to avoid introducing ptrtoint. Do you really need to do a pointer subtract? Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:215 + StringRef Str; + llvm::Value *RealSize; + llvm::Value *AlignedSize; Don't need llvm:: Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:439 - Type *idPointer = PointerType::get(I32Ty, AMDGPUAS::GLOBAL_ADDRESS); + Type *idPointer = PointerType::get(Int32Ty, AMDGPUAS::GLOBAL_ADDRESS); Value *id_gep_cast = Broken for opaque pointers? Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:443 - new StoreInst(ConstantInt::get(I32Ty, UniqID), id_gep_cast, Brnch); + new StoreInst(ConstantInt::get(Int32Ty, UniqID), id_gep_cast, Brnch); Why isn't this using the IRBuilder? Comment at: llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp:491-492 + if (S[0]) { +char *MyNewStr = new char[NSizeStr](); +strcpy(MyNewStr, S.str().c_str()); +int NumInts = NSizeStr / 4; Why do we have raw new and strcpy here? We have better string classes Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:1 +; RUN: opt -mtriple=amdgcn--amdhsa -passes=amdgpu-printf-runtime-binding -S < %s | FileCheck --check-prefix=FUNC --check-prefix=GCN --check-prefix=METADATA %s + Don't need these separate prefixes Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:3 + +; FUNC-LABEL: @test_kernel( +; GCN-LABEL: entry Should use generated checks Comment at: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll:39 + store i32 25, ptr %p.ascast, align 4 + %0 = load i32, ptr %p.ascast, align 4 + %cmp = icmp sgt i32 %0, 30 Use named values in tests Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION
[PATCH] D138702: support for HIP non hostcall printf
vikramRH created this revision. vikramRH added reviewers: sameerds, b-sumner, yaxunl. vikramRH added a project: LLVM. Herald added subscribers: kosarev, foad, kerbowa, hiraditya, Anastasia, jvesely, arsenm. Herald added a project: All. vikramRH requested review of this revision. Herald added subscribers: llvm-commits, cfe-commits, MaskRay. Herald added a project: clang. The patch essentially ports the existing non-hostcall printf support in OpenCL to HIP (to be controlled via new option "-fdelayed-printf"), with following changes 1. Code refactoring -> we now use API's "getConstantStringInfo()" to extract constant string contents at compile time. 2. Support to print non-const null terminated strings, required in HIP context. This is achieved as follows - calculate string size using a function "getStrlenWithNull()" and reserve the space in printf buffer using __printf_alloc() (as was the case in OpenCL, but number of bytes allocated could be dynamic now) - copy the string contents to buffer using previously calculated size and the pointer to string (a memcpy intrinsic is generated here) Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D138702 Files: clang/include/clang/Basic/LangOptions.def clang/include/clang/Driver/Options.td clang/lib/Basic/Builtins.cpp clang/lib/Driver/ToolChains/Clang.cpp llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll llvm/test/CodeGen/AMDGPU/opencl-printf.ll Index: llvm/test/CodeGen/AMDGPU/opencl-printf.ll === --- llvm/test/CodeGen/AMDGPU/opencl-printf.ll +++ llvm/test/CodeGen/AMDGPU/opencl-printf.ll @@ -9,18 +9,17 @@ ; R600: call i32 (i8 addrspace(2)*, ...) @printf(i8 addrspace(2)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(2)* @.str, i32 0, i32 0), i8* %arraydecay, i32 %n) ; GCN-LABEL: entry ; GCN: call i8 addrspace(1)* @__printf_alloc -; GCN-LABEL: entry.split +; GCN-LABEL: strlen.join.split ; GCN: icmp ne i8 addrspace(1)* %printf_alloc_fn, null +; GCN: br i1 %14, label %15, label %16 ; GCN: %PrintBuffID = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 0 ; GCN: %PrintBuffIdCast = bitcast i8 addrspace(1)* %PrintBuffID to i32 addrspace(1)* -; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast +; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast, align 4 ; GCN: %PrintBuffGep = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 4 -; GCN: %PrintArgPtr = ptrtoint i8* %arraydecay to i64 -; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffGep to i64 addrspace(1)* -; GCN: store i64 %PrintArgPtr, i64 addrspace(1)* %PrintBuffPtrCast -; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i32 8 -; GCN: %PrintBuffPtrCast1 = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)* -; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast1 +; GCN: call void @llvm.memcpy.p1i8.p0i8.i64(i8 addrspace(1)* %PrintBuffGep, i8* %arraydecay, i64 %9, i1 false) +; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i64 %11 +; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)* +; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast, align 4 @.str = private unnamed_addr addrspace(2) constant [6 x i8] c"%s:%d\00", align 1 Index: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll === --- /dev/null +++ llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll @@ -0,0 +1,57 @@ +; RUN: opt -mtriple=amdgcn--amdhsa -passes=amdgpu-printf-runtime-binding -S < %s | FileCheck --check-prefix=FUNC --check-prefix=GCN --check-prefix=METADATA %s + +; FUNC-LABEL: @test_kernel( +; GCN-LABEL: entry +; GCN-LABEL: strlen.while +; GCN: br i1 %6, label %strlen.while.done, label %strlen.while +; GCN-LABEL: strlen.join +; GCN: %12 = add i64 %11, 3 +; GCN: %13 = and i64 %12, 4294967292 +; GCN: %14 = add i64 %13, 4 +; GCN: %15 = trunc i64 %14 to i32 +; GCN: %printf_alloc_fn = call ptr addrspace(1) @__printf_alloc(i32 %15) +; GCN-LABEL: strlen.join.split +; GCN: %16 = icmp ne ptr addrspace(1) %printf_alloc_fn, null +; GCN: br i1 %16, label %17, label %18 +; GCN: %PrintBuffID = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 0 +; GCN: %PrintBuffIdCast = bitcast ptr addrspace(1) %PrintBuffID to ptr addrspace(1) +; GCN: store i32 1, ptr addrspace(1) %PrintBuffIdCast, align 4 +; GCN: %PrintBuffGep = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 4 +; GCN: call void @llvm.memcpy.p1.p0.i64(ptr addrspace(1) %PrintBuffGep, ptr %1, i64 %11, i1 false) +; GCN: %PrintBuffNextPtr = getelementptr i8, ptr addrspace(1) %PrintBuffGep, i64 %13 +; GCN: br label %18 + +; METADATA: !llvm.printf.fmts = !{!0} +; METADATA: !0 = !{!"1:1:8:%s"} + +@.str = private unnamed_addr addrspace(4) constant [3 x i8] c"%s\00", align 1 +@.str.1 = private unnamed_addr addrspace(4) constant [6 x i8]