[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
mvermeulen commented on issue #13666: URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413902505

> I get the error on an AMD gfx908 device. The error is `ValueError: Cannot find global function tvm.contrib.miopen.conv2d.setup`. How do I fix it?

What is your setting for the USE_MIOPEN configuration variable?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
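The error in the quoted question usually means TVM was compiled without MIOpen support, so the `tvm.contrib.miopen.conv2d.setup` packed function is never registered. A minimal config sketch, assuming the standard TVM source-build layout (`USE_ROCM` and `USE_MIOPEN` are real TVM build options, but this fragment is illustrative, not taken from the thread):

```cmake
# build/config.cmake -- illustrative fragment.
# Both flags must be enabled before running cmake/make so that the
# MIOpen contrib functions are compiled in and registered at runtime.
set(USE_ROCM ON)
set(USE_MIOPEN ON)
```

After editing config.cmake, rebuild TVM so the ROCm/MIOpen runtime modules are regenerated.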
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
mvermeulen commented on issue #13666: URL: https://github.com/apache/tvm/issues/13666#issuecomment-1404483929

@masahi - some further characterization:

1. Using a Radeon VII (gfx906): both onnx_rocm.py and from_pytorch.py work as expected. In particular, the relay/torch graphs both indicate id 281, and the other results are as reported above.
2. Using an RX 6900 XT (gfx1030): I see a failure similar to what you report above. However, if I change the target specification to `target = tvm.target.Target("rocm -libs=rocblas", host="llvm")`, then it behaves the same as the Radeon VII.
3. Using an RX 6800M (gfx1031): I first need to set the environment variable `HSA_OVERRIDE_GFX_VERSION=10.3.0`; with that, using `-libs=rocblas` again makes things pass.
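The gfx1031 steps above can be sketched as a shell session (the script name onnx_rocm.py and the values are from this thread; treat the fragment as an illustrative sketch, since it only runs on a machine with a ROCm-capable GPU):

```shell
# Workaround sketch for an RX 6800M (gfx1031), per the steps above.
# gfx1031 lacks official ROCm kernel support, so present it to the
# runtime as gfx1030:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Run the reproducer, with the target inside the script changed to
# "rocm -libs=rocblas" so GEMMs are routed through rocBLAS:
python3 onnx_rocm.py
```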
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
mvermeulen commented on issue #13666: URL: https://github.com/apache/tvm/issues/13666#issuecomment-1397176511

@masahi, most likely you are missing arguments when starting up the Docker container. Here is how I run it:

```
docker run -it --device=/dev/dri --device=/dev/kfd --network=host --group-add=render -v /home/mev:/home/mev mevermeulen/rocm-tvm:5.4.2 /bin/bash
```

The --device options make sure the GPU devices are also available inside the Docker image. When this is done, /dev/kfd is created with read/write permissions for the "render" group. On my system I happened to run as root, so it worked anyway; but if I were running as a non-root user (either inside or outside the container), I would need to be a member of that group to get access to the device files.
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
mvermeulen commented on issue #13666: URL: https://github.com/apache/tvm/issues/13666#issuecomment-1387643084

Tried this with the following Docker image, which I built from the latest ROCm:

```
docker pull mevermeulen/rocm-tvm:5.4.2
```

I didn't have OpenCL built into it, so I compared against CPU execution instead, and I don't see an issue:

```
root@chilecito:/src/rocm-tvm/qa# python3 /home/mev/onnx_rocm.py
[0.5488135 0.71518934 0.60276335 0.5448832 0.4236548 0.6458941 0.4375872 0.891773 0.96366274 0.3834415 ]
[-0.22859086 -0.25806987 -0.43340546 0.4846983 -0.6018106 0.22698797 0.85465795 -0.9607101 0.5279621 -1.1830723 ]
[-0.22859041 -0.25806972 -0.43340546 0.4846975 -0.6018108 0.2269876 0.8546581 -0.9607104 0.527962 -1.1830723 ]
```

To compare against the CPU, I modified the last part of the program as follows:

```python
def main():
    np.random.seed(0)
    I_np = np.random.uniform(size=input_size).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("/home/mev/mnist-7.onnx")
    mod, params = relay.frontend.from_onnx(onnx_model, {"Input3": I_np.shape})
    rocm_output = build("rocm", mod=mod, params=params, input_name=input_name,
                        input_data=I_np, input=I_np.shape, output=output_size)
    cpu_output = build("llvm", mod=mod, params=params, input_name=input_name,
                       input_data=I_np, input=I_np.shape, output=output_size)
    # opencl_output = build("opencl", mod=mod, params=params, input_name=input_name,
    #                       input_data=I_np, input=I_np.shape, output=output_size)
    print(rocm_output[0][:10])
    print(cpu_output[0][:10])
    # print(opencl_output[0][:10])
```

@wangzy0327 does my Docker image work for you? If so, it gives you a known-good setup to compare against. Can you also cross-check that your ROCm installation and driver are properly installed? For example, you can try:

```
prompt% rocminfo
prompt% cd /opt/rocm/share/hip/samples/0_Intro/square
prompt% make
prompt% ./square.out
```
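The script above compares backends by printing the first ten values and eyeballing them. A small sketch (the variable names and tolerances are mine, not from the thread) of doing that comparison programmatically with NumPy, using the values printed above:

```python
import numpy as np

# First ten values of the rocm and llvm (CPU) outputs printed above.
rocm_out = np.array([-0.22859086, -0.25806987, -0.43340546, 0.4846983,
                     -0.6018106, 0.22698797, 0.85465795, -0.9607101,
                     0.5279621, -1.1830723])
cpu_out = np.array([-0.22859041, -0.25806972, -0.43340546, 0.4846975,
                    -0.6018108, 0.2269876, 0.8546581, -0.9607104,
                    0.527962, -1.1830723])

# Different backends accumulate float rounding differently, so compare
# with an explicit tolerance rather than checking exact equality.
match = np.allclose(rocm_out, cpu_out, rtol=1e-4, atol=1e-5)
print("rocm matches cpu:", match)  # True for the values above
```

The same `np.allclose` check would flag the original bug (genuinely wrong rocm results) while tolerating the sub-1e-6 rounding differences visible in the output above.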