[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct

2023-02-02 Thread via GitHub


mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413902505

   > I get the error on an AMD gfx908 device. The error is: ValueError: Cannot find global function tvm.contrib.miopen.conv2d.setup.
   > How do I fix it?
   
   What is your setting for the USE_MIOPEN configuration variable?
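   
   (If USE_MIOPEN is OFF — i.e. config.cmake does not contain `set(USE_MIOPEN ON)` — the tvm.contrib.miopen.* packed functions are never registered, which would explain that error.  A minimal sketch of how one could check an existing build from Python, added here for illustration:)
   ```
   import tvm
   
   # Returns None when the MIOpen contrib functions were not compiled in,
   # i.e. TVM was built with USE_MIOPEN=OFF.
   func = tvm.get_global_func("tvm.contrib.miopen.conv2d.setup", allow_missing=True)
   print("MIOpen support compiled in:", func is not None)
   ```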





[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct

2023-01-25 Thread via GitHub


mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1404483929

   @masahi - some further characterization:
   
   1. Using Radeon VII (gfx906): both onnx_rocm.py and from_pytorch.py work as expected.  In particular, the relay/torch graphs both indicate id 281, and the other results are as reported above.
   2. Using RX 6900XT (gfx1030): I see a similar failure to what you reported above.  However, if I change the target specification to ```target = tvm.target.Target("rocm -libs=rocblas", host="llvm")```, then it behaves the same as the Radeon VII (see the sketch after this list).
   3. Using RX 6800M (gfx1031): I additionally need to set the environment variable ```HSA_OVERRIDE_GFX_VERSION=10.3.0```; with that and ```-libs=rocblas```, things pass again.
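   
   For concreteness, a minimal sketch of the workaround in items 2 and 3, assuming `mod` and `params` come from relay.frontend.from_onnx as in the script discussed above:
   ```
   import os
   import tvm
   from tvm import relay
   
   # gfx1031 workaround: report the GPU as gfx1030 to the ROCm runtime.
   # Must be set before any ROCm context is created.
   os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
   
   # Route dense/conv workloads through rocBLAS instead of the generic schedules.
   target = tvm.target.Target("rocm -libs=rocblas", host="llvm")
   
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target=target, params=params)
   ```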





[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct

2023-01-19 Thread GitBox


mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1397176511

   @masahi most likely you are missing arguments when starting the Docker container.  Here is how I run it:
   ```
   docker run -it --device=/dev/dri --device=/dev/kfd --network=host --group-add=render -v /home/mev:/home/mev mevermeulen/rocm-tvm:5.4.2 /bin/bash
   ```
   
   The --device options make sure the GPU device files are also available inside the container.  When this is done, /dev/kfd is created with read/write permissions for the "render" group.  On my system I happened to run as root, so it worked anyway; but if I were running as a non-root user (either inside or outside the container), I would need to be a member of that group to get permission to open the device files.
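   
   As a quick sanity check (a sketch I'm adding here, not from the original reply), you can verify from inside the container that the device nodes are visible and accessible:
   ```
   import os
   
   # Both nodes must exist and be readable/writable for ROCm to see the GPU.
   for dev in ("/dev/kfd", "/dev/dri"):
       exists = os.path.exists(dev)
       usable = exists and os.access(dev, os.R_OK | os.W_OK)
       print(f"{dev}: exists={exists}, read/write={usable}")
   ```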





[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct

2023-01-18 Thread GitBox


mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1387643084

   I tried this with the following Docker image, which I built from the latest ROCm:
   ```
   docker pull mevermeulen/rocm-tvm:5.4.2
   ```
   
   I didn't have OpenCL built into that image, so I compared against CPU execution, and I don't see an issue:
   ```
   root@chilecito:/src/rocm-tvm/qa# python3 /home/mev/onnx_rocm.py
   [0.5488135  0.71518934 0.60276335 0.5448832  0.4236548  0.6458941 0.4375872  0.891773   0.96366274 0.3834415 ]
   [-0.22859086 -0.25806987 -0.43340546  0.4846983  -0.6018106   0.22698797  0.85465795 -0.9607101   0.5279621  -1.1830723 ]
   [-0.22859041 -0.25806972 -0.43340546  0.4846975  -0.6018108   0.2269876   0.8546581  -0.9607104   0.527962   -1.1830723 ]
   ```
   
   To compare against the CPU, I modified the last part of the program as 
follows:
   ```
   # build(), input_size, dtype, input_name, and output_size are defined
   # earlier in the original script; only main() is modified here.
   def main():
       np.random.seed(0)
       I_np = np.random.uniform(size=input_size).astype(dtype)
       print(I_np[0][0][0][:10])
       onnx_model = onnx.load("/home/mev/mnist-7.onnx")
       mod, params = relay.frontend.from_onnx(onnx_model, {"Input3": I_np.shape})
       rocm_output = build("rocm", mod=mod, params=params, input_name=input_name,
                           input_data=I_np, input=I_np.shape, output=output_size)
       cpu_output = build("llvm", mod=mod, params=params, input_name=input_name,
                          input_data=I_np, input=I_np.shape, output=output_size)
       # opencl_output = build("opencl", mod=mod, params=params, input_name=input_name,
       #                       input_data=I_np, input=I_np.shape, output=output_size)
       print(rocm_output[0][:10])
       print(cpu_output[0][:10])
       # print(opencl_output[0][:10])
   ```
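   
   Eyeballing the two printed vectors works, but a numeric check is less error-prone.  A minimal sketch of such a check (my addition, assuming the `rocm_output`/`cpu_output` arrays from the snippet above):
   ```
   import numpy as np
   
   # Raises an AssertionError with a mismatch report if the two backends
   # disagree beyond normal floating-point noise.
   np.testing.assert_allclose(rocm_output[0], cpu_output[0], rtol=1e-4, atol=1e-5)
   print("rocm and llvm outputs match")
   ```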
   
   @wangzy0327 does my Docker image work for you?  If so, it gives you a known-good setup to compare against.
   
   Also, can you cross-check that your ROCm installation and driver are properly installed?  For example, you can try:
   ```
   prompt% rocminfo
   
   prompt% cd /opt/rocm/share/hip/samples/0_Intro/square
   prompt% make
   prompt% ./square.out
   ```
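   
   Beyond those system-level checks, you can also confirm from Python that TVM itself sees the ROCm device (a small sketch I'm adding for completeness):
   ```
   import tvm
   
   dev = tvm.rocm(0)
   print("rocm device present:", dev.exist)
   if dev.exist:
       print("device name:", dev.device_name)
   ```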
   

