** Description changed:

  [SRU Justification]
  
  [ Impact ]
  
  PyTorch running on an APU may report out-of-memory errors even if half of
  the system memory is reserved for VRAM.
  
  This can be identified with `rocminfo` output:
  
  1. For the default setup (512 MB reserved for VRAM) on a 64 GB RAM AMD
  Strix Halo development board:
  
  ```
  $ sudo journalctl -b -1 | grep -B1 GTT
  Aug 15 18:47:46 test kernel: [drm] amdgpu: 512M of VRAM memory ready
  Aug 15 18:47:46 test kernel: [drm] amdgpu: 31822M of GTT memory ready.
  $ free -h
                 total        used        free      shared  buff/cache   available
  Mem:            62Gi       2.1Gi       3.5Gi        44Mi        57Gi        60Gi
  Swap:             0B          0B          0B
  
  *******
  Agent 2
  *******
  Name:                    gfx1151
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  ...
   Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    32586284(0x1f13a2c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
  ```
  Pool 1 has 32586284 KB (~31822 MB), i.e. it is backed by GTT rather than VRAM.
  
  2. With dedicated VRAM set to 32 GB:
  ```
  $ sudo dmesg | grep -B1 GTT
  [    3.640984] [drm] amdgpu: 32768M of VRAM memory ready
  [    3.640986] [drm] amdgpu: 15970M of GTT memory ready.
  $ free -h
                 total        used        free      shared  buff/cache   available
  Mem:            31Gi       1.6Gi        28Gi        44Mi       1.3Gi        29Gi
  Swap:             0B          0B          0B
  
  *******
  Agent 2
  *******
  Name:                    gfx1151
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  ...
   Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16353840(0xf98a30) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
  ```
  In this case, although we now have 32768 MB = 33554432 KB of VRAM, Pool 1
  still allocates 16353840 KB (~15970 MB) from GTT.
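The unit arithmetic quoted above can be double-checked with a few lines of Python (a standalone sanity check on the numbers copied from the logs, not part of the SRU test plan):

```python
# Cross-check the sizes quoted by dmesg (MB) against rocminfo (KB).
KB_PER_MB = 1024

# Default setup: Pool 1 (32586284 KB) matches "31822M of GTT",
# not the 512M VRAM carve-out.
assert 32586284 // KB_PER_MB == 31822

# 32 GB of dedicated VRAM is 33554432 KB ...
assert 32768 * KB_PER_MB == 33554432
# ... yet Pool 1 (16353840 KB) still matches "15970M of GTT".
assert 16353840 // KB_PER_MB == 15970

print("figures consistent")
```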
  
  [ Test Plan ]
  
  1. Follow https://rocm.docs.amd.com/projects/install-on-
  linux/en/latest/install/quick-start.html to install & setup necessary
  host environment for ROCm.
  
  2. Follow https://rocm.docs.amd.com/projects/install-on-
  linux/en/latest/install/3rd-party/pytorch-install.html#using-docker-
  with-pytorch-pre-installed to use a prebuilt PyTorch image for easy
  verification. Alternatively, since only `rocminfo` is needed to observe
  the behavior, one may install the rocminfo snap instead to minimize the
  effort.
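For the container route, a typical invocation (following the ROCm docker documentation linked above; the image tag and exact flags may differ per release) looks like:

```shell
# Pull the prebuilt ROCm PyTorch image and run it with GPU device access.
docker pull rocm/pytorch:latest
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --group-add video \
  rocm/pytorch:latest
```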
  
  3. Assign more memory to VRAM. On the AMD Strix Halo development board,
  this is under: Device Manager => AMD CBS => NBIO Common Options => GFX
  Configuration => Dedicated Graphics Memory. On the development board we
  have 64 GB RAM, and the available options are "High (32 GB)", "Medium
  (16 GB)", and "Minimum (0.5 GB)".
  
  4. Use `rocminfo` to verify whether the allocation has switched to VRAM:
  
  ```
  *******
  Agent 2
  *******
  Name:                    gfx1151
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  ...
   Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    33554432(0x2000000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
  ```
  With the patched kernel, the Pool 1 size is now 33554432 KB = 32768 MB,
  i.e. allocated from VRAM.
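For scripted verification, the Pool 1 size can be extracted from saved `rocminfo` output with a short Python helper (a sketch assuming the `Size: N(0x...) KB` line format shown above; the function name `pool1_size_kb` is made up for illustration):

```python
import re

def pool1_size_kb(rocminfo_text: str) -> int:
    """Return the first 'Size: N(0x...) KB' value found after a 'Pool 1' line."""
    in_pool1 = False
    for line in rocminfo_text.splitlines():
        if "Pool 1" in line:
            in_pool1 = True
        elif in_pool1:
            m = re.search(r"Size:\s+(\d+)\(0x[0-9a-fA-F]+\)\s*KB", line)
            if m:
                return int(m.group(1))
    raise ValueError("Pool 1 size not found")

# Sample taken from the expected rocminfo output above.
sample = """\
 Pool Info:
  Pool 1
    Segment:                 GLOBAL; FLAGS: COARSE GRAINED
    Size:                    33554432(0x2000000) KB
"""
print(pool1_size_kb(sample))  # prints 33554432
```

On a patched system this value should equal the dedicated VRAM size rather than the (smaller) GTT size.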
  
  [ Where problems could occur ]
  
  The backported commits change only where amdkfd sources VRAM allocations
  on APUs (VRAM vs. GTT), so a regression would be confined to GPU memory
  allocation on APU systems, e.g. allocation failures or misreported pool
  sizes. Discrete GPUs keep the existing behavior.
  
  [ Other Info ]
  
  Nominate for Plucky for 6.14, and Noble for oem-6.14.
  
  ========== original bug report ==========
  When running PyTorch on an APU it reports the wrong amount of memory and
  models can't run.
  
  torch.OutOfMemoryError: HIP out of memory. Tried to allocate 18.00 MiB.
  GPU 0 has a total capacity of 15.60 GiB of which 8.09 MiB is free. Of
  the allocated memory 15.10 GiB is allocated by PyTorch, and 195.37 MiB
  is reserved by PyTorch but unallocated. If reserved but unallocated
  memory is large try setting
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
  See documentation for Memory Management
  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
  
  These two commits need to be backported into amdkfd to fix it.
  
  commit 8b0d068e7dd1 ("drm/amdkfd: add a new flag to manage where VRAM allocations go")
  commit 759e764f7d58 ("drm/amdkfd: use GTT for VRAM on APUs only if GTT is larger")

https://bugs.launchpad.net/bugs/2120454

Title:
  Pytorch reports incorrect GPU memory causing "HIP Out of Memory"
  errors
