It seems fine so far running some dev workloads, invoked as:
llama-server -m gpt-oss-20b-mxfp4.gguf -ngl 99 -fa on --ctx-size 0 --jinja --host ::0
Seeing a nice performance increase too.


Previous version - libggml-0.9.4pl20251120 & llama.cpp-0.0.7086:

llama-bench -m models/gpt-oss-20b-mxfp4.gguf -fa on -ngl 99
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     |  99 |           pp512 |      1839.89 ± 34.88 |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     |  99 |           tg128 |        129.02 ± 0.25 |


llama-bench -m models/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -ngl 99 -ncmoe 8
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan     |  99 |  1 |           pp512 |         69.25 ± 2.18 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan     |  99 |  1 |           tg128 |         16.97 ± 0.04 |



This version - libggml-0.9.5pl20260130 & llama.cpp-0.0.7883:

llama-bench -m models/gpt-oss-20b-mxfp4.gguf -fa on -ngl 99
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     |  99 |           pp512 |      2405.75 ± 25.05 |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     |  99 |           tg128 |        128.82 ± 0.35 |


llama-bench -m models/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -ngl 99 -ncmoe 8
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan     |  99 |  1 |           pp512 |         79.29 ± 1.72 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Vulkan     |  99 |  1 |           tg128 |         17.21 ± 0.10 |
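For what it's worth, the prompt-processing gain works out to roughly +31% on the 20B model and +14.5% on the 120B model, with token generation essentially flat in both cases. A quick sketch of that arithmetic (t/s values copied from the tables; the percentages are my own calculation, not llama-bench output):

```python
# Speedup implied by the llama-bench results above, previous vs. current build.
runs = {
    # (model, test): (previous t/s, current t/s)
    ("gpt-oss 20B",  "pp512"): (1839.89, 2405.75),
    ("gpt-oss 20B",  "tg128"): (129.02, 128.82),
    ("gpt-oss 120B", "pp512"): (69.25, 79.29),
    ("gpt-oss 120B", "tg128"): (16.97, 17.21),
}

for (model, test), (old, new) in runs.items():
    change = (new / old - 1.0) * 100.0  # percent change, current relative to previous
    print(f"{model:13s} {test}: {old:8.2f} -> {new:8.2f} t/s ({change:+.1f}%)")
```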

