Public bug reported:

When trying to run a model from Hugging Face, llama-cli segfaults on
startup. I have only tested on gfx1201 so far; I will test the other
hardware I have access to shortly.

## CLI Output

# llama-cli -st -hf ggml-org/gemma-3-1b-it-GGUF -p "If this is working, please output the exact phrase 'this appears to be working'"
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 32624 MiB):
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32, VRAM: 32624 MiB
load_backend: loaded ROCm backend from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-hip.so
load_backend: loaded CPU backend from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-cpu-haswell.so
common_download_file_single_online: no previous model file found /root/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_preset.ini
common_download_file_single_online: HEAD failed, status: 404
no remote preset found, skipping
common_download_file_single_online: no previous model file found /root/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
common_download_file_single_online: downloading from https://huggingface.co/ggml-org/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-Q4_K_M.gguf to /root/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf.downloadInProgress (etag:"107078f2011b8db626bee8040bb2bf82aa23ff7f5a81c786f3cf58dbcd75db2e")...
[==================================================] 100%  (768 MB / 768 MB)
Loading model... Segmentation fault (core dumped) llama-cli -st -hf ggml-org/gemma-3-1b-it-GGUF -p "If this is working, please output the exact phrase 'this appears to be working'"


## GDB Backtrace

Thread 1 "llama-cli" received signal SIGSEGV, Segmentation fault.
0x00007fffac71ffcb in hip::ihipLaunchKernel_validate (f=f@entry=0x7ffff6ddfa68, launch_params=..., kernelParams=kernelParams@entry=0x7fffffff38e0, extra=extra@entry=0x0, deviceId=deviceId@entry=0, params=params@entry=0) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/rocclr/platform/kernel.hpp:85
warning: 85     /usr/src/rocm-hipamd-7.1.0-0ubuntu2/rocclr/platform/kernel.hpp: No such file or directory
(gdb) bt
#0  0x00007fffac71ffcb in hip::ihipLaunchKernel_validate (f=f@entry=0x7ffff6ddfa68, launch_params=..., kernelParams=kernelParams@entry=0x7fffffff38e0, extra=extra@entry=0x0, deviceId=deviceId@entry=0, params=params@entry=0) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/rocclr/platform/kernel.hpp:85
#1  0x00007fffac72048d in hip::ihipModuleLaunchKernel (f=0x7ffff6ddfa68, launch_params=..., hStream=0x5555562716b0, kernelParams=kernelParams@entry=0x7fffffff38e0, extra=extra@entry=0x0, startEvent=startEvent@entry=0x0, stopEvent=0x0, flags=0, params=0, gridId=0, numGrids=0, prevGridSum=0, allGridSum=0, firstDevice=0) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/hipamd/src/hip_module.cpp:467
#2  0x00007fffac77deb4 in hip::ihipLaunchKernel (hostFunction=0x7ffff6ddfa68, gridDim=..., blockDim=..., args=0x7fffffff38e0, sharedMemBytes=0, stream=<optimized out>, startEvent=0x0, stopEvent=0x0, flags=0) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/hipamd/src/hip_platform.cpp:677
#3  0x00007fffac71fe31 in hip::hipLaunchKernel_common (hostFunction=<optimized out>, hostFunction@entry=0x7ffff6ddfa68, gridDim=..., blockDim=..., args=<optimized out>, args@entry=0x7fffffff38e0, sharedMemBytes=<optimized out>, stream=<optimized out>) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/hipamd/src/hip_module.cpp:819
#4  0x00007fffac73bae2 in hip::hipLaunchKernel (hostFunction=<optimized out>, gridDim=..., blockDim=..., args=<optimized out>, sharedMemBytes=<optimized out>, stream=<optimized out>) at /usr/src/rocm-hipamd-7.1.0-0ubuntu2/hipamd/src/hip_module.cpp:826
#5  0x00007fffb7e215e1 in ggml_cuda_op_scale(ggml_backend_cuda_context&, ggml_tensor*) () from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-hip.so
#6  0x00007fffb7d43709 in ?? () from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-hip.so
#7  0x00007fffb7d41f41 in ?? () from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-hip.so
#8  0x00007ffff75161d3 in ggml_backend_sched_compute_splits (sched=0x555556258d30) at /usr/src/ggml-0.9.8-3/src/ggml-backend.cpp:1582
#9  ggml_backend_sched_graph_compute_async (sched=0x555556258d30, graph=<optimized out>) at /usr/src/ggml-0.9.8-3/src/ggml-backend.cpp:1805
#10 0x00007ffff7601781 in llama_context::graph_compute (this=this@entry=0x5555561c1cf0, gf=0x5555566243f0, batched=<optimized out>) at /usr/include/c++/15/bits/unique_ptr.h:192
#11 0x00007ffff76029b3 in llama_context::process_ubatch (this=this@entry=0x5555561c1cf0, ubatch=..., gtype=gtype@entry=LLM_GRAPH_TYPE_DECODER, mctx=mctx@entry=0x5555562e6b40, ret=@0x7fffffff974c: 892221235) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/src/llama-context.cpp:1162
#12 0x00007ffff7606d99 in llama_context::decode (this=0x5555561c1cf0, batch_inp=...) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/src/llama-context.cpp:1620
#13 0x00007ffff76125f5 in llama_decode (ctx=<optimized out>, batch=...) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/src/llama-context.cpp:3466
#14 0x000055555571f7f4 in common_init_from_params (params=...) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/common/common.cpp:1297
#15 0x000055555565b797 in server_context_impl::load_model (this=0x5555562df560, params=...) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/tools/server/server-context.cpp:625
#16 0x00005555555ea326 in server_context::load_model (this=0x7fffffffc320, params=...) at /usr/include/c++/15/bits/unique_ptr.h:192
#17 main (argc=<optimized out>, argv=<optimized out>) at /usr/src/llama.cpp-8064+dfsg-1ubuntu1/tools/cli/cli.cpp:243
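For context on frame #5: in upstream ggml, the "scale" op just multiplies each tensor element by a constant, and the crash occurs inside HIP's launch validation (frames #0-#4) before the kernel body would run at all. A minimal CPU-side sketch of what the op computes (function name and shape are my assumption from the upstream symbol, not taken from the Ubuntu package):

```python
def ggml_scale_ref(xs, scale):
    """Reference for ggml's 'scale' op: out[i] = xs[i] * scale.

    The HIP backend launches a per-element kernel for this multiply;
    the backtrace shows the segfault in hip::ihipLaunchKernel_validate,
    i.e. during kernel-launch validation, not in the arithmetic itself.
    """
    return [x * scale for x in xs]
```

This suggests the bug is in how the HIP backend (or the runtime) handles the kernel handle/launch on gfx1201, rather than in the op's math.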

## Immediately relevant installed packages

libggml0-backend-hip_0.9.8-3.amd64
llama.cpp_8064+dfsg-1ubuntu1.amd64

** Affects: llama.cpp (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146822

Title:
  llama-cli segfaults on startup


