On 2025/11/03 23:19, Kirill A. Korinsky wrote:
> On Mon, 03 Nov 2025 19:22:14 +0100,
> Stuart Henderson <[email protected]> wrote:
> > 
> > On 2025/11/03 15:15, Kirill A. Korinsky wrote:
> > > We don't have a GPU, but with -t 32 I ran the Qwen3 VL 30B model on
> > > CPU only on an AMD Ryzen 9 7950X3D at an acceptable speed of about
> > > 2 tokens/second, which is more or less usable. But it requires
> > > memory: 120G as :datasize is enough.
> > > 
> > > Because we use libggml as a dedicated port, it must be updated to
> > > the latest version, and it contains a bug which breaks large models
> > > under a large number of threads:
> > > https://github.com/ggml-org/llama.cpp/issues/16960
> > 
> > I was hoping to hold off on updating llama until there was a new ggml
> > release (implying they think it's stable-ish) rather than following
> > the bleeding edge, but if you want to, then do it... please keep an
> > eye on the repo for fixes for any breakages though.
> >
> 
> Sure, I will.
> 
> On top of that, I'm waiting for upstream to decide on this PR.
> 
> I won't proceed until it's merged into llama.cpp at least.

Thanks. Good call.
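
For anyone following along who wants to try the CPU-only setup Kirill
describes, here is a rough sketch of what it could look like. The login
class name and model path below are placeholders, and I'm assuming the
port installs llama.cpp's llama-cli binary; check the port's own docs:

    # /etc/login.conf: a dedicated class with a large data size limit,
    # using the 120G value Kirill mentions
    llama:\
        :datasize-max=120G:\
        :datasize-cur=120G:\
        :tc=default:

    # rebuild the capability database if you keep a login.conf.db,
    # assign the class to your user, then log in again
    $ ulimit -d   # should now report 125829120 (120G in kilobytes)
    $ llama-cli -m /path/to/qwen3-vl-30b.gguf -t 32 -p "hello"
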

> > 
> > whisper still works, so with those changes it's ok with me.
> > 
> 
> I'll incorporate your remarks into my local tree, thanks!
> 
> -- 
> wbr, Kirill
> 
