I spent a lot of time and heartache dealing with vLLM and ollama. Here are the issues with vLLM: CPU is barely supported, it takes 60 GB of Docker layers to build, and then it runs super slow. Ollama is much better on CPU, but community-wise they take months to merge common-sense features. Still, that is better than vLLM, which is a black hole of denial followed by never merging anyway. The problem I have with ollama is similar to vLLM: the install itself is about 14 GB, as it downloads every C and BLAS library in existence.

Enter deliverance: https://github.com/edwardcapriolo/deliverance

- Written for CPU
- Written in Java, with selected C modules for some heavy lifting
- A binary of < 55MB! (not 20 GB of Python, C, and tensor libraries)
- Compiles in < 5 minutes (including tests)
- Available on Docker Hub: https://hub.docker.com/r/ecapriolo/deliveranc

I do nice work with quantized models for Qwen, Gemma, and Llama. I do most of my dev work on a Core i5 that is 8 years old.

"Oh, but Ed, I hate da Java." Well, the tensor library does some of its math operations with SIMD in C:
https://github.com/edwardcapriolo/deliverance/blob/main/native/src/main/c/simd/vector_simd.c

And it even has web_dawn GPU support (confession: I don't have a GPU to test on):
https://github.com/edwardcapriolo/deliverance/blob/main/native/src/main/c/gpu/vector_gpu.c

If anyone wants to loan me access to a BSD system, or a BSD system with a GPU, I can run some tests there and we can have some fun.

The dependencies are very light on the C side. My Alpine system that I test on looks like this:

doas apk add maven
doas apk add git
doas apk add curl
doas apk add docker-compose
doas apk add openjdk25
doas apk add gpg
doas apk add bash
doas apk add clang20-libclang=20.1.8-r0
doas apk add llvm clang lld

That will build the SIMD C module, which, as I mentioned, is significantly less effort than the 12 GB of BLAS libraries ollama installs and the 4 GB of TensorFlow stuff transformers will install.

Thanks,
Edward

On Tue, Apr 7, 2026 at 5:50 PM Martin Cracauer <[email protected]> wrote:

> The situation with LLMs on FreeBSD is not totally catastrophic.
>
> The NVidia drivers are currently broken on my 5090, so I cannot
> compare Vulkan/FreeBSD to Linux/Cuda.
>
> But they work on my 2080ti with Vulkan and run both ollama and
> llama.cpp, accelerated.
>
> On my laptop with "AMD Ryzen 7 PRO 4750U with Radeon Graphics" also
> runs Vulkan and accelerates ollama (although only by a factor of 3
> compared to CPU). This combo does not run llama.cpp
>
> Now that NVidia drivers are running on at least one of my cards I'll
> give it another go to run CUDA through Linuxulator.
>
> Martin
> --
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Martin Cracauer <[email protected]> http://www.cons.org/cracauer/
>
