On Wed, 21 Jan 2026 18:36:40 +0100,
Kirill A. Korinsky <[email protected]> wrote:
> 
> On Wed, 21 Jan 2026 18:18:29 +0100,
> Chris Cappuccio <[email protected]> wrote:
> > 
> > Kirill A. Korinsky [[email protected]] wrote:
> > > 
> > > Here a version where I:
> > > 1. added runtime dependency on textproc/ripgrep
> > > 2. patched required current_exe() usage
> > > 3. added README with sample how it can be used against llama.cpp server
> > > 
> > 
> > I like this. I had done a similar README. In model_providers you should add:
> > 
> > headers = {}
> > query_params = {}
> > 
> > They might not break anything today but they are of no use to llama.cpp and
> > the future is less certain. They are designed for openai's interface.
> > 
> 
> Do you mean something like that?
> 
> headers = {}
> query_params = {}
> 
> model_provider = "local"
> model = "local"
> 
> [model_providers.local]
> name = "llama-server"
> base_url = "http://127.0.0.1:8080/v1"
> wire_api = "chat"
> 
BTW, https://github.com/ggml-org/llama.cpp/pull/18486 was merged; I'll
cook up an update for llama.cpp and ggml in a few days. Based on the
discussion there, they have brought into scope what is needed to avoid
wire_api in the config. If it works, I suggest importing it without the
deprecated wire_api = "chat" in the README.
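If it helps, here is roughly what the README sample might become once
that lands. A sketch only: I have kept Chris's headers/query_params
suggestion, placed under the provider table as I understand his intent,
and simply dropped wire_api on the assumption that the updated
llama.cpp no longer needs it:

model_provider = "local"
model = "local"

[model_providers.local]
name = "llama-server"
base_url = "http://127.0.0.1:8080/v1"
headers = {}
query_params = {}
# wire_api omitted on purpose; to be confirmed against the updated
# llama.cpp before the README changes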
--
wbr,
Kirill