Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
[Christian Kastner]
> I'm open for better ideas, though.

I find in general that programs written with run-time selection of optimizations are far superior to per-host compilations, at least from a system administration viewpoint. I guess such an approach would require rewriting llama.cpp, and I have no idea how much work that would be.

I look forward to having a look at your git repo to see if there is something there I can learn from for the whisper.cpp packaging.

--
Happy hacking
Petter Reinholdtsen
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
Hey Petter,

On 2024-03-08 20:21, Petter Reinholdtsen wrote:
> [Christian Kastner 2024-02-13]
>> I'll push a first draft soon, though it will definitely not be
>> upload-ready for the above reasons.
>
> Where can I find the first draft?

I've discarded the simple package and now plan another approach: a package that ships a helper to rebuild the utility when needed, similar to DKMS.

Rationale:
* Continuously developed upstream, so no single build is suited for stable
* The build is optimized for the current host's hardware, which is a key feature. Building for our amd64 ISA baseline would be absurd.

I'm open for better ideas, though.

I had to pause this primarily because of ROCm infrastructure work and our updates to 5.7 in preparation for the gfx1100, gfx1101, and gfx1102 architectures, and that is still my focus. Incidentally, we could use some help with that, see the thread at [1]. MIOpen [2] in particular is something that our ROCm stack will eventually need.

Best,
Christian

[1] https://lists.debian.org/debian-ai/2024/03/msg00029.html
[2] https://github.com/ROCm/MIOpen
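A DKMS-style rebuild helper along the lines described above could, as a rough sketch, record which source version the host's binary was built from and rebuild only when the packaged source changes. All paths, stamp-file conventions, and option names below are assumptions for illustration, not the actual packaging:

```shell
#!/bin/sh
# Rough sketch of a DKMS-style rebuild helper (hypothetical layout).

# Rebuild is needed when no stamp file exists, or when the stamp
# records a different source version than the one currently shipped.
needs_rebuild() {
    src_ver="$1"
    stamp="$2"
    [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$src_ver" ]
}

# Configure and build optimized for this host, then record the version.
# LLAMA_NATIVE is llama.cpp's CMake switch for host-tuned builds around
# the b2116 era; the surrounding paths are made up for the sketch.
rebuild() {
    src="$1"; build="$2"; stamp="$3"; src_ver="$4"
    cmake -S "$src" -B "$build" -DCMAKE_BUILD_TYPE=Release \
          -DLLAMA_NATIVE=ON
    cmake --build "$build" -j"$(nproc)"
    printf '%s\n' "$src_ver" > "$stamp"
}
```

A trigger or maintainer script would then call `needs_rebuild` on upgrade and invoke `rebuild` only when the stamp is stale, much as DKMS rebuilds kernel modules after a kernel update.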
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
[Christian Kastner 2024-02-13]
> I'll push a first draft soon, though it will definitely not be
> upload-ready for the above reasons.

Where can I find the first draft?

--
Happy hacking
Petter Reinholdtsen
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
Hi Petter,

On 2024-02-13 08:36, Petter Reinholdtsen wrote:
> I tried building the CPU edition on one machine and running it on
> another, and experienced illegal instruction exceptions. I suspect this
> means one needs to be careful when selecting the build profile to
> ensure it works on all supported Debian platforms.

Yeah, that was my conclusion from my first experiments as well. This is a problem, though, since one key point of llama.cpp is to make the best use of the current hardware. If we targeted some 15-year-old amd64 lowest common denominator, we'd go against that.

In my first experiments, I've also had problems with ROCm builds on hosts without a GPU. I have yet to investigate if/how capabilities can be enabled generally, with their use determined at runtime.

Another issue is that stable is clearly the wrong distribution for this. This is a project that is continuously gaining new features, so we'd need stable-updates.

> I would be happy to help getting this up and running. Please let me
> know when you have published a git repo with the packaging rules.

I'll push a first draft soon, though it will definitely not be upload-ready for the above reasons.

Best,
Christian
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
I tried building the CPU edition on one machine and running it on another, and experienced illegal instruction exceptions. I suspect this means one needs to be careful when selecting the build profile to ensure it works on all supported Debian platforms.

I would be happy to help getting this up and running. Please let me know when you have published a git repo with the packaging rules.

--
Happy hacking
Petter Reinholdtsen
Bug#1063673: ITP: llama.cpp -- Inference of Meta's LLaMA model (and others) in pure C/C++
Package: wnpp
Severity: wishlist
Owner: Christian Kastner
X-Debbugs-Cc: debian-de...@lists.debian.org, debian...@lists.debian.org

* Package name    : llama.cpp
  Version         : b2116
  Upstream Author : Georgi Gerganov
* URL             : https://github.com/ggerganov/llama.cpp
* License         : MIT
  Programming Lang: C++
  Description     : Inference of Meta's LLaMA model (and others) in pure C/C++

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

* Plain C/C++ implementation without any dependencies
* Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
* Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
* Vulkan, SYCL, and (partial) OpenCL backend support
* CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity

This package will be maintained by the Debian Deep Learning Team.