On Thu, Apr 2, 2026 at 9:16 AM George Rosamond wrote:
> I want to initiate a thread on the "BSDs and AI today."
>
> A few things first.
>
> There are many levels to this discussion, and for the sake of clarity
> and sanity, please top posting. All replies should be inline.
>
> This is useful:
> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>
> I'm looking to do a presentation on this in the summer for NYC*BUG.
> There hasn't been anything in our community which provides the
> high-level overview of the impact of AI, covering things from the impact
> on the BSD operating systems to the impact on $job, etc. Hopefully this
> thread can provide some raw materials, and become an outlet for
> individual experiences and more general views.
Hopefully this will be an interesting discussion; at any rate, thanks for initiating it.

> I initiated a similar fruitful (but private) discussion for another
> open-source project, and think it's high-time for us to do the same on
> a public list.
>
> ***
>
> There's a few layers to this discussions. Note these are discussions
> points, not "Yes" or "No" surveys.
>
> * How are LLMs (big tech or otherwise) impacting $job now? Are you using
> Claude Code or similar tools for day to day? Was it required or was it
> your choice? Was there expectations from this tools in terms of
> productivity, etc? This question raises the impact of AWS Bedrock/Kiro...

Personally, LLMs are both influencing my job and not influencing my job. The dichotomy is that the surrounding ecosystems are being fundamentally shaped by them, but I have not incorporated their output directly in my own work.

However, given that every Google web search these days more or less includes an AI Mode summary, I'm finding it inescapable; furthermore, many of the tools that I routinely use are similarly incorporating LLMs, either directly in their construction (for example, the Zed editor) or indirectly by hooking into their use (again, text editors and so on).

Further, some of my colleagues are making heavy use of LLMs, albeit with significant human supervision. I suspect this is a trend that will only increase: the quality of output has improved substantially in the last few months, and the genie is out of the bottle. There _is_ a "there" there, though whether it's worth it is a question that needs to be grappled with.

> * Should BSD projects have explicit LLM-focused policies? Look at the
> 2nd point in the NetBSD "Commit Guidelines" at
> https://www.netbsd.org/developers/commit-guidelines.html. OSS-Security
> already discussed the issue with alleged CVEs discovered by people with
> LLMs trying to stack their resume with credentials.

Probably!
That unsatisfying one-word answer is, I suspect, about the best that can be done at the moment. These tools are in their infancy, and collectively we're all grappling with how best to use them, or not use them at all, if that's still possible. I understand that discussions like this one are meant to iterate on that as part of the overall process.

> * How should the BSD projects themselves be using LLMs? Integration in
> the shell (oh, please no...)? Porting of APIs for big tech LLMs?
> Utilizing LLMs to discover bad code, CVEs, undiscovered vulnerabilities?

Speaking from my own experience with them... I decided about six weeks ago that I needed to understand these things better, so I went through a few exercises messing around with Anthropic's Claude Code. What I discovered is that the output is not (yet?) good enough for direct incorporation into, e.g., an operating system. Where I have found they work best is either in interactively exploring a code base ("explain to me how this code uses interface X..."), or in building bespoke tooling that I might use to better approach whatever I'm actually working on.

For example, I recently used Claude to write a tool that extracts machine-readable register definitions for a particular vendor's CPUs from PDF documents; given around ten volumes, each containing many thousands of pages of text, the tool pulls those definitions out and writes them into JSON files, which can then be queried with a tool like `jq`. Instead of ^F'ing through a multi-volume set of PDF files, I have a shell script that can show me the relevant details directly. I also had it generate tools to show me what the fields of a populated value mean, and did some editor integration so I can "hover" over a field and see what it means, what an accessor is changing, and so on.

This is very handy, but more importantly, the process of building it was instructive. The first draft had all sorts of problems: page footers inside of field definitions, for example.
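To make the footer problem concrete, and to preview the grammar-based fix described next, here is a minimal Python sketch. Everything in it is hypothetical: the plain-text layout, the `REGISTER`/`FIELD` keywords, the `Vol. 3 Page 4-12`-style footer, and the `tokenize`/`parse` names are invented for illustration and are not taken from the real tool or any vendor's manuals. The point is only that once a footer is a token in its own right, the lexer can discard it exactly as it discards whitespace, so a footer landing in the middle of a field table never reaches the parser.

```python
import json
import re

# Hypothetical EBNF for a simplified register-definition layout:
#   registers = { register } ;
#   register  = "REGISTER" NAME field { field } ;
#   field     = "FIELD" BITS NAME ;
# Token patterns, tried in order. FOOTER and WS are recognized but discarded,
# so a page footer that interrupts a field table never reaches the parser.
TOKEN_SPEC = [
    ("FOOTER", r"Vol\.\s+\d+\s+Page\s+\d+-\d+"),  # hypothetical footer format
    ("WS", r"\s+"),
    ("KEYWORD", r"(?:REGISTER|FIELD)\b"),
    ("BITS", r"\d+:\d+"),                          # msb:lsb
    ("NAME", r"[A-Z_][A-Z0-9_]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
SKIP = {"FOOTER", "WS"}


def tokenize(text):
    """Yield (kind, value) pairs, dropping footers and whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:pos + 20]!r}")
        pos = m.end()
        if m.lastgroup not in SKIP:
            yield m.lastgroup, m.group()


def parse(tokens):
    """Hand-written predictive parser for the grammar above; returns a list of dicts."""
    toks = list(tokens)
    i = 0

    def expect(kind, value=None):
        nonlocal i
        if i >= len(toks):
            raise SyntaxError(f"expected {value or kind}, got end of input")
        k, v = toks[i]
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        i += 1
        return v

    registers = []
    while i < len(toks):
        expect("KEYWORD", "REGISTER")
        reg = {"name": expect("NAME"), "fields": []}
        while True:  # field { field }
            expect("KEYWORD", "FIELD")
            msb, lsb = map(int, expect("BITS").split(":"))
            reg["fields"].append({"name": expect("NAME"), "msb": msb, "lsb": lsb})
            if i >= len(toks) or toks[i] != ("KEYWORD", "FIELD"):
                break
        registers.append(reg)
    return registers


# A footer lands in the middle of CTRL0's field list, as it might after PDF
# text extraction; the lexer removes it, so no special case is needed.
SAMPLE = """
REGISTER CTRL0
  FIELD 63:32 RESERVED
  FIELD 31:16 TIMEOUT
Vol. 3 Page 4-12
  FIELD 15:0 ENABLE_MASK
REGISTER STATUS
  FIELD 7:0 ERR_CODE
"""

if __name__ == "__main__":
    print(json.dumps(parse(tokenize(SAMPLE)), indent=2))
```

Running the script prints the registers as JSON, which could then be filtered with `jq`, e.g. `python3 regs.py | jq '.[] | select(.name == "CTRL0") | .fields'` (the script name is hypothetical). The design point is that the footer fix lives in exactly one place, the token table, rather than in per-case heuristics scattered through the parser.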
The LLM kept wanting to add ad hoc heuristics to fix individual instances of such problems; I finally realized that among the best ways to constrain it to reality were: a) asking it to explain to me what it was doing, in the form of a written "design" document, up front; b) forcing it to use test-driven development (to the extent I could force it to do anything), so that there was a known metric by which to judge the output of a change; and c) making it frame the problem as building a grammar describing the register definitions I cared about, and then implementing a parser for that grammar, so that page footers could be recognized as lexical tokens and treated like whitespace, solving that problem generally.

This last point was key: by forcing it to frame its output in terms of a much smaller thing that was a) formally defined, like an EBNF grammar, and b) small enough that I could examine and verify it myself, I could have reasonable confidence in the fidelity of its output. Still, it always biases towards taking the simplest action to effect an outcome, often with poor results. Regardless, I kept at it and eventually got it to the point where I was reasonably happy with the output.

Afterwards, it occurred to me that if I didn't have the level of experience I do, I wouldn't have been able to successfully direct the LLM to build the tool I wanted. This led me to coin my own little "Dan's Law": an LLM can only write a program that is as good as the human driving it could have written. The corollary is that these things really are tools for senior engineers, who have the requisite experience to analyze their output. In the hands of less experienced folks, they're dangerous.

I presented that tool at a small internal demo the other day, and a colleague asked, "How much time do you estimate that Claude saved you?"
I think this is the wrong question, and my response was that I wasn't sure Claude really saved me any time. Oh sure, it could emit text faster than I could type it all in, but I had to continually correct it and tell it to go back and start over, and in that sense, it wasted a lot of time doing things that I would, I hope, have thought better of doing myself.

Finally on this point, the applicability of LLMs to a problem domain likely follows a power law: 90-99% of the training data for software is probably doing more or less the same thing, and the LLM is pretty good there. On the other hand, if you're working in the problem space covering the last 1-10%, the LLM is much worse. You can get it to generate a simple web UI, no problem; but a verifiably correct implementation of a lock-free concurrent data structure? Eh, not so much.

> * How should individual developers and users consider LLMs as tools for
> contributing to the BSDs and other open-source projects? I happily used
> a big tech LLM to deal with an rc file for some very Linuxey software
> wrapped up in systemd clutter.

This needs to be prefaced by asking: what does it mean to use an LLM? If essentially every web search now uses one indirectly, it seems inescapable; but I suspect that's not what you mean. Rather, I think you're referring to direct use by an individual, and incorporating the output of that use into one's work. Still, this definitional issue is important. Suppose I point an LLM at a program and say, "explain to me what this does," and it points out a bug, which I then fix and produce a patch for; how does one characterize that? Suppose I developed and verified the patch _without_ the use of an LLM: would sending the resulting patch upstream violate a project's "no AI" clause, given that the LLM pointed the bug out to me in the first place? What if I do a web search for some random technical term and the unasked-for AI summary is actually useful? Where does one draw the line?
That seems like an urgent and immediate question.

Anyway, to address what I suspect is the actual question: as a way to augment a human developer's abilities, basically as a gofer and search engine++, they're not outright awful. As a way to explore and ideate, they're ok. As a replacement for human output (and, importantly, human judgement), they are nowhere near capable enough.

As with the tool I mentioned above, I've found that they work _best_ when constrained by something else that can be formally verified. I have had good luck asking the LLM to generate a formal model of a thing using something like TLA+, Promela, or Alloy, and to show that the model matches the code (usually by walking me through the correspondence between the generated model and the base code). I can then verify the model using its tools (SPIN, TLC, etc.), and use it to generate property-based tests for a system, which gives me a baseline of behavior that the LLM has to meet in whatever it's doing.

I strongly suspect that formal methods, aggressive application of the type systems of strongly- and statically-typed languages to a problem domain, and a solid understanding of complexity theory and formal language design are going to take on a much greater role for practitioners over the next few years. I never thought I'd say this as an OS person, but I suspect that theorem provers are going to take on a pretty big role for me over the remainder of my career. In fact, I found a bug using TLA+ on Friday; notably, that bug had snuck past testing and human review. I think this is less an LLM win and more a formal-methods win, but I used an LLM to generate the model that revealed the bug, so they're related in that sense.

> Other relevant questions added to this thread are welcomed, including
> references to other relevant public mailing list discussions.
I mentioned that these tools are still in their infancy, and that feels very true even in how we interact with them. Take Claude Code, for example: one runs their CLI, and it feels like playing Adventure or Zork or something. And yes, there _are_ other interfaces, including, say, a VS Code plugin, but features get released into the CLI first. Anyway, we're still in the "GET LAMP" era of working with these things, and still a long way from Rogue, let alone something one of my kids would consider playing.

It is also important to acknowledge the ethics here. There are three main things that keep me up at night:

1. We're re-centralizing the means of producing software. If these things are going to take on a larger role (and by every indication they are), then it's deeply concerning to me that a very small handful of big players are effectively controlling the show. Honestly, that should concern us all. Furthermore, I think that the true cost of LLM usage is much higher than what we're currently paying. Using Claude Code effectively with the latest model practically requires paying Anthropic for the Max subscription, which isn't exactly cheap. What do we do when the firehose of VC money shuts off and the cost increases 2x, 5x, or 10x?

2. There's the issue of the provenance and ownership of the data used for training models. We're starting to see supply chain attacks in this area, and people have been pointing out for some time that there are legitimate questions about the legality of sourcing that data in the first place, and about whether its use qualifies as fair use. Some folks will dismiss this by saying that most of us learn from others or by looking at existing references, so why is this different? I reply that there is a massive difference in scale: it was one thing for me to learn about linked lists as a kid reading a book on data structures; it's entirely different when a machine sucks in the content of every book on data structures and reproduces it on demand.
As Warner and others have pointed out, the courts haven't caught up, and it's all _really_ uncertain right now. And did the authors of those books agree to having their content used this way? If the incentive to read those references goes away, since the LLM gives me the information anyway, and there's correspondingly no financial incentive to write new books, how do we move new ideas out of the research domain and into mainstream practice? Do LLMs just pull everything towards the median? (Maybe the "Singularity" will end up being "aggressively mid.")

3. There's the environmental impact. The amount of energy required to build a new model is growing super-linearly (it appears to have gone from exponential to "merely" quadratic relative to the previous-generation model), and we're running out of physics for Moore's Law to keep it reasonable (it's axiomatic that you can only halve the size of a thing so many times before you start running into fundamental physical limitations, and we're starting to edge up against those). Dedicated accelerator hardware and so forth may help, but at some point, we will run out of the ability to train a bigger model. What then?

Moreover, in their present form, these things are grotesquely inefficient: everything is free-form text. The whole thing really smacks of the big players having created a machine for generating simulacra of plausible text, and then realizing they could apply it to all kinds of stuff---like software. But the amounts of energy (and water!!) required to do so are unsustainable. Honestly, this seems like the worst of the three; one could imagine running a local model at home, or even a small cluster at a job, but if we're sucking the water table dry to train the model required to do that, that's not great.
Most of the AI boosters I've seen seem to be banking on these problems being solved before they become really serious, or on gains in efficiency from AI use offsetting the increase in energy costs, but I'm skeptical: I've seen no concrete plans for addressing this challenge in particular.

Ultimately, there don't seem to be a lot of easy answers, and I suspect we're in for a pretty wild ride over the next few years.

- Dan C.
