Re: the BSDs in the AI Age

Dan Cross Mon, 06 Apr 2026 11:20:52 -0700

On Thu, Apr 2, 2026 at 9:16 AM George Rosamond
<[email protected]> wrote:
> I want to initiate a thread on the "BSDs and AI today."
>
> A few things first.
>
> There are many levels to this discussion, and for the sake of clarity
> and sanity, please no top posting. All replies should be inline.
>
> This is useful:
> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>
> I'm looking to do a presentation on this in the summer for NYC*BUG.
> There hasn't been anything in our community which provides the
> high-level overview of the impact of AI, covering things from the impact
> on the BSD operating systems to the impact on $job, etc. Hopefully this
> thread can provide some raw materials, and become an outlet for
> individual experiences and more general views.


Hopefully this will be an interesting discussion; at any rate, thanks
for initiating it.

> I initiated a similar fruitful (but private) discussion for another
> open-source project, and think it's high-time for us to do the same on
> a public list.
>
> ***
>
> There are a few layers to this discussion. Note these are discussion
> points, not "Yes" or "No" surveys.
>
> * How are LLMs (big tech or otherwise) impacting $job now? Are you using
> Claude Code or similar tools for day to day? Was it required or was it
> your choice? Were there expectations from these tools in terms of
> productivity, etc? This question raises the impact of AWS Bedrock/Kiro...

Personally, LLMs are both influencing my job and not influencing my
job.  The dichotomy is that the surrounding ecosystems are being
fundamentally shaped by them, but I have not incorporated their output
directly in my own work.  However, given that every Google web search
these days more or less includes an AI Mode summary, I'm finding it
inescapable; furthermore, many of the tools that I routinely use are
similarly incorporating LLMs, either directly in their construction
(for example, the Zed editor) or indirectly by hooking into their use
(again, text editors and so on).

Further, some of my colleagues are making heavy use of LLMs, albeit
with significant human supervision.  I suspect this is a trend that will
only increase: the quality of output has increased substantially in
the last few months, and the genie is out of the bottle.  There _is_ a
"there" there, though whether it's worth it is a question that needs
to be grappled with.

> * Should BSD projects have explicit LLM-focused policies? Look at the
> 2nd point in the NetBSD "Commit Guidelines" at
> https://www.netbsd.org/developers/commit-guidelines.html. OSS-Security
> already discussed the issue with alleged CVEs discovered by people with
> LLMs trying to stack their resume with credentials.

Probably!

That unsatisfying one-word answer is about the best I suspect can be
done at the moment.  These tools are in their infancy, and
collectively we're all grappling with how best to use them, or not use
them at all, if that's still possible.  I understand that discussions
like this one are meant to iterate on that as part of the overall
process.

> * How should the BSD projects themselves be using LLMs? Integration in
> the shell (oh, please no...)? Porting of APIs for big tech LLMs?
> Utilizing LLMs to discover bad code, CVEs, undiscovered vulnerabilities?

Speaking from my own experience with them....  I decided about six
weeks ago that I needed to understand these things better, so I went
through a few exercises messing around with Anthropic's Claude Code.

What I discovered is that the output is not (yet?) good enough for
direct incorporation into, e.g., an operating system. Where I have
found that they work best is in either interactively exploring a code
base ("explain to me how this code uses interface X...."), or in
building bespoke tooling that I might use to better approach whatever
I'm actually working on.

For example, I recently used Claude to write a tool that extracts
machine-readable register definitions for a particular vendor's CPUs
from PDF documents; given around ten volumes, each containing many
thousands of pages of text, the tool pulls those definitions and
writes them into JSON files, which can then be queried with a tool
like `jq`. Instead of ^F'ing through a multi-volume set of PDF files, I
have a shell script that can show me the relevant details directly. I
also had it generate tools to show me what the fields of a populated
value mean, and did some editor integration so I can "hover" over a
field and see what it means, what an accessor is changing, and so on.
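As a minimal sketch of the "what do the fields of a populated value
mean" lookup described above: the JSON layout, the register name, and
the field names below are all invented for illustration, and are not
the actual tool's schema.

```python
import json

# Hypothetical register definition, in the shape such an extraction tool
# might emit. The register and its fields are made up for this example.
REGISTER_JSON = """
{
  "name": "EXAMPLE_CTL",
  "width": 32,
  "fields": [
    {"name": "ENABLE", "lsb": 0, "msb": 0,  "desc": "Enable the unit"},
    {"name": "MODE",   "lsb": 1, "msb": 3,  "desc": "Operating mode"},
    {"name": "COUNT",  "lsb": 8, "msb": 15, "desc": "Event count"}
  ]
}
"""


def decode(reg: dict, value: int) -> dict:
    """Split a populated register value into its named fields."""
    out = {}
    for field in reg["fields"]:
        width = field["msb"] - field["lsb"] + 1
        mask = (1 << width) - 1
        out[field["name"]] = (value >> field["lsb"]) & mask
    return out


reg = json.loads(REGISTER_JSON)
fields = decode(reg, 0x2A05)
for name, val in fields.items():
    print(f"{reg['name']}.{name} = {val}")
```

The point is only that once the definitions are data rather than
prose, decoding a value is a few lines of shifting and masking.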

This is very handy, but more importantly, the process of building it
was instructive. The first draft had all sorts of problems: page
footers inside of field definitions, for example. The LLM kept wanting
to add ad hoc heuristics to fix individual instances of such problems;
I finally realized that the best ways to constrain it to reality
included a) asking it to explain to me what it was doing, in the form
of a written "design" document, up front; b) forcing it to use
test-driven development (to the extent I could force it to do
anything), so that there was a known metric by which to judge the
output of a change; and c) making it frame the problem as building a
grammar describing the register definitions I cared about, and then
implementing a parser for that grammar: page footers could then be
recognized as lexical tokens and treated like whitespace, solving that
problem generally.
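The footer-as-token idea can be sketched roughly as follows. This is
an illustration under stated assumptions, not the tool itself: the
footer format ("Vol. N  Page M") and the token names are invented, and
a real grammar for register definitions would be considerably larger.

```python
import re

# Token patterns, tried in order. FOOTER comes first so that a page footer
# is recognized as one token instead of leaking into a field description.
# The footer format here is hypothetical.
TOKEN_SPEC = [
    ("FOOTER", r"Vol\. \d+\s+Page \d+"),
    ("BITS",   r"\d+(?::\d+)?"),
    ("WORD",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Footers are dropped by the lexer exactly as whitespace is, so the parser
# never sees them and needs no per-instance heuristics.
SKIP = {"WS", "FOOTER"}


def tokenize(text):
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos]!r}")
        if m.lastgroup not in SKIP:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# A field description that happens to straddle a page break.
page = "7:4 MODE Operating\nVol. 3  Page 1412\nmode select"
tokens = tokenize(page)
print(tokens)
```

Because the footer is handled once, in the lexer, every field
definition that straddles a page break is fixed at the same time.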

This last point was key: by forcing it to frame its output in terms
of a much smaller thing that was a) formally defined, like an EBNF
grammar, and b) small enough that I could examine and verify it
myself, I could
have reasonable confidence in the fidelity of its output.  Still, it
always biases towards taking the simplest action to effect an outcome,
often with poor results.  Regardless, I kept at it and eventually got
it to the point where I was reasonably happy with the output.
Afterwards, it occurred to me that if I didn't have the level of
experience I do,
I wouldn't have been able to successfully direct the LLM to build the
tool I wanted.  This led me to coin my own little "Dan's Law": an LLM
can only write a program that is as good as the human driving it could
have written.

The corollary is that these things really are tools for senior
engineers, who have the requisite experience to analyze their output.
In the hands of less experienced folks, they're dangerous.  I
presented that tool at a little internal demo the other day and a
colleague asked, "how much time do you estimate that Claude saved
you?"  I think this is the wrong question, and my response was that I
wasn't sure that Claude really saved me any time: oh sure, it could
emit text faster than I could type it all in, but I had to continually
correct it and tell it to go back and start over, and in that sense,
it wasted a lot of time by doing things that I would have, I hope,
thought better of doing myself.

Finally on this point, applicability of LLMs to a problem domain
likely follows a power law: 90-99% of the training data for software
is probably doing more or less the same thing, and the LLM is pretty
good here.  On the other hand, if you're working in the problem space
covering the last 1-10%, the LLM is much worse.  You can get it to
generate a simple web UI, no problem; but verifiably correct
implementations of lock-free concurrent data structures?  Eh, not so
much.

> * How should individual developers and users consider LLMs as tools for
> contributing to the BSDs and other open-source projects? I happily used
> a big tech LLM to deal with an rc file for some very Linuxey software
> wrapped up in systemd clutter.

This needs to be prefaced by asking, what does it mean to use an LLM?
If essentially every web search is now using one indirectly, it seems
inescapable; but I suspect that's not what you mean: rather, I think
you're referring to direct use by an individual, and incorporating the
output of that use into one's work.

But still, this definitional issue is important.  Suppose I point an
LLM at a program and say, "explain what this does to me" and it points
out a bug, which I then fix and produce a patch for; how does one
characterize that?  Suppose I verified and developed the patch
_without_ use of an LLM, would sending the resulting patch upstream
violate a project's "no AI" clause, given that the LLM pointed it out
to me in the first place? What if I do a web search for some random
technical term and the unasked-for AI summary is actually useful?

Where does one draw the line?  That seems like an urgent and immediate question.

Anyway, to address what I suspect is the actual question, I think as a
way to augment a human developer's abilities, basically being a gofer
and search engine++, it's not outright awful.  As a way to explore
and ideate, they're ok.  As a replacement for human output (and
importantly human judgement) the things are nowhere near capable
enough for that.  As with the tool I mentioned above, I've found that
they work _best_ when constrained by something else that can be
formally verified. I have had good luck asking the LLM to generate a
formal model of a thing using something like TLA+, Promela, or Alloy,
and proving that the model matches code (usually by showing me the
correspondence between the generated model and the base code).  I can
then verify the model using its tools (SPIN, TLC, etc.), and use
it to generate property-based tests for a system, which gives me a
baseline of behavior that the LLM has to meet in whatever it's doing.
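To make the "baseline of behavior" idea concrete, here is a small
sketch of a model-based property test. It is only an analogy: a
trivially correct Python deque stands in for the verified formal
model, and the ring buffer is an invented system under test, not
anything from the work described above.

```python
import random
from collections import deque


class RingBuffer:
    """System under test: fixed-capacity FIFO over a preallocated list."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # index of the oldest element
        self.size = 0

    def push(self, x):
        if self.size == len(self.buf):
            return False  # full
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1
        return True

    def pop(self):
        if self.size == 0:
            return None  # empty
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x


def check_against_model(seed, steps=200, capacity=4):
    """Drive the implementation and the model with the same random
    operations, asserting that they agree after every step."""
    rng = random.Random(seed)
    sut = RingBuffer(capacity)
    model = deque()  # stand-in for the verified model
    for _ in range(steps):
        if rng.random() < 0.5:
            x = rng.randrange(1000)
            expected = len(model) < capacity
            if expected:
                model.append(x)
            assert sut.push(x) == expected
        else:
            expected = model.popleft() if model else None
            assert sut.pop() == expected
        assert sut.size == len(model)
    return True


results = [check_against_model(seed) for seed in range(50)]
print(f"{len(results)} random runs agreed with the model")
```

Any change to the implementation, whoever or whatever wrote it, has to
keep agreeing with the model on every generated sequence.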

I strongly suspect that formal methods, aggressively applying
the type systems of strongly- and statically-typed languages to a
problem domain, and solid understandings of complexity theory and
formal language design, are going to take on a much greater role for
practitioners over the next few years.  I never thought I'd say this
as an OS person, but I suspect that theorem provers are going to take
on a pretty big role for me over the remainder of my career.

In fact, I found a bug using TLA+ on Friday; notably, that bug snuck
past testing and human review.  I think this is less an LLM win and
more a formal methods win, but I used an LLM to generate the model
that revealed the bug, so they're related in that sense.

> Other relevant questions added to this thread are welcomed, including
> references to other relevant public mailing list discussions.

I mentioned that these tools are still in their infancy, and that
feels very true even in how we interact with them: take Claude Code,
for example.  One can run their CLI, and it feels like playing
Adventure or Zork or something.  And yes, there _are_ other
interfaces, including say a VS Code plugin, but features get released
into the CLI first. Anyway, we're still in the "GET LAMP" era of
working with these things, and still a long ways from Rogue, let alone
something one of my kids would consider playing.

It is also important to acknowledge the ethics here. There are three
main things that keep me up at night:

1. We're re-centralizing the means of producing software.  If these
things are going to take on a larger role (and by every indication
they are), then it's deeply concerning to me that a very small handful
of big players are effectively controlling the show.  Honestly, that
should concern us all.  Furthermore, I think that the true cost of LLM
usage is much higher than what we're currently paying.  Using Claude
Code with the latest model practically requires paying
Anthropic for the Max subscription, which isn't exactly cheap.  What
do we do when the firehose of VC money shuts off and the cost
increases 2x, 5x, or 10x?

2. There's the issue of the provenance and ownership of the data used
for training models. We're starting to see supply chain attacks in
this area, and people have been pointing out that there are legitimate
questions about the legality of sourcing that data in the first place,
and its fair use, for some time. Some folks will dismiss this by
saying that most of us learn from others or by looking at existing
references, so why is this different?  I reply that there is a massive
difference in scale: it was one thing for me to learn about linked
lists as a kid reading a book on data structures; it's entirely
different when a machine sucks in the content of every book on data
structures and reproduces it on demand.  As Warner and others have
pointed out, the courts haven't caught up and it's all _really_
uncertain right now.  And did the authors of those books agree to
having their content used thus?  If the incentive to read those
references goes away, since the LLM gives me the information anyway,
and there's correspondingly no financial incentive to write new books,
how do we move new ideas out of the research domain and into
mainstream practice?  Do LLMs just pull everything towards the median?
(Maybe the "Singularity" will end up being "aggressively mid.")

3. There's the environmental impact.  The amount of energy required to
build a new model is growing super-linearly (it appears to have gone
from exponential to "merely" quadratic relative to the previous
generation model), and we're running out of physics for Moore's Law to
keep it reasonable (it's axiomatic that you can only halve the size of
a thing so many times until you start running into fundamental
physical limitations, and we're starting to edge up against that).
Dedicated accelerator hardware and so forth may be able to help, but
at some point, we will run out of the ability to train a bigger model.
What then?  Moreover, in their present form, these things are
grotesquely inefficient: everything is free-form text.  The whole
thing really smacks of the sort of thing where the big players created
a machine for generating simulacrums of plausible text, and then
realized they could apply that to all kinds of stuff---like software.
But the amounts of energy (and water!!) required to do so are
unsustainable.  Honestly, this seems like the worst of the three; one
could imagine running a local model at home, or even a small cluster
at a job, but if we're sucking the water table dry to train the model
required to do that, that's not great.  Most of the AI boosters I've
seen seem to be banking on these problems being solved before they
become really serious, or on gains in efficiency due to AI use
offsetting the increase in energy costs, but I'm skeptical: I've
seen no concrete plans for addressing this challenge in particular.

Ultimately, there don't seem to be a lot of easy answers, and I suspect
we're in for a pretty wild ride over the next few years.

        - Dan C.
