On Thu, Apr 2, 2026 at 10:18 AM Justin Sherrill <[email protected]> wrote:
> On Thu, Apr 2, 2026 at 9:15 AM George Rosamond <[email protected]> wrote:
>
>> I want to initiate a thread on the "BSDs and AI today."
>>
>> I'm looking to do a presentation on this in the summer for NYC*BUG.
>
> Two quantifiable measures, though they will change by the time you are
> doing a summer presentation:
>
> - What models and software run on BSDs? There's all sorts of tooling for
>   accessing LLMs, but how much have made it to BSD?
>
> - How well do LLMs answer questions about BSD specific technology? Or how
>   exact are they when answering questions that could also be for Linux
>   systems? This one might be enraging, as in "check your systemd settings to
>   tune your ZFS pools..." or some such.
>
>> * Should BSD projects have explicit LLM-focused policies?
>
> LLM policies right now appear to be a stand-in for other problems. For
> example, LLM bug reports are high volume and low quality so far, but I
> imagine if they get better, the objection would go away:
>
> https://lwn.net/Articles/1065620/
>
> There's probably also something that needs to be settled with copyright
> and assignment with generated code, but I am out of my depth beyond feeling
> like it's undefined.

Copyright is an interesting issue. It brings to light several issues that the Open Source community is generally unaware of. Copyright law doesn't stop all copying. There are elements of programs that are not copyrightable because they embody facts, or because there's only one way to express things. In addition, boilerplate items that are part of the interface likely don't enjoy copyright protection either.

These details usually don't matter for open source: if there's no copyright, you can copy it freely; if there is, you can still copy it freely (though maybe with a restriction or two). They only come up when, say, a table that initializes a device's registers is copied, or something similar that has no creative content. However, AI-generated code brings these issues back.

So if I have Claude generate some code for me, and don't edit it, that likely has no copyright protection. It also almost certainly doesn't have any copyright violations in it, at least for the domains that I deal with. Since LLMs train on thousands of examples, look for patterns, and use those patterns to generate the code, there's no direct copying. Other domains with fewer examples may not be so lucky. And there are tools online to look for copying, though you'll still have to be cautious about interpreting the results (e.g., some copying is OK, like inline copies of the BSD license).

But almost nobody uses unmodified code in production. For the BSDs, Claude's generated code today is unsuitable w/o modification, or a lot of prompt refinement. As the code is tweaked to work and handle the rigors of the BSD quality floor, it becomes a combination of the author's work and Claude's. The author's creative content is copyrightable, even if embedded in what started out life as AI generated, much like my copyright exists if I modify works in the public domain.

In other contexts, there'd be questions about the extent to which you could protect the code, but since open source "freely" gives the code away, you either have code in the public domain that can be freely copied, or you have code with a copyright that you can license to "freely" give it away. So the copyright risk analysis here suggests the risks would be low for BSD-licensed open source projects. There are other risks, but that's the copyright risk.

I personally favor policies that allow AI generated code, but require the developer to be able to explain every line, as well as making them responsible for the whole thing. It's just a tool, and like any other tool you have to use it correctly.

Warner
