On Thu, 23 Nov 2023 at 18:02, Daniel P. Berrangé <berra...@redhat.com> wrote:
>
> On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote:
> > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> > > > +Examples of tools impacted by this policy include both GitHub Copilot
> > > > +and ChatGPT, amongst many others which are less well known.
> > >
> > >
> > > So you called out these two by name, fine, but given "AI" is in scare
> > > quotes I don't really know what is or is not allowed and I don't know
> > > how will contributors know.  Is the "AI" that one must not use
> > > necessarily an LLM?  And how do you define LLM even? Wikipedia says
> > > "general-purpose language understanding and generation".
> > >
> > >
> > > All this seems vague to me.
> > >
> > >
> > > However, can't we define a simpler, more specific policy?
> > > For example, isn't it true that *any* automatically generated code
> > > can only be included if the scripts producing said code
> > > are also included or otherwise available under GPLv2?
> >
> > The following definition makes sense to me:
> >
> > - Automated codegen tool must be idempotent.
> > - Automated codegen tool must not use statistical modelling.
>
> As a casual reader, I would find this somewhat hard to interpret
> and relate to.

It's also not really relevant to what we're trying to rule out.
A non-idempotent codegen tool is fine, if the code it generates
is clearly under a license that's compatible with QEMU's.
A codegen tool that uses statistical modelling is also fine,
if (for example) it's only doing statistical modelling of the
data in the single file it's adding code to and doesn't use
any external data set.
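
To make the distinction concrete, here's a toy sketch (in Python;
entirely hypothetical, not a real QEMU tool) of the kind of generator
I mean. It does statistical modelling, and it's non-idempotent because
it samples at random, but its only data source is the single file it's
given, so the provenance of its output is not in doubt:

    #!/usr/bin/env python3
    # Toy Markov-chain code generator. NB: a hypothetical sketch
    # for discussion, not an actual QEMU tool. Its only data source
    # is the single input file it is given, so everything it emits
    # derives from content whose license is already known.
    import random
    import sys
    from collections import defaultdict

    def build_model(tokens):
        # Map each token to the list of tokens observed following it.
        model = defaultdict(list)
        for cur, nxt in zip(tokens, tokens[1:]):
            model[cur].append(nxt)
        return model

    def generate(model, start, length=20):
        # Walk the chain from 'start', sampling a successor at random
        # at each step (so two runs can produce different output).
        out = [start]
        for _ in range(length - 1):
            successors = model.get(out[-1])
            if not successors:
                break
            out.append(random.choice(successors))
        return out

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:
            tokens = f.read().split()
        model = build_model(tokens)
        print(" ".join(generate(model, tokens[0])))

A tool like that raises none of the provenance concerns the policy
is aimed at: what matters is the external training data, not the
statistics or the non-determinism.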

> > I'd remove all AI or LLM references. These are non-specific, colloquial,
> > and in the case of `AI`, non-technical. This policy should apply equally
> > to a Markov chain code generator.
>
> The fact that they are colloquial is, IMHO, a good thing, as it makes
> the policy relatable to the casual reader who hears the terms "AI" and
> "LLM" in technical press articles/blogs/etc all over the place.

Yes, I think that the most important thing about the wording
of this policy (assuming we agree on it) is that it should be
immediately clear to anybody reading it that tools of the
ChatGPT and Copilot type aren't permitted. In practice the
most likely case is somebody who wants to use those tools,
and we don't want to make them have to go through "read an
abstract definition of what isn't permitted and apply that
abstract definition to the concrete tool they're using".

thanks
-- PMM
