Kevin Wolf <kw...@redhat.com> writes:

> Am 24.11.2023 um 00:53 hat Michael S. Tsirkin geschrieben:
>> On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
>> > On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
>> > > Daniel P. Berrangé <berra...@redhat.com> writes:
>> > > 
<snip>
>> > > > +The QEMU maintainers thus require that contributors refrain from using
>> > > > +"AI" code generators on patches intended to be submitted to the project,
>> > > > +and will decline any contribution if use of "AI" is known or suspected.
>> > > > +
>> > > > +Examples of tools impacted by this policy include both GitHub CoPilot
>> > > > +and ChatGPT, amongst many others which are less well known.
>> > > 
>> > > What if you took an LLM and then fine-tuned it on project data
>> > > so it could better help new users make contributions to the
>> > > project? You would be biasing the model towards your own data for
>> > > the purpose of helping developers write better QEMU code.
>> > 
>> > It is hard to provide an answer to that question, since I think it is
>> > something that would need to be considered case by case. It hinges on
>> > how much the new QEMU-specific training data influences the model,
>> > vs other pre-existing training (if any).
>
> I suspect fine tuning won't be enough because it doesn't make the
> unlicensed original training data go away.
>
> If you could make sure that all of the training data consists only of
> code for which you have the right to contribute it to QEMU, that would
> be a different case.

That probably means we can never use even open-source LLMs to generate
code for QEMU: while the training data is all open source, it won't
necessarily be GPL-compatible (for example, Apache-2.0 code is not
compatible with QEMU's GPLv2).

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
