Hi,

I would double down on Greg Troxel's advice concerning copyright issues, 
especially concerning the introduction of LLM-generated code into the QGIS codebase.

The success of open source rests on three main characteristics: quality, 
security and trust.

AI contributions pose a threat to quality, security and trust alike.

A human-in-the-loop policy for contributions written with AI may help with 
quality and security issues, but will still leave a huge problem for trust.

Among the various aspects of trust, what worries me most right now is the 
copyright issue. Open-source software relies on intellectual property law, 
and especially on copyright, to establish copyleft and grant more 
rights to end users.

End users trust open-source software from a legal point of view because:

- it is backed by well-established copyright laws

- it has clear and well-established end-user contracts (open-source 
licences)

- it has a full record of modifications of the source code, hence a full 
lineage and certification of IP rights for the code

- also, foundations like OSGeo additionally put a stamp on the software to 
guarantee that the process and initial IP can be trusted enough to provide 
legal assurance concerning the software

Introducing AI black boxes into the development process breaks the ability to 
control the lineage of the code and guarantee that it is a genuine original 
work, and therefore allowed to be licensed under the GPL.

For quality and security, a developer can always assess, from the code itself, 
that the generated code has the required level of quality, and that it does 
not include any security flaw.

But **there is no way for a developer to evaluate the IP rights on code 
generated by an LLM**. How would one do it, since the code has been generated 
through a totally opaque black box ingesting enormous volumes of unidentified 
data?

Today, we definitely know that LLMs (ChatGPT, Claude and others) have been 
trained on illegally obtained copyrighted material. It is proven that LLMs were 
trained on pirated books. Furthermore, every time someone complains about IP 
violation by an LLM, big corporations reach a financial settlement with the 
copyright owners and move on.

There is therefore no doubt that they have also trained LLMs on proprietary 
code, and on open-source code under licences not compatible with GPLv2+.

Big corporations currently hide behind a "fair use" argument, but this is 
clearly rubbish: otherwise, why would they bother to settle large financial 
deals with copyright owners?

So, LLM-generated code contributed to QGIS will at some point be plagiarized 
from random code available on the internet, and neither QGIS.org nor the 
contributor will be able to know.

If we start accepting such code without being able to check provenance or 
copyright issues, it will end up buried deep inside QGIS, and the day we 
discover that it infringes copyright, it will be a nightmare to solve: we will 
have to revert all the offending code, and also all code depending 
on the plagiarized code, **and have it rewritten from scratch by someone who has 
never read the plagiarized code** (see SCO/UNIX for example). This is 
almost impossible.

This would be a nightmare, just for one identified contribution.

Moreover, if/when the fair-use argument for LLMs falls, all 
LLM-generated code would have to be removed from QGIS, along with all code 
depending on it. This is a high risk with a high impact.

You may say: "OK, but everyone does it, the chances of being caught are low, why not 
benefit from the opportunity?"

Then what about "everyone copies GPL code into proprietary code, the chances of 
being caught are low, why not benefit from the opportunity?"

Copyright is at the foundation of open-source software, and especially GPL-based 
software. If we choose to deny it, then we lose our core principle.

In the text Even proposes, there is a copyright section that pushes the 
responsibility for IP compliance back onto the contributor. It may protect 
QGIS.org or other developers from being sued whenever there is a problem, or 
let them sue the contributor at fault in turn, but this is not enough:

- the contributor at fault has no way to ensure that their generated code has 
no IP issue (other than NOT using LLMs): responsibility without any means of 
action is neither fair nor sustainable

- even if the QGIS project can avoid liability by transferring 
responsibility, the situation would still be a nightmare: removing plagiarized 
code entangled deep in the core of the software, along with all code depending 
on it, and rewriting it without IP issues is really hard

Therefore, I do not think this mention is enough for IP protection.

This rationale concerns the generated code itself, contributed to QGIS or other 
software in the ecosystem. LLMs may be useful, and without IP risk, to help find 
bugs, write parts of the documentation where there is no risk of plagiarism, or 
for other use cases.

But I would definitely **forbid any generated code to be introduced into the 
main source code because of IP risk**.

Also, the least we can do for any contribution is not only to have a human in 
the loop, but also to require a mandatory mention and description of LLM usage 
for each contribution. This would at least give traceability. It does not solve 
anything, but in case of a problem, we could at least start to investigate.
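
To illustrate the traceability idea, here is a minimal sketch, assuming we asked 
contributors to add an `LLM-Usage:` line to each commit message. The trailer 
name and the function are hypothetical, not an existing QGIS or Git convention, 
and the check only tells us that a declaration is present, not that it is 
truthful.

```python
import re
from typing import Optional

# Hypothetical trailer name, chosen for illustration only.
# Matches a line such as "LLM-Usage: used an assistant to locate the bug".
TRAILER = re.compile(r"^LLM-Usage:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)


def llm_usage_declaration(commit_message: str) -> Optional[str]:
    """Return the declared LLM usage, or None if the line is missing or empty."""
    match = TRAILER.search(commit_message)
    if match is None:
        return None
    description = match.group(1).strip()
    # An empty declaration is treated the same as a missing one.
    return description or None
```

A reviewer or a CI job could then flag any contribution for which this returns 
`None`, so that at least the question gets asked before merging.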

I am glad this conversation is taking place and am willing to continue the 
discussion. Sorry for the length.

Have a nice weekend,

Vincent





On 31/01/2026 01:01, Greg Troxel via QGIS-Developer wrote:
I would suggest a much stronger policy:

   no LLM-generated code or discussion may be submitted to any QGIS forum


The idea that LLM-generated code has been "reviewed" intends to be that
it is of high enough quality that it is reasonable for *humans* to spend
time reviewing it.  But I don't believe that asking that it be reviewed
will achieve that in practice.

I've already had the experience (in a different project) of seeing a
posted PR(ish, patch on list), taking the time to comment, and getting
LLM-generated (vacuous) replies to my comments.

Besides the ethical problems with asking humans to review, improve,
judge or in any other way pay attention to LLM output, there's the
problem of copyright.  While machine-generated text isn't copyrightable
as is, LLM output is a derived work of stolen human work, scraped
and used without permission, often as DDOS.

On the basis of each reason, I believe the policy about LLM should just
be "no".
_______________________________________________
QGIS-Developer mailing list
[email protected]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer