Here's a brainstorming idea to address your valid concerns. How about the following policy: no pull request gets reviewed unless it meets certain minimum requirements:

1) It passes all pre-existing tests.
2) It includes test coverage for all new code.
3) It includes tests covering any bug fixes.
I can see how to implement #1 automatically. Could #2 be implemented using one of the coverage testing tools? My experience with those is limited, and it would also take some work to make sure that new tests cover all changed code. I think this would clear out a lot of the very low-quality code that doesn't work or does nothing. However, I see a few problems as well:

1) What happens if the bug fix is to an erroneous test?
2) It does not address low-quality descriptions of the PR and its goals.
3) People who are just learning the code base will need a way to get help with running the tests and fixing test failures. Contributors might have to ask for help on this list with issues of that sort, or maybe there should be a specific venue for that.

Just some ideas to help start the ball rolling.
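For what it's worth, here is a minimal sketch of what such a gate might look like, assuming pytest with the pytest-cov plugin to record coverage and the diff-cover tool to compare the coverage report against the git diff. The compare branch and the 100% threshold are illustrative choices on my part, not a worked-out proposal:

    #!/usr/bin/env python3
    """Hypothetical CI gate for the proposed minimum requirements."""
    import subprocess
    import sys

    def run(cmd):
        # Echo the command so the CI log shows what was checked.
        print("+", " ".join(cmd))
        return subprocess.call(cmd)

    # Requirement #1: the whole pre-existing test suite must pass.
    # Coverage is recorded at the same time so that #2 can be checked.
    if run(["pytest", "--cov=sympy", "--cov-report=xml"]) != 0:
        sys.exit("FAIL: existing tests do not pass")

    # Requirement #2: every line changed relative to the target branch
    # must be executed by some test.  diff-cover reads coverage.xml and
    # the diff against --compare-branch; --fail-under=100 fails the
    # check if any changed line is uncovered.
    if run(["diff-cover", "coverage.xml",
            "--compare-branch=origin/master", "--fail-under=100"]) != 0:
        sys.exit("FAIL: changed lines are not fully covered by tests")

    print("OK: PR meets the minimum requirements")

Note that this only verifies that changed lines are executed by some test; it cannot tell whether a new test actually exercises the bug being fixed (#3), so that part would still need a human look.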
Jonathan

On Saturday, October 25, 2025 at 5:46:10 PM UTC-5 Oscar wrote:

> Hi all,
>
> I am increasingly seeing pull requests in the SymPy repo that were
> written by AI e.g. something like Claude Code or ChatGPT etc. I don't
> think that any of these PRs are written by actual AI bots but rather
> that they are "written" by contributors who are using AI tooling.
>
> There are two separate categories:
>
> - Some contributors are making reasonable changes to the code and then
> using LLMs to write things like the PR description or comments on
> issues.
> - Some contributors are basically just vibe coding by having an LLM
> write all the code for them and then opening PRs usually with very
> obvious problems.
>
> In the first case some people use LLMs to write things like PR
> descriptions because English is not their first language. I can
> understand this and I think it is definitely possible to do this with
> LLMs in a way that is fine but it needs to amount to using them like
> Google Translate rather than asking them to write the text. The
> problems are that:
>
> - LLM summaries for something like a PR are too verbose and include
> lots of irrelevant information making it harder to see what the actual
> point is.
> - LLMs often include information that is just false such as "fixes
> issue #12345" when the issue is not fixed.
>
> I think some people are doing this in a way that is not good and I
> would prefer for them to just write in broken English or use Google
> Translate or something but I don't see this as a major problem.
>
> For the vibe coding case I think that there is a real problem. Many
> SymPy contributors are novices at programming and are nowhere near
> experienced enough to be able to turn vibe coding into outputs that
> can be included in the codebase. This means that there are just spammy
> PRs with false claims about what they do like "fixes X", "10x faster"
> etc where the code has not even been lightly tested and clearly does
> not work or possibly does not even do anything.
>
> I think what has happened is that the combination of user-friendly
> editors with easy git/GitHub integration and LLM agent plugins has
> brought us to the point where there are pretty much no technical
> barriers preventing someone from opening up gibberish spam PRs while
> having no real idea what they are doing.
>
> Really this is just inexperienced people using the tools badly which
> is not new. Low quality spammy PRs are not new either. There are some
> significant differences though:
>
> - I think that the number of low quality PRs is going to explode. It
> was already bad last year in the run up to GSOC (January to March
> time) and I think it will be much worse this year.
> - I don't think that it is reasonable to give meaningful feedback on
> PRs where this happens because the contributor has not spent any time
> studying the code that they are changing and any feedback is just
> going to be fed into an LLM.
>
> I'm not sure what we can do about this so for now I am regularly
> closing low quality PRs without much feedback but some contributors
> will just go on to open up new PRs. The "anyone can submit a PR" model
> has been under threat for some time but I worry that the whole idea is
> going to become unsustainable.
>
> In the context of the Russia-Ukraine war I have often seen references
> to the "cost-exchange problem". This refers to the fact that while
> both sides have a lot of anti-air defence capability they can still be
> overrun by cheap drones because million dollar interceptor missiles
> are just too expensive to be used against any large number of incoming
> thousand dollar drones. The solution there would be to have some kind
> of cheap interceptor like an automatic AA gun that can take out many
> cheap drones efficiently even if much less effective against fancier
> targets like enemy planes.
>
> The first time I heard about ChatGPT was when I got an email from
> StackOverflow saying that any use of ChatGPT was banned. Looking into
> it the reason given was that it was just too easy to generate
> superficially reasonable text that was low quality spam and then too
> much effort for real humans to filter that spam out manually. In other
> words bad/incorrect answers were nothing new but large numbers of
> inexperienced people using ChatGPT had ruined the cost-exchange ratio
> of filtering them out.
>
> I think in the case of SymPy pull requests there is an analogous
> "effort-exchange problem". The effort PR reviewers put in to help with
> PRs is not reasonable if the author of the PR is not putting in a lot
> more effort themselves because there are many times more people trying
> to author PRs than review them. I don't think that it can be
> sustainable in the face of this spam to review PRs in the same way as
> if they had been written by humans who are at least trying to
> understand what they are doing (and therefore learning from feedback).
> Even just closing PRs and not giving any feedback needs to become more
> efficient somehow.
>
> We need some sort of clear guidance or policy on the use of AI that
> sets clear explanations like "you still need to understand the code".
> I think we will also need to ban people for spam if they are doing
> things like opening AI-generated PRs without even testing the code.
> The hype that is spun by AI companies probably has many novice
> programmers believing that it actually is reasonable to behave like
> this but it really is not and that needs to be clearly stated
> somewhere. I don't think any of this is malicious but I think that it
> has the potential to become very harmful to open source projects.
>
> The situation right now is not so bad but if you project forwards a
> bit to when the repo gets a lot busier after Christmas I think this is
> going to be a big problem and I think it will only get worse in future
> years as well.
>
> It is very unfortunate that right now AI is being used in all the
> wrong places. It can do a student's homework because it knows the
> answers to all the standard homework problems but it can't do the more
> complicated more realistic things and then students haven't learned
> anything from doing their homework.
> In the context of SymPy it would
> be so much more useful to have AI doing other things like reviewing
> the code, finding bugs, etc rather than helping novices to get a PR
> merged without actually investing the time to learn anything from the
> process.
>
> --
> Oscar
