On Fri, Mar 29, 2024 at 7:00 PM Andres Freund <and...@anarazel.de> wrote: > I am doubtful that every committer would find something sneaky hidden in > e.g. one of the test changes in a large commit. It's not too hard to hide > something sneaky. I comparison to that hiding something in configure.ac seems > less likely to succeed IMO, that imo tends to be more scrutinized. And hiding > just in configure directly wouldn't get you far, it'd just get removed when > the committer or some other committer at a later time, regenerates configure.
I agree with this. If I were trying to get away with a malicious commit, I'd look for files that other people would be unlikely to examine closely, or would have difficulty examining closely. Test data or test scripts seem like great possibilities. And I also would like it to be part of some relatively large commit that is annoying to read through visually. We don't have a lot of binary format files in the tree, which is good, but there's probably some things like Unicode tables and ECPG expected output files that very, very few people ever actually examine. If we had a file in the tree that looked based on the name like an expected output file for a test, but there was no corresponding test, how many of us would notice that? How many of us would scrutinize it? Imagine hiding something bad in the middle of that file somewhere. Maybe we need some kind of tool that scores files for risk. Longer files are riskier. Binary files are riskier, as are text files that are something other than plain English/C code/SGML. Files that haven't changed in a long time are not risky, but files with few recent changes are riskier than files with many recent changes, especially if only 1 or 2 committers made all of those changes, and especially if those commits also touched a lot of other files. Of course, if we had a tool like this that were public, I suppose anyone targeting PG would look at the tool and try to find ways around its heuristics. But maybe we should have something and not disclose the whole algorithm publicly, or even if we do disclose it all, having something is probably better than having nothing. It might force a hypothetical bad actor to do things that would be more likely to be noticed by the humans. We might also want to move toward signing commits and tags. One of the meson maintainers was recommending that on-list not long ago. We should think about weaknesses that might occur during the packaging process, too. If someone who alleges that their packaging PG is really packaging PG w/badstuff123.patch, how would we catch that? An awful lot of what we do operates on the principle that we know the people who are involved and trust them, and I'm glad we do trust them, but the world is full of people who trusted somebody too much and regretted it afterwards. The fact that we have many committers rather than a single maintainer probably reduces risk at least as far as the source repository is concerned, because there are more people paying attention to potentially notice something that isn't as it should be. But it's also more potential points of compromise, and a lot of things outside of that repository are not easy to audit. I can't for example verify what the infrastructure team is doing, or what Tom does when he builds the release tarballs. It seems like a stretch to imagine someone taking over Tom's online identity while simultaneously rendering him incommunicado ... but at the same time, the people behind this attack obviously put a lot of work into it and had a lot of resources available to craft the attack. We shouldn't make the mistake of assuming that bad things can't happen to us. -- Robert Haas EDB: http://www.enterprisedb.com