On 5/29/23 05:15, David A. Wheeler wrote:
> Here's an example that might clarify the threat model.
> It's possible that a
> program could look for ".gitignore" and run it if present.
> The source code repo might not have a .gitignore file,
> but the malicious package added .gitignore and filled it with
> a malicious application. That would cause malicious code to
> be executed, but it would also be *highly* suspicious to
> run a ".gitignore" file (that's *not* what they are for), so
> it's reasonable to assume that the source code didn't do that.
> If an attacker can insert a file that *would* cause malicious code
> to execute in a reasonably-coded app, then that *would* be a problem.
> "What's reasonable" is hard to truly write down, but a
> whitelisted list of specific filenames seems like a reasonable place
> to start.

I think the PyPI example with the missing .gitignore file is more about "git and PyPI are both a VCS; did the author commit the same source code to both?". It's about "what's the canonical source code release" rather than about a real build.

It's the famous disconnect of "our engineers reviewed the source code they got from `git clone`, but our servers use source code from a package registry (or whatever source code a Debian maintainer uploaded into the Debian archive)".
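To make the "reviewed tree vs. shipped tree" comparison concrete, here is a minimal sketch of the filename allowlist idea from the quoted message. The function name and the allowlist entries are hypothetical and only for illustration; it compares file names, not file contents.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: file names a package may add on top of the
# repository contents. The entries are assumptions for illustration.
ALLOWED_EXTRA_NAMES = {"PKG-INFO", "setup.cfg"}


def unexpected_files(repo_files, package_files, allowed=ALLOWED_EXTRA_NAMES):
    """Return paths that exist only in the package and are not allowlisted."""
    extra = set(package_files) - set(repo_files)
    return sorted(p for p in extra if PurePosixPath(p).name not in allowed)


repo = ["setup.py", "src/app.py"]
package = ["setup.py", "src/app.py", "PKG-INFO", ".gitignore"]
print(unexpected_files(repo, package))
```

This only flags added files; a file that exists in both places with different contents passes unnoticed, which is why it can be no more than a starting point.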

For my "how to evade a semantic diff" exercise you would probably not bluntly add a new file, but instead find a complex file format (one that gets interpreted by some other, complex program maybe?) and then try to find blind spots in the diff tool that are useful for exploit development.

These aren't hard to find. For example, diffoscope doesn't have a good understanding of extended attributes in tar files and will only flag them with a binary diff if it couldn't find any semantic differences.

If you intentionally introduce a benign difference for diffoscope to pick up on (like changing a timestamp by a few seconds), diffoscope cites this as the explanation for why the files aren't binary-equal and stops investigating further.
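As a rough sketch of how one could look at this metadata directly, the following uses Python's `tarfile` module to dump the per-member pax headers of two archives and compare them. It assumes the extended attributes are stored as pax records (GNU tar uses `SCHILY.xattr.*` keys for this); the helper names and the sample attribute are made up for the example, and it says nothing about how diffoscope works internally.

```python
import io
import tarfile


def pax_headers_by_member(tar_bytes):
    """Map each member name to the pax headers attached to it."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        return {m.name: dict(m.pax_headers) for m in tar.getmembers()}


def build_tar(extra_headers):
    """Build a one-file in-memory pax archive carrying the given headers."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        info.pax_headers = extra_headers  # written out as pax records
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Two archives with identical file contents but a different xattr record.
old = build_tar({"SCHILY.xattr.user.note": "benign"})
new = build_tar({"SCHILY.xattr.user.note": "changed"})

old_headers = pax_headers_by_member(old)
new_headers = pax_headers_by_member(new)
for name in sorted(set(old_headers) | set(new_headers)):
    if old_headers.get(name) != new_headers.get(name):
        print(name, old_headers.get(name), new_headers.get(name))
```

The point is that such a check reports every difference it finds instead of stopping at the first plausible explanation.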

I've been exploring semantic diff evasion for several months but unfortunately haven't had time to blog about it.

---

I don't think it's worthwhile to try to build security controls on top of this; it sounds more like a code-review problem. Source code inputs are commonly pinned by their sha256sum, so it's very clear what should be reviewed, with no ambiguity of some .gitignore being present or absent.
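For illustration, checking an input against a pinned sha256 could look roughly like this. The helper names are made up and the temporary file merely stands in for a downloaded source tarball.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_hex):
    """Return True if the file's sha256 equals the pinned hex digest."""
    return sha256_of(path) == pinned_hex.lower()


# Throwaway file standing in for a downloaded source archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example source archive\n")
    demo_path = tmp.name

pinned = hashlib.sha256(b"example source archive\n").hexdigest()
print(matches_pin(demo_path, pinned))
print(matches_pin(demo_path, "0" * 64))
os.remove(demo_path)
```

Whatever bytes match the pin are the bytes that need reviewing, regardless of which files they contain.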

cheers,
kpcyrd
