Re: Introducing: Semantically reproducible builds

kpcyrd Mon, 29 May 2023 09:41:25 -0700

On 5/29/23 05:15, David A. Wheeler wrote:

Here's an example that might clarify the threat model.
It's possible that a
program could look for ".gitignore" and run it if present.
The source code repo might not have a .gitignore file,
but the malicious package added .gitignore and filled it with
a malicious application. That would cause malicious code to
be executed, but it would also be *highly* suspicious to
run a ".gitignore" file (that's *not* what they are for), so
it's reasonable to assume that the source code didn't do that.
If an attacker can insert a file that *would* cause malicious code
to execute in a reasonably-coded app, then that *would* be a problem.
"What's reasonable" is hard to truly write down, but a
whitelisted list of specific filenames seems like a reasonable place
to start.

I think the pypi example and missing .gitignore file is more about "git and pypi are both a VCS, did the author commit the same source code". It's about "what's the canonical source code release" instead of a real build.

It's the famous disconnect of "our engineers reviewed the source code they got from `git clone`, but our servers use source code from a package registry (or whatever source code a debian maintainer uploaded into the debian archive)".
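
To make that disconnect concrete, here is a minimal sketch of checking whether two unpacked source trees carry the same files. The function names and the two directory arguments are hypothetical, and it assumes the reviewed tree is an export without the `.git` directory (otherwise every file under `.git` shows up as a difference).

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to the sha256 of its content."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(reviewed, shipped):
    """Return (only_in_reviewed, only_in_shipped, changed) as sorted lists."""
    a, b = tree_digests(reviewed), tree_digests(shipped)
    only_reviewed = sorted(a.keys() - b.keys())
    only_shipped = sorted(b.keys() - a.keys())
    changed = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_reviewed, only_shipped, changed
```

This only compares file content by path. It deliberately ignores metadata such as permissions and timestamps, so it answers "is this the same source code", not "is this the same archive".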

For my "how to evade a semantic diff" exercise you would probably not bluntly add a new file, but instead find a complex file format (one that gets interpreted by some other, complex program maybe?) and then try to find blind spots in the diff tool that are useful for exploit development.

These aren't hard to find, for example diffoscope doesn't have a good understanding of extended attributes in tar files and will only flag them with a binary diff if it couldn't find any semantic differences.

If you intentionally introduce a benign difference for diffoscope to pick up on (like changing a timestamp by a few seconds), diffoscope is going to cite this as an explanation why the files aren't binary-equal and stops further investigation.
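
One way to look at this particular blind spot independently of any timestamp difference is to compare the extended-attribute records directly. The following is a rough sketch, not how diffoscope works: it relies on Python's `tarfile` exposing pax extended headers as `TarInfo.pax_headers`, and it assumes xattrs are stored under keys containing `.xattr.` (for example `SCHILY.xattr.*`), which may not hold for every tar implementation. The function names are made up for illustration.

```python
import tarfile


def xattr_records(tar_path):
    """Map member name -> pax records whose key looks like an xattr.

    Assumption: xattr keys contain ".xattr." (e.g. "SCHILY.xattr.user.foo").
    """
    records = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            found = {k: v for k, v in member.pax_headers.items() if ".xattr." in k}
            if found:
                records[member.name] = found
    return records


def xattr_differences(tar_a, tar_b):
    """Return {member name: (records in tar_a, records in tar_b)} where they differ."""
    a, b = xattr_records(tar_a), xattr_records(tar_b)
    return {
        name: (a.get(name, {}), b.get(name, {}))
        for name in sorted(a.keys() | b.keys())
        if a.get(name, {}) != b.get(name, {})
    }
```

Because this check never looks at mtime, a deliberately introduced timestamp change in one archive does not hide an xattr that only exists in the other.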

I've already explored semantic diff evasion for multiple months but unfortunately didn't have time to blog about it.

---

I don't think it's a worthwhile activity to try to build security controls on top of it; it sounds more like a code-review problem. Source code inputs are commonly pinned by their sha256sum, so it's very clear what should be reviewed, with no ambiguity of some .gitignore being present or absent.
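
For illustration, checking such a pin can be as small as the sketch below. The helper names are hypothetical, and the expected digest would come from wherever the build recipe records it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex sha256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pin(path, expected_sha256):
    """Raise ValueError unless the file matches the pinned sha256."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: expected {expected_sha256}, got {actual}")
    return actual
```

The pin covers the exact bytes of the archive, so whatever is inside it (including any .gitignore) is precisely what needs to be reviewed.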


cheers,
kpcyrd
