Re: reproducible builds with btrfs seed feature

2018-10-13 Thread Chris Murphy
On Sat, Oct 13, 2018 at 4:28 PM, Chris Murphy wrote:
> Is it practical and desirable to make Btrfs based OS installation
> images reproducible? Or is Btrfs simply too complex and
> non-deterministic? [1]
>
> The main three problems with Btrfs right now for reproducibility are:
> a. many objects have uuids other than the volume uuid; and mkfs only
> lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode
> assignment, logical and physical address allocation

d. generation; just pick a consistent default. The entire image is
made with mkfs and then never mounted rw, so it's not a problem.

> - Possibly disallow subvolumes and snapshots

There's no actual mechanism to do either of these with mkfs, so it's
not a problem. And if a sprout is created, it's fine for newly created
subvolumes to follow the usual behavior of having a unique UUID and an
incrementing generation. The thing is, the sprout will inherit the
seed's preset chunk uuid. That shouldn't cause a problem, but it is a
kind of violation of uuid uniqueness; ultimately I'm not sure how big
of a problem it is for such uuids to spread.



-- 
Chris Murphy


reproducible builds with btrfs seed feature

2018-10-13 Thread Chris Murphy
Is it practical and desirable to make Btrfs based OS installation
images reproducible? Or is Btrfs simply too complex and
non-deterministic? [1]

The main three problems with Btrfs right now for reproducibility are:
a. many objects have uuids other than the volume uuid; and mkfs only
lets us set the volume uuid
b. atime, ctime, mtime, otime; and no way to make them all the same
c. non-deterministic allocation of file extents, compression, inode
assignment, logical and physical address allocation
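
To illustrate (b): from userspace, a staging tree can have its atime
and mtime pinned before the image is built, but ctime and otime cannot
be set this way, which is why a mkfs-level option seems necessary. A
minimal Python sketch, where normalize_times and the epoch value are
made up for illustration and are not part of any Btrfs tool:

```python
import os

def normalize_times(root, epoch):
    """Set atime and mtime of everything under root to the same value.

    ctime and otime are not settable through this interface.
    Returns the number of entries touched, including root itself.
    """
    count = 0
    # Walk bottom-up so a directory is only touched after it has been
    # read; otherwise scanning it could bump its atime again.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skipped to avoid changing the link target's times.
                continue
            os.utime(path, (epoch, epoch))
            count += 1
    os.utime(root, (epoch, epoch))
    return count + 1
```

Even after this, ctime on every touched entry is the time of the
os.utime call, so two builds of the same tree still differ.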

I'm imagining reproducible image creation would be a mkfs feature that
builds on Btrfs seed and --rootdir concepts to constrain Btrfs
features to maybe make reproducible Btrfs volumes possible:

- No raid
- Either all objects needing uuids can have those uuids specified by
a switch, or possibly there is a defined set of uuids expressly for
this use case, or possibly all of them can just be zeros (eek? not
sure)
- A flag to set all times the same
- Possibly require that the target block device is zero-filled before
creation of the Btrfs
- Possibly disallow subvolumes and snapshots
- Require that the resulting image is seed/ro, and maybe also add a
new compat_ro flag to enforce that such Btrfs file systems cannot be
modified after the fact.
- Enforce a consistent means of allocation and compression
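
One possible reading of "a defined set of uuids expressly for this use
case" is to derive every uuid from a fixed namespace plus the role of
the object, so that every builder computes the same values. A rough
Python sketch; the namespace, the role names and the function names are
all hypothetical, and nothing in mkfs works this way today:

```python
import uuid

# Hypothetical namespace; any fixed UUID would do, as long as all
# builders agree on it.
REPRO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "reproducible-btrfs.example")

def derived_uuid(image_id, role):
    """Name-based (version 5) uuid: same inputs always give same output."""
    return uuid.uuid5(REPRO_NAMESPACE, f"{image_id}/{role}")

def uuid_set(image_id, roles=("fsid", "chunk-tree", "device-1", "subvol-5")):
    """Illustrative role names only; the real list of objects is longer."""
    return {role: derived_uuid(image_id, role) for role in roles}
```

This keeps the uuids distinct from each other within one image, which
all-zeros would not, while still being identical across rebuilds.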

The end result is that creating two Btrfs volumes from the same input
would yield image files with matching hashes.
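
As a sketch of what that check could look like, assuming the two
builds are available as plain image files: hash each one, and if the
hashes differ, report the first differing byte offset as a starting
point for finding which structure is non-deterministic. The function
names here are made up for illustration.

```python
import hashlib

def image_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Byte offset of the first difference, or None if files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Common prefix matches, so one file is simply shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None
            offset += len(a)
```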

If I had to guess, the biggest challenge would be allocation. But it's
also possible that such an image may have problems with "sprouts". A
non-removable sprout seems fairly straightforward and safe; but if a
"reproducible build" type of seed is removed, it seems like removal
needs to be smart enough to refresh *all* uuids found in the sprout: a
hard break from the seed.

The competing file systems are ext4 (with the make_ext4 fork) and
squashfs. At the moment I'm thinking it might be easier to teach
squashfs integrity checking than to make Btrfs reproducible. But then
I also think
restricting Btrfs features, and applying some requirements to
constrain Btrfs to make it reproducible, really enhances the Btrfs
seed-sprout feature.

Any thoughts? Useful? Difficult to implement?

Squashfs might be a better fit for this use case *if* it can be taught
about integrity checking. It does per-file checksums for the purpose
of deduplication, but those checksums aren't retained for later
integrity checking.

[1] problems of reproducible system images
https://reproducible-builds.org/docs/system-images/

[2] purpose and motivation for reproducible builds
https://reproducible-builds.org/

[3] who is involved?
https://reproducible-builds.org/who/#Qubes%20OS




-- 
Chris Murphy