Re: RFC: Unified package metadata format

2017-04-19 Thread Guillem Jover
Hi!

On Tue, 2017-04-04 at 07:20:00 +, Niels Thykier wrote:
> Guillem Jover:
> > I don't think this is correct. Initial whitespace gets ignored (this is
> > not clear from mtree(5), but that's what the various implementations do).
> > The subset of type of lines I'm intending to support would be:
> > 
> >   ,---
> >   /set 
> >   /unset 
> >   . 
> >   # comment
> >   
> >   `---
> > 
> 
> In this variant, can /set and /unset be interleaved between paths?  My
> previous PoC implementation did this and I was wondering if it would be
> supported. :)

Yes, like in the common mtree(5) format these commands can be
interleaved, so you can set or unset defaults for the next entries.
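
To illustrate the interleaving described above, here is a minimal parsing sketch. It is not dpkg's implementation: the function name `parse_mtree`, the `SAMPLE` data and the handling of valueless keywords are assumptions made for illustration. It only covers the restricted subset discussed in this thread (`/set`, `/unset`, comments, blank lines and path entries) and assumes paths contain no unescaped whitespace.

```python
def parse_mtree(text):
    """Parse a restricted mtree subset into a list of (path, attrs) tuples."""
    defaults = {}   # keywords currently in effect via /set
    entries = []

    def pairs(words):
        # "key=value" -> (key, value); a bare keyword gets an empty value.
        for word in words:
            key, _, value = word.partition("=")
            yield key, value

    for raw in text.splitlines():
        line = raw.strip()              # initial whitespace is ignored
        if not line or line.startswith("#"):
            continue                    # blank line or comment
        words = line.split()
        if words[0] == "/set":
            defaults.update(pairs(words[1:]))
        elif words[0] == "/unset":
            for key in words[1:]:
                defaults.pop(key, None)
        else:
            attrs = dict(defaults)      # snapshot of the current defaults
            attrs.update(pairs(words[1:]))
            entries.append((words[0], attrs))
    return entries


# Hypothetical input: defaults change between path entries.
SAMPLE = """#mtree v2.0
/set uname=root gname=root mode=0644
./etc/foo.conf
/set mode=0755
./usr/bin/foo
/unset gname
./usr/bin/bar uname=daemon
"""
```

Each entry takes a snapshot of the defaults in effect when its line is read, so a later `/set` or `/unset` only affects the entries that follow it.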

> > No indented nor continuation lines, no relative paths, no ".." entries.
> > 
> > Then, my idea would be to further distinguish two types of mtree files,
> > template and detailed. The first would allow the globs permitted in the
> > spec ('[', ']', '?', '*'), and possibly also the "ignore" keyword. The
> > second would not. Template mtree would be used in source packages, and
> > would be used to generate the data.tar in the .deb and possibly part of
> > the detailed mtree in the control.tar, and of course the detailed mtree
> > in the db.
> > 
> 
> The "ignore" keyword being something like "anything that matches
> /except/ whatever matches the ignore keyword"?

The keyword is described in the man page as:

  "Ignore any file hierarchy below this file."

> > At which point I'd get the mtree support integrated
> > for the db as the first stage, and then we can start pondering about
> > how to transport additional metadata in the .deb as the second stage,
> > and finally about the templating mode. The last one might be more
> > involved as it will most probably require adding support for built-in
> > tar packing. But the second stage would allow to already test stuff
> > by manually crafting .debs or writing an external helper or tool to
> > inject that metadata, so this should allow us to experiment and not
> > block on the whole thing being finalized.

> I still have a debhelper tool to generate the mtree format, which I am
> happy to revive.  In particular, I would be delighted if that revival
> means we can migrate debhelper away from "chown" and just use the format
> to describe ownership.

See the updates I mentioned to the wiki. Something I realized while
pondering a bit more about all this is that we cannot safely ship the
user/group information only in the mtree files (and not in the tar
entry), as older dpkg versions would then be unable to set the correct
owner and permissions. So the mtree shipped would only contain
information not already present in the tar entry.

Regards,
Guillem



Re: RFC: Unified package metadata format

2017-04-17 Thread Josh Triplett
Matthew Garrett wrote:
> Debian package unified metadata format

In general, this looks like a good idea.  Having this in a declarative
form would have a variety of good uses.

> Format:
>
> The file shall be stored within the control archive with the name
> “mtree” and shall start with the following string:
>
> #mtree v2.0
>
> Each entry shall be of the form
>
>  /path/name key1=foo key2=bar
>
> Ie, a leading space, a slash, and the path name of the installed file
> followed by a series of space-separated key=value pairs followed by a
> line feed. The following keys are supported (extracted from mtree(5)):

I'd assume that some of the quirks, like the leading space, come from
the original mtree format?

What escaping mechanisms exist for filenames containing spaces?  Or
newlines?

At least the former *does* exist in Debian.  And I could imagine the
latter existing in packaged testsuite data.
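
For reference, existing mtree implementations generally rely on vis(3)-style escaping, where a space can be written as `\040` and a newline as `\012`. The thread does not settle which encoding the Debian format would use, so the sketch below simply assumes backslash plus three octal digits; `escape_path` and `unescape_path` are hypothetical names.

```python
import re


def escape_path(path: bytes) -> str:
    """Encode whitespace, control, non-ASCII, '\\' and '#' bytes as \\ooo."""
    out = []
    for byte in path:
        if byte <= 0x20 or byte >= 0x7F or byte in b"\\#":
            out.append("\\%03o" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def unescape_path(text: str) -> bytes:
    """Reverse escape_path: turn every \\ooo sequence back into one byte."""
    return re.sub(rb"\\([0-3][0-7]{2})",
                  lambda m: bytes([int(m.group(1), 8)]),
                  text.encode("ascii"))
```

Because the backslash itself is escaped, an escaped path never contains whitespace and round-trips byte for byte.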

> * md5 - the MD5 message digest of the file
> * md5digest - a synonym for md5
> * sha1 - the FIPS 180-1 (“SHA-1”) message digest of the file
> * sha1digest - a synonym for sha1
> * sha256 - the FIPS 180-2 (“SHA-256”) message digest of the file
> * sha256digest - a synonym for sha256

Backward compatibility with existing mtree implementations aside, any
reason a new standard for using mtree in package metadata should allow
md5 or sha1 at all?

> * Are any other keys required? Should dpkg-divert be implemented using
> this format?

I'd love to see things like alternatives implemented declaratively using
this format; doesn't seem that hard to add keys like
alternative.name=vi alternative.priority=...
(as well as "alternative.parent" or similar to use on manpages that
should switch with their parent application).

But at the same time, I'd suggest getting to a baseline implementation
with a minimum of bikeshedding, and then slowly moving other things
over.



Re: RFC: Unified package metadata format

2017-04-04 Thread Matthew Garrett
On Mon, Apr 3, 2017 at 6:58 PM, Guillem Jover wrote:
> On Tue, 2017-03-28 at 16:22:58 -0700, Matthew Garrett wrote:
>> Each entry shall be of the form
>>
>>  /path/name key1=foo key2=bar
>
>> Ie, a leading space, a slash, and the path name of the installed file
>> followed by a series of space-separated key=value pairs followed by a
>> line feed. […]
>
> I don't think this is correct. Initial whitespace gets ignored (this is
> not clear from mtree(5), but that's what the various implementations do).
> The subset of type of lines I'm intending to support would be:

Ah, using a leading space means you can start a full path with /
without it being interpreted as a special command. Just starting with
a . instead seems reasonable.

> Then, my idea would be to further distinguish two types of mtree files,
> template and detailed. The first would allow the globs permitted in the
> spec ('[', ']', '?', '*'), and possibly also the "ignore" keyword. The
> second would not. Template mtree would be used in source packages, and
> would be used to generate the data.tar in the .deb and possibly part of
> the detailed mtree in the control.tar, and of course the detailed mtree
> in the db.

Ok, that makes sense. dpkg would never need to know anything about the
template files, then?

>> […] The following keys are supported (extracted from mtree(5)):
>>
>> * md5digest - a synonym for md5
>> * sha1digest - a synonym for sha1
>> * sha256digest - a synonym for sha256
>
> On my WIP code, I've ignored these keywords, because they are just too
> verbose, and I don't see the point with them. We are going to be
> incompatible anyway with standard mtree(5), so… :)

Fair enough.

> I've also got "contents" to represent hardlinks, "time", "ignore" and
> "optional" (but a "class" might make more sense, to be able to specify
> the file as "class=conffile" or "log", "optional/ghost" and similar).
> There's also "nlink" which I should probably drop as it does not make
> much sense for dpkg's purposes.

Makes sense.

>> The following keys are supported but not present in mtree(5):
>> * major - the major number of a device node
>> * minor - the minor number of a device node
>
> Actually some implementations define a "device" keyword, but it seems
> a bit of a mess, given that the major/minor within are OS specific
> anyway. So, my thinking was to probably ignore this one.

If debs can contain device nodes then it feels like we should probably
have a mechanism for verifying whether the ones on disk match the ones
that were in the package?
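
A check along those lines could look roughly like this. It assumes the proposed `major` and `minor` keys plus the `type` key from the list; `device_matches` is a hypothetical helper, and the numbers remain OS-specific as Guillem notes.

```python
import os
import stat


def device_matches(st, attrs):
    """Compare a stat result for a device node with type/major/minor keys."""
    checks = {"block": stat.S_ISBLK, "char": stat.S_ISCHR}
    is_expected_type = checks.get(attrs.get("type"))
    if is_expected_type is None or not is_expected_type(st.st_mode):
        return False  # not a device entry, or wrong kind of node on disk
    return (os.major(st.st_rdev) == int(attrs["major"])
            and os.minor(st.st_rdev) == int(attrs["minor"]))
```

On a typical system this would be called as `device_matches(os.lstat("/dev/null"), attrs)` with the keys read from the mtree entry.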

>> * override.* - if present, will override the contents of a key
>> applying to the same file. This may be used to apply local system
>> policy and must not be present in shipped files.
>
> I'm not sure I see the use for this?

If there were a single mtree that was an authoritative source of
truth, we'd presumably want both the original data and any data that's
been modified by dpkg-statoverride. If we keep a separate statoverride
file then this isn't necessary.
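
A small sketch of how the `override.*` keys could be resolved when reading such a single authoritative mtree. The function name `effective_attrs` is an assumption; the idea is only that the shipped value stays recorded while the override wins.

```python
PREFIX = "override."


def effective_attrs(attrs):
    """Apply override.* keys on top of the shipped values."""
    # Start from the shipped keys, leaving the override.* keys out.
    result = {key: value for key, value in attrs.items()
              if not key.startswith(PREFIX)}
    # Then let each override replace (or add) its target key.
    for key, value in attrs.items():
        if key.startswith(PREFIX):
            result[key[len(PREFIX):]] = value
    return result
```

The input dictionary is left untouched, so both the original mode and the locally overridden one remain available for auditing.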

>> * Are any other keys required? Should dpkg-divert be implemented using
>> this format?
>
> Hmm not sure, did you really mean dpkg-divert and not
> dpkg-statoverrides?

No, I meant divert here - if the idea is to be able to parse the mtree
and compare it to the filesystem, some mechanism to indicate when the
user has used dpkg-divert seems helpful.
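
To make the comparison concrete, here is a minimal verification sketch. It assumes entries have already been parsed into a dictionary of keys, and it models diversions as a plain mapping from the packaged path to the diverted path; both `verify_entry` and the `diversions` argument are hypothetical, not existing dpkg interfaces.

```python
import hashlib
import os
import stat


def verify_entry(root, path, attrs, diversions=None):
    """Return the list of keys whose on-disk value differs from attrs."""
    # If the user diverted this path, check the diverted location instead.
    target = (diversions or {}).get(path, path)
    full = os.path.join(root, target.lstrip("/"))
    try:
        st = os.lstat(full)
    except FileNotFoundError:
        return ["missing"]

    problems = []
    if "mode" in attrs and stat.S_IMODE(st.st_mode) != int(attrs["mode"], 8):
        problems.append("mode")
    if "size" in attrs and st.st_size != int(attrs["size"]):
        problems.append("size")
    if "sha256" in attrs and stat.S_ISREG(st.st_mode):
        with open(full, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        if digest != attrs["sha256"]:
            problems.append("sha256")
    return problems
```

Without the diversion mapping, a diverted file would be reported as missing or modified even though the system is in the state the administrator asked for.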

Otherwise this sounds good!



Re: RFC: Unified package metadata format

2017-04-04 Thread Niels Thykier
Guillem Jover:
> Hi!
> 
> On Tue, 2017-03-28 at 16:22:58 -0700, Matthew Garrett wrote:
>> I'm looking at implementing support for IMA file signatures inside
>> dpkg. The previous patches posted for this
>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850340) did so
>> using extended PAX metadata, but people didn't seem terribly
>> enthusiastic about that.
>> https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking suggested
>> mtree as a potential format, so I thought I'd try to kick off some
>> discussion and see whether I'm missing any requirements or whether
>> there were any better ideas. So:
> 
> As mentioned on IRC, I updated the wiki with some more thoughts.
> Regarding PAX, it's my intention to extend and merge those patches
> for 1.19.x, but as stated there, I'm not entirely sure that's the best
> way (currently) to transport that metadata.
> 

Glad to see this moving forward. :)

> [...]
>> Each entry shall be of the form
>>
>>  /path/name key1=foo key2=bar
> 
>> Ie, a leading space, a slash, and the path name of the installed file
>> followed by a series of space-separated key=value pairs followed by a
>> line feed. […]
> 
> I don't think this is correct. Initial whitespace gets ignored (this is
> not clear from mtree(5), but that's what the various implementations do).
> The subset of type of lines I'm intending to support would be:
> 
>   ,---
>   /set 
>   /unset 
>   . 
>   # comment
>   
>   `---
> 

In this variant, can /set and /unset be interleaved between paths?  My
previous PoC implementation did this and I was wondering if it would be
supported. :)

> No indented nor continuation lines, no relative paths, no ".." entries.
> 
> Then, my idea would be to further distinguish two types of mtree files,
> template and detailed. The first would allow the globs permitted in the
> spec ('[', ']', '?', '*'), and possibly also the "ignore" keyword. The
> second would not. Template mtree would be used in source packages, and
> would be used to generate the data.tar in the .deb and possibly part of
> the detailed mtree in the control.tar, and of course the detailed mtree
> in the db.
> 

The "ignore" keyword being something like "anything that matches
/except/ whatever matches the ignore keyword"?

>> [...]
> 
> 
> My current working plan is to get the last items for 1.18.x out,
> ideally this week or the next. Then immediately branch 1.18.x and open
> 1.19.x on master.

Sounds great (from my PoV). :)

> At which point I'd get the mtree support integrated
> for the db as the first stage, and then we can start pondering about
> how to transport additional metadata in the .deb as the second stage,
> and finally about the templating mode. The last one might be more
> involved as it will most probably require adding support for built-in
> tar packing. But the second stage would allow to already test stuff
> by manually crafting .debs or writing an external helper or tool to
> inject that metadata, so this should allow us to experiment and not
> block on the whole thing being finalized.
> 
> Thanks,
> Guillem
> 

I still have a debhelper tool to generate the mtree format, which I am
happy to revive.  In particular, I would be delighted if that revival
means we can migrate debhelper away from "chown" and just use the format
to describe ownership.

I intend to do an upload of debhelper to experimental later and am happy
to include it there if it would be any help.

Thanks,
~Niels




Re: RFC: Unified package metadata format

2017-04-03 Thread Guillem Jover
Hi!

On Tue, 2017-03-28 at 16:22:58 -0700, Matthew Garrett wrote:
> I'm looking at implementing support for IMA file signatures inside
> dpkg. The previous patches posted for this
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850340) did so
> using extended PAX metadata, but people didn't seem terribly
> enthusiastic about that.
> https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking suggested
> mtree as a potential format, so I thought I'd try to kick off some
> discussion and see whether I'm missing any requirements or whether
> there were any better ideas. So:

As mentioned on IRC, I updated the wiki with some more thoughts.
Regarding PAX, it's my intention to extend and merge those patches
for 1.19.x, but as stated there, I'm not entirely sure that's the best
way (currently) to transport that metadata.

In any case, here are some more comments on the following:

> Debian package unified metadata format
> 
> Format:
> 
> The file shall be stored within the control archive with the name
> “mtree” and shall start with the following string:
> 
> #mtree v2.0

Also as mentioned on IRC some time ago, it seems this format didn't
quite catch on in general, and at least libarchive has now dropped the
magic value support from its code. NetBSD, as the originator, still
handles it though, and I think it still makes sense to mark the files.

One of my main references has been the NetBSD implementation as
described at:

  

the other being the libarchive one.

> Each entry shall be of the form
> 
>  /path/name key1=foo key2=bar

> Ie, a leading space, a slash, and the path name of the installed file
> followed by a series of space-separated key=value pairs followed by a
> line feed. […]

I don't think this is correct. Initial whitespace gets ignored (this is
not clear from mtree(5), but that's what the various implementations do).
The subset of type of lines I'm intending to support would be:

  ,---
  /set 
  /unset 
  . 
  # comment
  
  `---

No indented nor continuation lines, no relative paths, no ".." entries.

Then, my idea would be to further distinguish two types of mtree files,
template and detailed. The first would allow the globs permitted in the
spec ('[', ']', '?', '*'), and possibly also the "ignore" keyword. The
second would not. Template mtree would be used in source packages, and
would be used to generate the data.tar in the .deb and possibly part of
the detailed mtree in the control.tar, and of course the detailed mtree
in the db.
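
A rough sketch of how a template could be expanded into a detailed listing. It uses Python's `fnmatch`, whose `*` also matches `/`, so its semantics may differ from what the mtree tools do; `expand_template` and the "later patterns win" rule are assumptions for illustration only.

```python
from fnmatch import fnmatchcase


def expand_template(template, staged_paths):
    """Turn (pattern, attrs) template lines into per-path detailed attrs.

    template: list of (glob pattern, attrs dict), in file order.
    staged_paths: concrete paths found in the package build tree.
    """
    detailed = {}
    for path in staged_paths:
        attrs = {}
        for pattern, pattern_attrs in template:
            if fnmatchcase(path, pattern):
                attrs.update(pattern_attrs)  # later matches override earlier
        detailed[path] = attrs
    return detailed
```

The output contains only literal paths, which matches the idea that the detailed form in the control.tar and in the db never needs glob support.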

> […] The following keys are supported (extracted from mtree(5)):
> 
> * md5digest - a synonym for md5
> * sha1digest - a synonym for sha1
> * sha256digest - a synonym for sha256

On my WIP code, I've ignored these keywords, because they are just too
verbose, and I don't see the point with them. We are going to be
incompatible anyway with standard mtree(5), so… :)

> * gid - the file group as a numeric value
> * gname - the file group as a symbolic name
> * md5 - the MD5 message digest of the file
> * sha1 - the FIPS 180-1 (“SHA-1”) message digest of the file
> * sha256 - the FIPS 180-2 (“SHA-256”) message digest of the file
> * mode - the file’s permissions as a numeric (octal) value
> * uid - the file owner as a numeric value
> * uname - the file owner as a symbolic name
> * size - the size, in bytes, of the file
> * link - the file referenced by a symbolic link
> * type - The type of the file; may be set to any one of the following:
>   * block - block special device
>   * char - character special device
>   * dir - directory
>   * fifo - fifo
>   * file - regular file
>   * link - symbolic link
>   * socket - socket

I've also got "contents" to represent hardlinks, "time", "ignore" and
"optional" (but a "class" might make more sense, to be able to specify
the file as "class=conffile" or "log", "optional/ghost" and similar).
There's also "nlink" which I should probably drop as it does not make
much sense for dpkg's purposes.

> The following keys are supported but not present in mtree(5):
> * major - the major number of a device node
> * minor - the minor number of a device node

Actually some implementations define a "device" keyword, but it seems
a bit of a mess, given that the major/minor within are OS specific
anyway. So, my thinking was to probably ignore this one.

> * xattr.* - a base64-encoded extended attribute that will be
> associated with the file if the underlying filesystem supports
> extended attribute. The name of the attribute will follow the “xattr.”
> string - eg, 
> “xattr.security.selinux=dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==”
> would set the security.selinux extended attribute to
> unconfined_u:object_r:user_home_t:s0. This format is present in
> go-mtree.

This sounds good! I don't think all xattrs would need to be
base64-encoded, but over-encoding only takes more space so that's
always safe. :)
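
For illustration, decoding and encoding such a key could look like the sketch below. The helper names are made up. Note that the quoted example value decodes to the SELinux context followed by a trailing NUL byte, which is how the attribute is commonly stored on disk.

```python
import base64

PREFIX = "xattr."


def decode_xattr(key, value):
    """Split an 'xattr.<name>' key and base64-decode its value."""
    if not key.startswith(PREFIX):
        raise ValueError("not an xattr key: %r" % key)
    return key[len(PREFIX):], base64.b64decode(value)


def encode_xattr(name, data):
    """Build the mtree key/value pair for an extended attribute."""
    return PREFIX + name, base64.b64encode(data).decode("ascii")


name, data = decode_xattr(
    "xattr.security.selinux",
    "dW5jb25maW5lZF91Om9iamVjdF9yOnVzZXJfaG9tZV90OnMwAA==")
```

Always encoding keeps the parser simple, since binary values and values containing spaces need no separate escaping rules.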

> * override.* - if present, 

Re: RFC: Unified package metadata format

2017-03-30 Thread Russ Allbery
Matthew Garrett writes:

> * Users auditing their systems can have full kernel-enforced
> cryptographic assurance that the files they have on disk match the
> files that Debian shipped. Doing that otherwise would involve you
> having to take the machine offline.

I would very much like to have this as well.  This sort of thing makes it
much easier to build out a maintainable FIM system that doesn't require
people to constantly whitelist new binaries manually.

> * Even Debian users may (for security or other policy reasons) want to
> configure systems so that they only run binaries that are provided
> through some trusted distribution mechanism.

Yes.  Consider, for example, a Kerberos KDC or other security-critical
system, where you may want to have some automated system for explicitly
blessing a subset of the archive and specific versions of packages and not
allow anything else.

-- 
Russ Allbery (r...@debian.org)