I've written a nascent Rust binding for rpmlib:
First of all, I would love any feedback. This is my first deep dive into the
internals of RPM, and while I found ample documentation about certain things, a
lot of what I was doing was guesswork.
I'm also not super up-to-speed on RPM nomenclature, but I've largely tried to
mirror the structure and terminology rpmlib itself uses. I've already gotten
some feedback that it's "Red Hat", not "RedHat", and I probably shouldn't be
calling it the "RedHat Package Manager", so it seems I could use some advice on
appropriate project branding. Is "rpmlib" a good name for the Rust crate?
I'd also be happy to upstream this if the RPM project is interested. I'm
presently the sole author and am happy to relicense everything under LGPL/GPL,
sign CLAs, transfer copyright or what have you, as well as transfer control of
the packages on https://crates.io.
If there's one thing in particular I'd like clarification and feedback on it's
my understanding of rpmlib's memory model. Rust's claim to fame is the compiler's
ability to prove properties about the lifetime relationships of objects in the
program, and thereby provide safe "zero cost" (i.e. zero copy) abstractions
which might otherwise be deemed "risky" in practically any other language due
to the potential for use-after-free errors. I am trying to apply this approach
in my Rust binding.
In RPM nomenclature, I am using `HEADERGET_MINMEM` with the goal of directly
accessing memory owned by RPM, providing a zero-copy API that is nonetheless
safe. Doing that correctly involves describing the precise memory relationships
to the Rust compiler.
First, I'll note that Rust's notion of memory safety also covers multithreaded
programs, and in that regard I have simply thrown a big mutex around the whole
rpmlib FFI. In particular, creating a transaction set currently acquires the
mutex, which is not released until the transaction set is finished. tl;dr:
assume sequential/single-threaded access for now.
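A minimal sketch of the "big mutex" approach, assuming a hypothetical `RPMLIB_LOCK` static and a hypothetical `TransactionSet` wrapper that holds the guard for its whole lifetime (this is not the crate's actual code, and the raw FFI pointer is omitted):

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical process-wide lock serializing every call into the rpmlib FFI.
static RPMLIB_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical wrapper type. Holding the guard for the transaction set's
/// whole lifetime keeps other threads out of rpmlib until it is dropped.
pub struct TransactionSet {
    // A real binding would also store the raw `rpmts` pointer here.
    _guard: MutexGuard<'static, ()>,
}

impl TransactionSet {
    pub fn create() -> TransactionSet {
        // Blocks until any other live transaction set has been dropped.
        // A poisoned lock (a panic while it was held) is recovered rather
        // than propagated, since the guarded data is just `()`.
        let guard = RPMLIB_LOCK
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        TransactionSet { _guard: guard }
    }
}

fn main() {
    let ts = TransactionSet::create();
    // While `ts` is alive the lock is held, so a non-blocking attempt fails.
    assert!(RPMLIB_LOCK.try_lock().is_err());
    drop(ts);
    // Dropping the transaction set released the lock.
    assert!(RPMLIB_LOCK.try_lock().is_ok());
}
```

One consequence of this design: `MutexGuard` is not `Send`, so such a transaction set cannot be moved to another thread, and creating a second one on the same thread while the first is alive would deadlock (or panic) rather than run concurrently.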
I've largely been following this guide, which provided much of the lifetime
information I was looking for regarding things like transaction sets and iterators:
(The combination of `docs-old` and `Draft` doesn't bode particularly well,
If I have one immediate takeaway, it's: transaction sets everywhere! That's
something I can get on board with. In almost all cases (aside from reading the
initial config/rpmrc and configuring macro contexts) everything seems to begin
with a transaction set, so that is the initial lifetime of importance from a
Regarding that, the referenced documentation suggests things like:
> When you are done with a transaction set, call rpmtsFree:
> rpmts rpmtsFree(rpmts ts);
Rust has a trait called `Drop` that automatically frees things for you at the
end of their lifetime; it's the sort of RAII pattern you might expect. Based on
this documentation, I have implemented freeing for both transaction sets and
iterators using this trait.
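To illustrate the `Drop` approach without linking against librpm, the sketch below swaps the real `rpmtsCreate`/`rpmtsFree` calls for hypothetical mock functions (`fake_rpmts_create`, `fake_rpmts_free`) that count live handles; a real binding would declare the C functions in an `extern "C"` block instead:

```rust
use std::cell::Cell;

thread_local! {
    // How many handles the fake "C library" has handed out but not freed.
    static LIVE: Cell<usize> = Cell::new(0);
}

// Hypothetical mock of `rpmts rpmtsCreate(void)`: returns an opaque heap pointer.
fn fake_rpmts_create() -> *mut u8 {
    LIVE.with(|n| n.set(n.get() + 1));
    Box::into_raw(Box::new(0u8))
}

// Hypothetical mock of `rpmts rpmtsFree(rpmts ts)`. The caller must pass a
// pointer obtained from `fake_rpmts_create`, exactly once.
unsafe fn fake_rpmts_free(ts: *mut u8) -> *mut u8 {
    unsafe { drop(Box::from_raw(ts)) };
    LIVE.with(|n| n.set(n.get() - 1));
    std::ptr::null_mut()
}

fn live_handles() -> usize {
    LIVE.with(|n| n.get())
}

/// Owns one transaction-set handle and frees it exactly once.
pub struct TransactionSet {
    ptr: *mut u8, // would be the raw `rpmts` in a real binding
}

impl TransactionSet {
    pub fn new() -> Self {
        TransactionSet { ptr: fake_rpmts_create() }
    }
}

impl Drop for TransactionSet {
    fn drop(&mut self) {
        // Runs automatically when the value goes out of scope; the
        // returned pointer is ignored.
        unsafe {
            fake_rpmts_free(self.ptr);
        }
    }
}

fn main() {
    {
        let _ts = TransactionSet::new();
        assert_eq!(live_handles(), 1);
    } // `_ts` is dropped here, which calls the free function
    assert_eq!(live_handles(), 0);
}
```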
Where things got a bit hazy was actually using `HEADERGET_MINMEM`. Quoting the guide:
> You do not need to free the Header returned by rpmdbNextIterator. Also, the
> next call to rpmdbNextIterator will reset the Header.
What I took this to mean, in Rustier nomenclature, is that a `Header` returned
by an iterator is valid only until the next item is requested from the
iterator. This is a bit different from a typical Rust `Iterator` (e.g. one over
a read-only collection), whose items don't depend on the iterator's internal
state, so it's safe to hold references to multiple items at once.
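For contrast, a standard `Iterator` over a slice hands out references that borrow the underlying collection rather than the iterator itself, so earlier items stay usable after advancing. A small illustrative example (the `first_two` helper is hypothetical):

```rust
/// Returns the first two items of a slice. Both references borrow from
/// `items`, not from the iterator, so advancing the iterator a second
/// time does not invalidate the first item.
fn first_two(items: &[String]) -> Option<(&str, &str)> {
    let mut it = items.iter();
    let first = it.next()?;
    let second = it.next()?; // `first` is still usable here
    Some((first.as_str(), second.as_str()))
}

fn main() {
    let names = vec!["bash".to_string(), "rpm".to_string()];
    assert_eq!(first_two(&names), Some(("bash", "rpm")));
}
```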
There is a Rust pattern for what I think this describes, though:
`StreamingIterator`, which is the one I chose to use:
The idea of `StreamingIterator` is you iterate by borrowing a value from the
iterator. Before you move on, you must give that value back. From a nitty
gritty perspective, it actually does this by splitting up the iteration into
two steps: one where you ask the iterator to mutate itself and "preload" the
value into a buffer, and another where you immutably borrow that value from the
iterator, with a lifetime guaranteed to end before you request the next item.
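The two-step advance/get shape can be sketched without pulling in the crate. The trait below re-states the shape of the `streaming-iterator` crate's `StreamingIterator` (`advance`, `get`, and a default `next`), and `ReusingIter` is a hypothetical iterator that overwrites a single buffer on every step, loosely mimicking how the next `rpmdbNextIterator` call resets the Header:

```rust
/// Local re-statement of the shape of the `streaming-iterator` crate's
/// `StreamingIterator` trait, so this sketch has no dependencies.
trait StreamingIterator {
    type Item: ?Sized;

    /// Step 1: mutate the iterator and "preload" the next value.
    fn advance(&mut self);

    /// Step 2: immutably borrow the preloaded value.
    fn get(&self) -> Option<&Self::Item>;

    /// Both steps at once. The returned reference borrows `self`, so it
    /// must be gone before `next` (which needs `&mut self`) runs again.
    fn next(&mut self) -> Option<&Self::Item> {
        self.advance();
        self.get()
    }
}

/// Hypothetical iterator that reuses one buffer for every item.
struct ReusingIter<'a> {
    source: &'a [&'a str],
    pos: usize,
    buf: String,
    valid: bool,
}

impl<'a> StreamingIterator for ReusingIter<'a> {
    type Item = str;

    fn advance(&mut self) {
        match self.source.get(self.pos) {
            Some(s) => {
                // Overwrite the previous item in place.
                self.buf.clear();
                self.buf.push_str(s);
                self.pos += 1;
                self.valid = true;
            }
            None => self.valid = false,
        }
    }

    fn get(&self) -> Option<&str> {
        if self.valid { Some(self.buf.as_str()) } else { None }
    }
}

/// Copies out what it needs from each item before asking for the next one.
fn lengths(source: &[&str]) -> Vec<usize> {
    let mut it = ReusingIter { source, pos: 0, buf: String::new(), valid: false };
    let mut out = Vec::new();
    while let Some(item) = it.next() {
        out.push(item.len()); // the borrow of `it` ends after this line
    }
    out
}

fn main() {
    assert_eq!(lengths(&["bash", "rpm"]), vec![4, 3]);
}
```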
The relevant code is here:
The two relevant lifetimes are `'db` and `'ts` for "database" and "transaction
set" respectively, with the transaction set having the longest lifetime.
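A minimal sketch of how the `'ts` and `'db` relationships might be expressed with `PhantomData`, assuming hypothetical `TransactionSet`, `Database<'ts>` and `MatchIterator<'db, 'ts>` types with the raw rpmlib pointers left out:

```rust
use std::marker::PhantomData;

/// Hypothetical handle type; the raw rpmlib pointer is omitted so the
/// sketch runs standalone. Only the lifetime relationships matter here.
struct TransactionSet;

/// Borrows the transaction set, so it cannot outlive it (`'ts`).
struct Database<'ts> {
    _ts: PhantomData<&'ts TransactionSet>,
}

/// Borrows the database (`'db`), which in turn borrows the transaction set.
struct MatchIterator<'db, 'ts> {
    _db: PhantomData<&'db Database<'ts>>,
}

impl TransactionSet {
    fn database(&self) -> Database<'_> {
        Database { _ts: PhantomData }
    }
}

impl<'ts> Database<'ts> {
    fn iter(&self) -> MatchIterator<'_, 'ts> {
        MatchIterator { _db: PhantomData }
    }
}

fn main() {
    let ts = TransactionSet;
    let db = ts.database();
    let it = db.iter();
    // drop(ts); // rejected: `ts` is still borrowed via `db`, which `it` uses below
    let _still_in_use = &it;
    // The lifetimes are compile-time only; the handles add no runtime data.
    assert_eq!(std::mem::size_of::<MatchIterator<'static, 'static>>(), 0);
}
```

Because the handles only carry `PhantomData`, the lifetime tracking exists purely at compile time, which is the "zero cost" part of the argument.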
In my usage of `HEADERGET_MINMEM`, I have assumed a borrowed `Header` is only
valid until the next one is requested, so users of the API must drop any
references to the previous `Header` before requesting the next. More
nitty-gritty details: this is enforced by Rust's ownership and borrowing rules
(its affine type system). Since getting the next value requires a mutable
reference to the iterator, it's explicitly disallowed to obtain one if the
previous header value is still aliased anywhere. Programs which wish to make
progress iterating must make the borrow checker happy by dropping the previous
value first, or they will be rejected by the compiler.
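To show what "rejected by the compiler" looks like in practice, here is a sketch with a hypothetical `HeaderIter` whose `next` takes `&mut self` and overwrites one hypothetical `Header` in place. Holding two headers at once does not compile (shown in a comment); copying the needed data out before advancing does:

```rust
/// Hypothetical stand-in for an rpmlib `Header`.
struct Header {
    name: String,
}

/// Hypothetical iterator that overwrites a single `Header` in place on each
/// step, mimicking how `rpmdbNextIterator` resets the previous Header.
struct HeaderIter {
    names: Vec<String>,
    pos: usize,
    current: Header,
}

impl HeaderIter {
    fn new(names: Vec<String>) -> Self {
        HeaderIter { names, pos: 0, current: Header { name: String::new() } }
    }

    /// Needs `&mut self`, so no `&Header` from an earlier call can still be alive.
    fn next(&mut self) -> Option<&Header> {
        let name = self.names.get(self.pos)?;
        self.current.name.clear();
        self.current.name.push_str(name);
        self.pos += 1;
        Some(&self.current)
    }
}

/// Keeps data from two headers by copying it out before advancing.
fn first_two_names(mut it: HeaderIter) -> Option<(String, String)> {
    // Rejected by the borrow checker: `a` would still borrow `it` when the
    // second `next()` asks for `&mut it`.
    //     let a = it.next()?;
    //     let b = it.next()?;
    //     return Some((a.name.clone(), b.name.clone()));
    let a = it.next()?.name.clone(); // owned copy; the borrow of `it` ends here
    let b = it.next()?.name.clone();
    Some((a, b))
}

fn main() {
    let it = HeaderIter::new(vec!["bash".to_string(), "rpm".to_string()]);
    assert_eq!(first_two_names(it), Some(("bash".to_string(), "rpm".to_string())));
}
```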
## Totally Bogus Code: Tag Data Parsing
I mapped "tag data" (not quite sure that's the right term; what I mean is the
values in headers that correspond to tags) onto a Rust enum/sum type, which I'd
like to say is pretty awesome:
...except for the part where it's all half-implemented and untested. Strings
work, I think?
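A hedged sketch of what such an enum could look like, assuming borrowed variants that point into header-owned memory. The `TagData` name, the variant set, and the `&[i32]` / `Vec<&str>` layouts are assumptions for illustration only, not the crate's code or rpmlib's documented semantics, since the meaning of `count` is still an open question:

```rust
/// Hypothetical sum type for header tag values. Borrowed variants point
/// into memory owned by the header (`'hdr`), in the spirit of
/// `HEADERGET_MINMEM`. Variant set and layouts are assumptions.
#[derive(Debug, PartialEq)]
enum TagData<'hdr> {
    Null,
    Int32(&'hdr [i32]),
    Str(&'hdr str),
    Bin(&'hdr [u8]),
    StrArray(Vec<&'hdr str>),
}

/// Exhaustive match: adding a variant makes every `match` without a
/// wildcard arm fail to compile until the new case is handled.
fn describe(data: &TagData<'_>) -> String {
    match data {
        TagData::Null => "null".to_string(),
        TagData::Int32(v) => format!("{} x int32", v.len()),
        TagData::Str(s) => format!("string {:?}", s),
        TagData::Bin(b) => format!("{} bytes", b.len()),
        TagData::StrArray(v) => format!("{} strings", v.len()),
    }
}

fn main() {
    let bytes = [0xde_u8, 0xad, 0xbe, 0xef];
    assert_eq!(describe(&TagData::Bin(&bytes[..])), "4 bytes");
    assert_eq!(describe(&TagData::Str("rpm")), "string \"rpm\"");
}
```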
Things I wasn't entirely clear on:
- What can I assume about the character sets of `STRING` vs. `I18NSTRING`?
- What exactly is the `count` member of the `rpmtd_s` struct for, and how does
  it relate to the various data types?
- Where do I get the length of binary data? Is it `count`?
- What's the character encoding of `CHAR`? Is it 1 byte? What is it used for? I
  presently assume it's a 1-byte ASCII character.
- How do string array types work? Is the length `count`? I've left them
  completely unimplemented for now.
## RPM Signing
Last but not least: if there's one particularly interesting thing I'd like to
do, it's to use `librpmsign.so` to sign an RPM while swapping in some Rust code
to perform the actual digital signature operation, i.e. using librpmsign's GPG
code path to handle the digest computation and signature serialization, but
swapping out the actual cryptographic primitive. I've been working on a Rust
library for digital signatures supporting multiple software and hardware
backends, and one of the use cases I'm most interested in is RPM signing,
specifically with keys kept in secure enclaves / hardware devices.
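The "swap out the primitive" idea amounts to putting the signing step behind an interface. A minimal sketch, assuming a hypothetical `SignatureBackend` trait and a mock backend; it does not show how, or whether, librpmsign can be made to call such a hook:

```rust
/// Hypothetical pluggable signing backend. The idea is that librpmsign's
/// existing logic would still compute the digest and serialize the
/// signature; only this primitive would be swapped out.
trait SignatureBackend {
    fn sign_digest(&self, digest: &[u8]) -> Result<Vec<u8>, String>;
}

/// Mock backend for illustration only. It is NOT a signature scheme: it
/// just reverses the digest bytes. A real backend would call a software key
/// or a hardware device here.
struct MockBackend;

impl SignatureBackend for MockBackend {
    fn sign_digest(&self, digest: &[u8]) -> Result<Vec<u8>, String> {
        if digest.is_empty() {
            return Err("empty digest".to_string());
        }
        Ok(digest.iter().rev().copied().collect())
    }
}

/// The backend is chosen at runtime via a trait object.
fn sign_with(backend: &dyn SignatureBackend, digest: &[u8]) -> Result<Vec<u8>, String> {
    backend.sign_digest(digest)
}

fn main() {
    assert_eq!(sign_with(&MockBackend, &[1, 2, 3]), Ok(vec![3, 2, 1]));
}
```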
I look forward to any input/clarifications, and again would be happy to work
toward upstreaming this code if there is interest.
Rpm-maint mailing list