I've written a nascent Rust binding for rpmlib:


First of all, I would love any feedback. This is my first deep dive into the 
internals of RPM, and while I found ample documentation about certain things, a 
lot of what I was doing was guesswork.

I'm also not super up-to-speed on RPM nomenclature but I've largely tried to 
mirror the structure and terminology rpmlib itself uses. I've already gotten 
some feedback that it's "Red Hat", not "RedHat", and I probably shouldn't be 
calling it the "RedHat Package Manager", so it seems I could use some advice on 
appropriate project branding. Is "rpmlib" a good name for the Rust crate?

I'd also be interested in upstreaming this if there's interest. I'm presently 
the sole author and am happy to relicense everything under LGPL/GPL, sign CLAs, 
transfer copyright or what have you, as well as transfer control of the 
packages on https://crates.io

## Lifetimes

If there's one thing in particular I'd like clarification and feedback on, it's 
my understanding of rpmlib's memory model. Rust's claim to fame is the compiler's 
ability to prove properties about the lifetime relationships of objects in the 
program, and thereby provide safe "zero cost" (i.e. zero copy) abstractions 
which might otherwise be deemed "risky" in practically any other language due 
to the potential for use-after-free errors. I am trying to apply this approach 
in my Rust binding.

In RPM nomenclature I am using `HEADERGET_MINMEM` with the goal of directly 
accessing memory owned by RPM and thereby providing a zero-copy API but also 
providing safe abstractions for doing so. Doing that correctly involves 
describing the precise memory relationships to the Rust compiler.

First I'll say Rust's notion of memory safety applies to multithreaded 
programs, and in that regard I have just thrown a big mutex across the whole 
rpmlib FFI. In particular right now creating a transaction set also acquires 
the mutex and does not release it until it is complete. tl;dr: assume 
sequential/single-threaded access for now.
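
As a rough illustration of the "one big mutex" arrangement (not the actual binding code), here is a minimal sketch. `FFI_LOCK`, `FfiGuard`, `lock_ffi` and `is_ffi_locked` are hypothetical names, and the `TransactionSet` here is a stand-in that only holds the lock:

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

// Hypothetical process-wide lock guarding every call into the C library.
static FFI_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical guard: while one exists, no other thread is inside the FFI.
pub struct FfiGuard {
    _guard: MutexGuard<'static, ()>,
}

pub fn lock_ffi() -> FfiGuard {
    // A poisoned lock only means another thread panicked while holding it;
    // the unit payload carries no state that could have been corrupted.
    let guard = FFI_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    FfiGuard { _guard: guard }
}

/// Stand-in for a transaction set that keeps the lock for its whole life.
pub struct TransactionSet {
    _ffi: FfiGuard,
}

impl TransactionSet {
    pub fn create() -> Self {
        // The real binding would also call into rpmlib here.
        TransactionSet { _ffi: lock_ffi() }
    }
}

/// Reports whether the lock is currently held, without blocking.
pub fn is_ffi_locked() -> bool {
    matches!(FFI_LOCK.try_lock(), Err(TryLockError::WouldBlock))
}
```

The lock is released when the `TransactionSet` is dropped, which is what makes "assume single-threaded access" hold for everything reachable from it.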

I've largely been following this guide, which provided much of the lifetime 
information I needed for things like transaction sets and iterators:


(The combination of `docs-old` and `Draft` doesn't bode particularly well, 

If I have one immediate takeaway: transaction sets everywhere! Which is 
something I can get on board with. In almost all cases (aside from reading the 
initial config/rpmrc and configuring macro contexts) everything seems to begin 
with a transaction set, so that is the initial lifetime of importance from a 
Rust perspective.

Regarding that, the referenced documentation suggests things like:

> When you are done with a transaction set, call rpmtsFree:
> rpmts rpmtsFree(rpmts ts);

Rust has a trait that automatically frees things for you at the end of their 
lifetime called `Drop`, and it's the sort of RAII pattern you might expect. I 
have implemented free for both transaction sets and iterators using this trait 
based on this documentation.
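
To make the `Drop` approach concrete, here is a minimal sketch using mock functions in place of the C API. `mock_ts_create` and `mock_ts_free` are hypothetical stand-ins for `rpmtsCreate` and `rpmtsFree`, and the counter exists only so the free call is observable:

```rust
use std::cell::Cell;

thread_local! {
    // Counts calls to the mock free function so the behaviour is observable.
    static FREED: Cell<u32> = Cell::new(0);
}

// Mock handle type; the real one is an opaque C struct.
pub struct RawTs {
    _placeholder: u8,
}

// Hypothetical stand-in for `rpmtsCreate`.
fn mock_ts_create() -> *mut RawTs {
    Box::into_raw(Box::new(RawTs { _placeholder: 0 }))
}

// Hypothetical stand-in for `rpmtsFree`.
// Safety: `ts` must come from `mock_ts_create` and not have been freed yet.
unsafe fn mock_ts_free(ts: *mut RawTs) {
    unsafe { drop(Box::from_raw(ts)) };
    FREED.with(|f| f.set(f.get() + 1));
}

pub struct TransactionSet {
    raw: *mut RawTs,
}

impl TransactionSet {
    pub fn create() -> Self {
        TransactionSet { raw: mock_ts_create() }
    }
}

impl Drop for TransactionSet {
    fn drop(&mut self) {
        // Runs exactly once, when the owner goes out of scope.
        unsafe { mock_ts_free(self.raw) }
    }
}

pub fn freed_count() -> u32 {
    FREED.with(|f| f.get())
}
```

Because the wrapper owns the raw pointer and is not `Clone`, the free function cannot be called twice on the same handle from safe code.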

Where things got a bit hazy was actually using `HEADERGET_MINMEM`. Quoting the 
linked documentation:

> You do not need to free the Header returned by rpmdbNextIterator. Also, the 
> next call to rpmdbNextIterator will reset the Header.

What I took this to mean, in Rustier nomenclature, is that the lifetime of a 
`Header` referenced by an iterator is valid until the next item is requested 
from the iterator. This is a bit different from a typical Rust `Iterator`, 
which provides a read-only view of a collection, where it's safe to have 
references to multiple items in the collection at once.

There is a Rust pattern for what I think this is describing though, which is a 
`StreamingIterator` and the one I chose to use:


The idea of `StreamingIterator` is you iterate by borrowing a value from the 
iterator. Before you move on, you must give that value back. From a nitty 
gritty perspective, it actually does this by splitting up the iteration into 
two steps: one where you ask the iterator to mutate itself and "preload" the 
value into a buffer, and another where you immutably borrow that value from the 
iterator, with a lifetime guaranteed to end before you request the next item.
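
The two-step shape described above can be sketched without any external crate. The trait below is a local definition modelled on the `advance`/`get` split; `MockMatchIterator` and `Header` are hypothetical stand-ins that iterate over in-memory names rather than an RPM database:

```rust
/// Local trait modelled on the two-step streaming pattern.
pub trait StreamingIterator {
    type Item;

    /// Step 1: mutate the iterator and load the next value into its buffer.
    fn advance(&mut self);

    /// Step 2: immutably borrow the buffered value.
    fn get(&self) -> Option<&Self::Item>;

    /// Convenience wrapper combining both steps.
    fn next(&mut self) -> Option<&Self::Item> {
        self.advance();
        self.get()
    }
}

/// Stand-in for a package header.
pub struct Header {
    pub name: String,
}

/// Stand-in for a database match iterator.
pub struct MockMatchIterator {
    pending: std::vec::IntoIter<&'static str>,
    current: Option<Header>,
}

impl MockMatchIterator {
    pub fn new(names: Vec<&'static str>) -> Self {
        MockMatchIterator { pending: names.into_iter(), current: None }
    }
}

impl StreamingIterator for MockMatchIterator {
    type Item = Header;

    fn advance(&mut self) {
        // Overwrites the previous header, mirroring "the next call resets it".
        self.current = self.pending.next().map(|n| Header { name: n.to_string() });
    }

    fn get(&self) -> Option<&Header> {
        self.current.as_ref()
    }
}

/// Walks the iterator, copying out only what must outlive each step.
pub fn collect_names(mut iter: MockMatchIterator) -> Vec<String> {
    let mut names = Vec::new();
    while let Some(header) = iter.next() {
        names.push(header.name.clone());
    }
    names
}
```

Anything that needs to survive past the next `advance` has to be copied out, as `collect_names` does with the name.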

The relevant code is here:


The two relevant lifetimes are `'db` and `'ts` for "database" and "transaction 
set" respectively, with the transaction set having the longer lifetime.

In my usage of `HEADERGET_MINMEM` I have assumed the lifetime of a borrowed 
`Header` is only valid until the next one is requested, and users of the API 
must drop any references to the previous `Header` value before requesting the 
next. More nitty gritty details: it does this by means of Rust's affine type 
system: since getting the next value requires a mutable reference to the 
iterator, it's explicitly disallowed to obtain one if the previous header value 
is in any way aliased. Programs which wish to make progress iterating must make 
the borrow checker happy by dropping the previous value first or they will be 
rejected by the compiler.
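
Here is a small sketch of how that rule plays out, under the assumption that the iterator borrows the transaction set for `'ts` and hands out headers tied to a mutable borrow of itself. All types are simplified stand-ins, not the real binding:

```rust
/// Stand-in transaction set holding package names in memory.
pub struct TransactionSet {
    names: Vec<String>,
}

/// Stand-in header.
pub struct Header {
    name: String,
}

impl Header {
    pub fn name(&self) -> &str {
        &self.name
    }
}

/// The iterator cannot outlive the transaction set it borrows from.
pub struct MatchIterator<'ts> {
    ts: &'ts TransactionSet,
    position: usize,
    current: Option<Header>,
}

impl TransactionSet {
    pub fn new(names: &[&str]) -> Self {
        TransactionSet { names: names.iter().map(|n| n.to_string()).collect() }
    }

    pub fn iter(&self) -> MatchIterator<'_> {
        MatchIterator { ts: self, position: 0, current: None }
    }
}

impl<'ts> MatchIterator<'ts> {
    /// Needs `&mut self`, so no earlier `&Header` may still be alive.
    pub fn next_header(&mut self) -> Option<&Header> {
        let name = self.ts.names.get(self.position)?;
        self.position += 1;
        self.current = Some(Header { name: name.clone() });
        self.current.as_ref()
    }
}

/// Accepted: each borrowed header is released before the next request.
pub fn first_two(ts: &TransactionSet) -> (Option<String>, Option<String>) {
    let mut iter = ts.iter();
    let first = iter.next_header().map(|h| h.name().to_owned());
    let second = iter.next_header().map(|h| h.name().to_owned());
    (first, second)
}

// Rejected with E0499 (cannot borrow `iter` as mutable more than once):
//
//     let mut iter = ts.iter();
//     let first = iter.next_header();
//     let second = iter.next_header(); // `first` is still borrowed here
//     println!("{:?}", first.map(Header::name));
```

The commented-out program is the use-after-reset bug expressed in Rust terms, and it fails at compile time rather than at run time.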

## Totally Bogus Code: Tag Data Parsing

I mapped "tag data" (not quite sure that's the right term, but what I'm meaning 
to describe is the values in headers that correspond to tags) onto a Rust 
enum/sum type, which I'd like to say is pretty awesome:


...except for the part where it's all half-implemented and untested. Strings 
work, I think?

Things I wasn't entirely clear on:

- What can I assume about the character sets of `STRING` vs. `I18NSTRING`?
- What exactly is the `count` member of the `rpmtd_s` struct for and how does 
it relate to the various data types?
- Where do I get the length of binary data? Is it `count`?
- What's the character encoding of char? Is it 1 byte? What is it used for? I 
presently assume it's a 1-byte ASCII character.
- How do string array types work? Is the length `count`? I've left them 
completely unimplemented for now.
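
To make those questions concrete, here is a sketch of the enum mapping with my guesses written down as explicit assumptions. `RawTagData` is a hypothetical, simplified stand-in for `rpmtd_s` (the real struct holds a pointer rather than a byte slice). The sketch assumes that strings are NUL-terminated UTF-8, that `count` is the byte length for binary data, and that `count` is the number of consecutive NUL-terminated strings for string arrays. The numeric type codes are as I recall them from the rpm headers and should be checked:

```rust
// Type codes as I recall them from rpm's tag type enum (assumption).
const RPM_STRING_TYPE: u32 = 6;
const RPM_BIN_TYPE: u32 = 7;
const RPM_STRING_ARRAY_TYPE: u32 = 8;

/// Hypothetical simplified view of one tag's data.
pub struct RawTagData<'hdr> {
    pub type_code: u32,
    pub count: usize,
    pub bytes: &'hdr [u8],
}

/// Zero-copy view of the tag data, borrowing from the header's memory.
#[derive(Debug, PartialEq)]
pub enum TagData<'hdr> {
    Str(&'hdr str),
    Bin(&'hdr [u8]),
    StrArray(Vec<&'hdr str>),
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnsupportedType(u32),
    Truncated,
    InvalidUtf8,
}

/// Splits one NUL-terminated string off the front of `bytes`.
fn take_c_str(bytes: &[u8]) -> Result<(&str, &[u8]), ParseError> {
    let nul = bytes.iter().position(|&b| b == 0).ok_or(ParseError::Truncated)?;
    // Assumption: strings are UTF-8; anything else is reported, not guessed at.
    let text = std::str::from_utf8(&bytes[..nul]).map_err(|_| ParseError::InvalidUtf8)?;
    Ok((text, &bytes[nul + 1..]))
}

pub fn parse<'hdr>(raw: &RawTagData<'hdr>) -> Result<TagData<'hdr>, ParseError> {
    match raw.type_code {
        RPM_STRING_TYPE => Ok(TagData::Str(take_c_str(raw.bytes)?.0)),
        // Assumption: `count` is the length in bytes.
        RPM_BIN_TYPE => raw
            .bytes
            .get(..raw.count)
            .map(TagData::Bin)
            .ok_or(ParseError::Truncated),
        // Assumption: `count` is the number of strings.
        RPM_STRING_ARRAY_TYPE => {
            let mut rest = raw.bytes;
            let mut items = Vec::with_capacity(raw.count);
            for _ in 0..raw.count {
                let (text, tail) = take_c_str(rest)?;
                items.push(text);
                rest = tail;
            }
            Ok(TagData::StrArray(items))
        }
        other => Err(ParseError::UnsupportedType(other)),
    }
}
```

If any of these assumptions is wrong, only the matching arm of `parse` should need to change.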

## RPM Signing

Last but not least: if there's one particularly interesting thing I'd like to 
do, it's use `librpmsign.so` to sign an RPM, but swap in some Rust code to 
perform the actual digital signature operation i.e. using GPG via librpmsign to 
handle the digest computation and serializing of the signature, but swapping 
out the actual cryptographic primitive. I've been working on a Rust library for 
digital signatures supporting multiple software and hardware backends and one 
of the use cases I'm most interested in is RPM signing, specifically with keys 
kept in secure enclaves / hardware devices.
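
The split I have in mind could look roughly like the sketch below. Everything in it is hypothetical: `DigestSigner`, `SignError`, `MockBackend` and `sign_package_digest` are illustrative names, the mock backend performs no cryptography, and nothing here reflects an existing `librpmsign` interface:

```rust
#[derive(Debug, PartialEq)]
pub struct SignError(pub String);

/// The only operation a backend (software key, HSM, enclave) must provide.
pub trait DigestSigner {
    fn sign_digest(&self, digest: &[u8]) -> Result<Vec<u8>, SignError>;
}

/// Placeholder backend, NOT cryptography: it prefixes the digest with a
/// key id so that the control flow can be exercised.
pub struct MockBackend {
    pub key_id: u8,
}

impl DigestSigner for MockBackend {
    fn sign_digest(&self, digest: &[u8]) -> Result<Vec<u8>, SignError> {
        if digest.is_empty() {
            return Err(SignError("empty digest".to_string()));
        }
        let mut signature = vec![self.key_id];
        signature.extend_from_slice(digest);
        Ok(signature)
    }
}

/// Takes a digest computed elsewhere (by the RPM side, in the scheme above)
/// and delegates only the signature primitive to the chosen backend.
pub fn sign_package_digest(
    digest: &[u8],
    signer: &dyn DigestSigner,
) -> Result<Vec<u8>, SignError> {
    signer.sign_digest(digest)
}
```

Taking `&dyn DigestSigner` keeps the caller independent of where the key lives, which is the point for keys held in hardware.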

I look forward to any input/clarifications, and again would be happy to work 
toward upstreaming this code if there is interest.
