On 04/04/2018 05:32 PM, Vladimir D. Seleznev wrote:
On Wed, Apr 04, 2018 at 09:18:35AM -0400, Jeff Johnson wrote:


On Apr 4, 2018, at 5:47 AM, Panu Matilainen <pmati...@redhat.com> wrote:

On 04/04/2018 11:01 AM, Panu Matilainen wrote:
[...]
+static int identityTag(Header h, rpmtd td, headerGetFlags hgflags)
+{
+    int rc = 1;
+    unsigned int bsize;
+    void * hb;
+    DIGEST_CTX ctx;
+    char * identity = NULL;
+
+    struct rpmtd_s ttd;
+    Header ih = headerNew();
+    HeaderIterator hi = headerInitIterator(h);
+
+    while (headerNext(hi, &ttd) && rc) {
+    if (rpmtdCount(&ttd) > 0) {
+        if (!identityFilter(ttd.tag))
+        rc = headerPut(ih, &ttd, HEADERPUT_DEFAULT);
+    }
+    rpmtdFreeData(&ttd);
+    }
+    headerFreeIterator(hi);
+
+    if (!rc)
+    return 0;
+
+    hb = headerExport(ih, &bsize);

Forgot to bring this up on the first round: another question is
whether it actually makes sense to go through all this trouble of
copying the header, then exporting it and calculating the digest
from that.

It'd be considerably cheaper to calculate a digest on the go while
iterating over the data (from the immutable region, see my other
email). A newly imported header is guaranteed to be sorted (by tags)
so it's consistent.


There is no particular reason why the IDENTITY digest should need/use
a header blob.

Any faithful transformation (e.g. using sprintf or hex strings) on the
data for the set of tags can be used for an IDENTITY digest. The
header blob implicitly includes padding, which can/will change
depending on the definition ordering even when the tag data is
identical/reproducible.

Ok.

The filtering should also be positive/inclusive rather than
negative/exclusive. While a positive list of tags to include is harder
to enumerate than the shorter list of tags to exclude, the IDENTITY
tag set will then be closed and well known.

I still don't understand why filtering should be inclusive rather than
exclusive? The only reason I see is arbitrary tags but the code
currently does not include tags beyond RPMTAG_FIRSTFREE_TAG.

With negative filtering you'll never quite know which tags were included. Older packages have things that newer packages don't, and coming from different sources who knows what patches the building rpm was carrying. And you'll be playing "what we forgot" for quite some time this way. After sleeping over it, for example:

+    case RPMTAG_CHANGELOG:

This is not an actual tag in packages, it's only used by the spec parser internally. To filter out the changelog from identity you'll need these three:

    RPMTAG_CHANGELOGTIME        = 1080, /* i[] */
    RPMTAG_CHANGELOGNAME        = 1081, /* s[] */
    RPMTAG_CHANGELOGTEXT        = 1082, /* s[] */

Other tags you'd need to filter out but current patch misses:
- RPMTAG_RPMVERSION (reflects the version used to build rpm, which is not a part of a packages identity)
- RPMTAG_OPTFLAGS (reflects building rpm configuration, not the package id)
- RPMTAG_PAYLOADFORMAT, RPMTAG_PAYLOADCOMPRESSOR, RPMTAG_PAYLOADFLAGS (these reflect rpm configuration, not package id)
- RPMTAG_OS, RPMTAG_PLATFORM (if filtering arch then surely these too)
- RPMTAG_RHNPLATFORM (older packages can have this)
- RPMTAG_VENDOR, RPMTAG_PACKAGER (typically set from configuration)
- RPMTAG_FILEDIGESTS, RPMTAG_FILEDIGESTALGO (these depend on rpm configuration, and also compiler versions, build time encoded in files etc)

...and still I didn't go through each and every tag, which is what will be needed either way. Only with blacklisting, you'll never *really* know. With whitelisting, identity instantly becomes well-defined, even if it turns out it's missing something it should have.

A random remark of the day:
RPMTAG_RELEASE you clearly want to include, but then for example in Fedora/RHEL ecosystem part of it is very commonly partially defined by %dist macro which comes from rpm config. So the package id would change depending on which distro it was built on, even if it otherwise did not change. ID as a build-time digest of the unparsed spec (and all the associated sources) would avoid those issues...


The simplest way to mark the positive/inclusive IDENTITY tag set is to
change the awk script that generates the tag table to add a marker
(like the array return marker) to identify which tags to include.
The members (and ordering) of the IDENTITY tag set might also need to
be configurable without recompiling.

May be it would be easier to add marker that tells that marked tag
should be filtered from IDENTITY by calculation? But what is best way to
do that? Should I add a new field to struct headerTagTableEntry_s? Or I
miss something?

AIUI that's exactly what Jeff was suggesting. But adding a new field to headerTagTableEntry_s would require also adding an API to access it which seems a bit weird I think. An alternative would be constructing the identityFilter() function into a .C file with eg an awk script that's then included into tagexts.c.

        - Panu -


But overall, dynamically generating the IDENTITY tag set withe a tag
extension can be deployed (and back ported and maintained) far more
easily than changing header code.

Nice job!


_______________________________________________
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint

Reply via email to