On Fri, Mar 30, 2018 at 07:50:09AM +0300, Alexey Tourbin wrote:
> On Thu, Mar 29, 2018 at 7:55 PM, Vladimir D. Seleznev
> <vsele...@altlinux.org> wrote:
> > Hello, rpm-maint@!
> >
> > There are RFC patches which implement RPMTAG_IDENTITY calculation.
> >
> > The main idea is that RPMTAG_IDENTITY contains a hash of as many RPMTAGs
> > as possible, ideally all of them, except for those that cannot in
> > principle be reproducible and those that we do not want to make
> > reproducible. Another exception is for tags that we want to use in
> > certain cases, but only those that are not relevant to the result of the
> > package build. So the value of RPMTAG_IDENTITY is calculated for each
> > package over its tags, filtered through a blacklist.
> 
> Hello,
> 
> So previously you outlined two use cases for RPMTAG_IDENTITY:
> - stricter dependencies between subpackages, presumably to replace
> Requires: %name = %EVR with something like %name = %EVR-%{NameBuildID}
> or %name = id:%{NameBuildId};
> - verifying if a build is reproducible, in which case %{NameBuildID}s
> should stay the same across rebuilds.
> 
> Let's start with stricter dependencies between subpackages.  My first
> observation is that end users, who update only from
> centralized/verified repositories, do not benefit from stricter
> dependencies.  That's because either of the following holds:
> - rpm does not permit updating packages with %EVR unchanged; or even if
> it does,
> - the build system which serves a centralized repo does not permit
> in-place package updates; or even if it does,
> - the package manager atop rpm doesn't pull packages with unchanged
> %EVR; or even if it does, it can pull and update all installed
> subpackages at once.

This does not work if the end user does a spot upgrade (I'm not sure about
the term) of one package out of a set of packages from a single build:
only that one package will be upgraded.

> So there is no plausible way for an end user to end up with
> subpackages (no pun intended) from different build sets.  There is a
> way to end up with subpackages from different build sets for
> developers who do incremental package builds without bumping the
> release.  But said developers, I perhaps among them, must somehow
> learn to solve their problems without involving everybody else.  So my
> second observation is that there indeed exist some facilities which
> only beg to be used and render the whole issue a non-issue.
> 
> For example, if you do incremental builds in a Mandriva-based Russian
> distro, you should try this:
> 
> $ sudo rpm -Fv RPMS.hasher/*.rpm
> 
> This will "freshen" all installed packages, and that I think is the
> best way to handle different build sets.  The only case where it
> doesn't work nearly as perfectly is when the set of subpackage names
> can change in an arbitrary way. But neither will RPMTAG_IDENTITY
> handle perfectly all such situations!  Thus my third observation is
> that the problem has not been examined properly from a mathematical
> standpoint.  Using subpackages from a single build set is a stronger
> requirement which cannot be satisfied by simply producing stricter
> dependencies within connected components.
> 
> By the way, I believe there might be legitimate reasons for partial
> upgrades, on the premise that one knows what he or she is doing. For
> example, if I make changes to a library, I may want to update only the
> library subpackage.

You mean upgrading the library from one build set while leaving the rest
of the packages, built from another one, installed on the system? This
seems dangerous. It means that partial upgrades may break your system,
i.e. partial upgrades will be unsupported.

> Or, if there is a big noarch subpackage with
> data, I have every reason to leave it alone.
> 
> This further brings up the problem of noarch subpackages.  They are
> supposed to be installable on any architecture, but stricter
> subpackage dependencies can change that.  Let's do some case analysis:
> - arch->noarch, i.e. a binary package requires its base noarch
> subpackage; this will result in very rigorous requirements to noarch
> subpackages: they must hatch byte-to-byte identical on every
> architecture, or else the dependency will be broken.  This might
> actually make sense, or it might not.  I'm inclined towards the
> latter, here's why: strict dependencies between subpackages is a very
> basic mechanism, while the identity of noarch packages, the right
> amount of it, is subject to interpretation, and is a matter of policy.
> So the build system should orchestrate synchronous builds across
> architectures and then check if noarch subpackages are identical
> enough, according to its policy.  Shifting the responsibility down to
> rpm would compromise the mechanism/policy distinction.
> - noarch->arch, i.e. a noarch subpackage requires its base binary
> package; with stricter dependencies, that would be outright wrong,
> because noarch subpackage can't know byte-to-byte specifics of binary
> packages, and the dependency will be broken one way or the other, most
> of the time;

Yes, we also thought about that. It effectively turns noarch packages
into arch packages, which makes no sense.

Furthermore, a rebuild can produce some packages with the same identity
as in the previous build and some with a different one. This breaks the
idea of identity-based dependencies.
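
To illustrate that last point, here is a minimal Python sketch. It is not
the code from the RFC patches: the tag names, the contents of the
blacklist, the sample values and the choice of BLAKE2b are all assumptions
made for the example. It hashes every tag of a package except the
blacklisted ones and compares two builds of the same source package.

```python
import hashlib
from typing import Dict

# Hypothetical blacklist: tags assumed not to be reproducible.
EXCLUDED_TAGS = {"BUILDTIME", "BUILDHOST"}


def identity(tags: Dict[str, str]) -> str:
    """Hash all tags except the blacklisted ones, in sorted tag order."""
    h = hashlib.blake2b(digest_size=16)
    for name in sorted(tags):
        if name in EXCLUDED_TAGS:
            continue
        # Length-prefix each field so adjacent fields cannot run together.
        for field in (name, tags[name]):
            data = field.encode("utf-8")
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.hexdigest()


# Two builds of one source package (illustrative values only).
build1 = {
    "foo":      {"NAME": "foo", "FILEDIGESTS": "aaa", "BUILDTIME": "100"},
    "foo-data": {"NAME": "foo-data", "FILEDIGESTS": "ddd", "BUILDTIME": "100"},
}
build2 = {
    "foo":      {"NAME": "foo", "FILEDIGESTS": "bbb", "BUILDTIME": "200"},
    "foo-data": {"NAME": "foo-data", "FILEDIGESTS": "ddd", "BUILDTIME": "200"},
}

for pkg in build1:
    same = identity(build1[pkg]) == identity(build2[pkg])
    print(pkg, "same identity" if same else "different identity")
```

Here foo-data keeps its identity across the two builds while foo does not,
so a dependency keyed on the identity of foo-data cannot tell the two
build sets apart.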

> - further amendments to how strict dependencies are propagated between
> subpackages must be made to build/interdep.c; since interdep.c is not
> part of rpm.org, I'll omit the details.

I know; I wrote some draft code which replaced "%name = %EVR" requires
with ".${BUILDID}-%name" and added the latter to the provides of packages
that provide "%name = %EVR". This was done right after processInterdep,
with $BUILDID being the same across all subpackages. The open question is
what value $BUILDID should have. I also thought that "%name = %EVR" might
be replaced with something like ".%name-%EVR+b$ITER", where $ITER is the
binary rebuild iteration. Neither $BUILDID nor $ITER is an environment
variable in the examples above; they are written that way to distinguish
them from RPM macros.

Then we assume that the value of either $BUILDID or $ITER should be passed
to rpmbuild by the builder via some mechanism. For example, that mechanism
could be a certain environment variable. And rpmbuild should replace
NEVR-based dependencies with stricter ones only if that mechanism is in
use. But for now these are just thoughts about how to solve the problem.
We can discuss this, but it is already too ALT-specific, we want to solve
even more ALT-specific tasks with it, and I don't know if this is the best
place for this discussion.
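
The rewriting described above can be sketched roughly as follows. This is
an illustration in Python, not the draft code itself (which would live
next to processInterdep in C); the function name tighten, its arguments
and the sample package data are hypothetical.

```python
from typing import List, Optional, Tuple


def tighten(requires: List[str], provides: List[str],
            sibling_names: List[str], evr: str,
            build_id: Optional[str]) -> Tuple[List[str], List[str]]:
    """Replace "name = EVR" requires with ".BUILDID-name" and add the
    matching provide to packages that provide "name = EVR"."""
    # Without a build id from the builder, keep NEVR-based dependencies.
    if not build_id:
        return list(requires), list(provides)
    new_requires = list(requires)
    new_provides = list(provides)
    for name in sibling_names:
        plain = f"{name} = {evr}"
        strict = f".{build_id}-{name}"
        new_requires = [strict if r == plain else r for r in new_requires]
        if plain in provides and strict not in new_provides:
            new_provides.append(strict)
    return new_requires, new_provides


# Hypothetical subpackage foo-devel that requires its base package foo.
req, prov = tighten(
    requires=["foo = 1.0-alt1", "libc.so.6"],
    provides=["foo-devel = 1.0-alt1"],
    sibling_names=["foo", "foo-devel"],
    evr="1.0-alt1",
    build_id="abc123",
)
print(req)
print(prov)
```

The ".%name-%EVR+b$ITER" variant would differ only in how the strict
string is formatted.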

> > Previously I wrote that the RPMTAG_IDENTITY value would be used to
> > generate stricter interpackage dependencies, but we turned away from
> > that because the identities of binary packages from two builds of one
> > source package can be the same for some packages and differ for others,
> > which causes collisions.
> 
> So I actually was intrigued and waited to see your patches, in
> particular how you handle dependencies, before expressing my opinion,
> but it turns out there will be no patches regarding dependencies.
> 
> Well, this leaves the case of build id.  Suppose you built a package,
> and you want to know its build id.  So you open the package and read
> its build id from the header.  Further suppose that the package is
> stored on a hard drive (a very plausible assumption indeed).  Further
> suppose the drive makes about 6,000 revolutions per minute, so it
> takes about 0.01s to start reading the header.  About a megabyte can
> be read in another 0.01s, an average header being much smaller.
> According to blake2.net, data can be hashed at a speed of about 900
> Mb/s, so it will take about 0.001s or less to recalculate the build id
> on the fly.  The thing is, it's just reading the header that is
> already expensive; once you have the data, calculating the hash is
> cheap, and the difference is more than an order of magnitude.  The
> difference in speed will be less pronounced with SSD.  Still, you need
> to read at least 4K, because that's how filesystems work.  So putting
> RPMTAG_IDENTITY into the signature header won't reduce nearly as much
> overhead as you might hope.
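
The estimate quoted above can be redone as a short calculation. The
figures are the ones from the quoted text; reading "900 Mb/s" as megabytes
per second, taking a full megabyte as an upper bound on the header size,
and modelling the 0.01s start-up delay as one full revolution are
assumptions of this sketch, as is the function name estimate.

```python
def estimate(header_bytes: int = 1_000_000) -> dict:
    """Compare the time to read a header from a hard drive with the
    time to hash it, using the figures quoted above."""
    rpm = 6000                 # drive revolutions per minute
    start_delay = 60 / rpm     # one revolution: about 0.01 s
    read_rate = 1_000_000 / 0.01   # about a megabyte per 0.01 s
    hash_rate = 900e6          # assumed: 900 megabytes per second

    read_time = header_bytes / read_rate
    io_time = start_delay + read_time
    hash_time = header_bytes / hash_rate
    return {
        "io_time": io_time,
        "hash_time": hash_time,
        "ratio": io_time / hash_time,
    }


result = estimate()
print(f"I/O:   {result['io_time']:.4f} s")
print(f"hash:  {result['hash_time']:.4f} s")
print(f"ratio: {result['ratio']:.1f}")
```

With these inputs the I/O time exceeds the hashing time by more than a
factor of ten, which is the "order of magnitude" mentioned in the quoted
text.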

-- 
   With best regards,
   Vladimir D. Seleznev
