Brian,
On Sun, Oct 25, 2009 at 6:50 PM, Brian Harring ferri...@gmail.com wrote:
The proposal is pretty simple; if code modifies the vdb in any
fashion, it needs to update the mtime on a file named
'.modification_time' in the root of the vdb.
For example-
1) ${PACKAGE_MANAGER} fires up, builds a pkg. It's now ready to
install it.
2) this step isn't strictly required, but is a zero cost safety
measure- prior to modifying the vdb, it updates the timestamp. The
reason for doing this is to protect against the manager blowing up in
some fashion and not updating the timestamp- there is still a window
if the manager breaks down during merging, but it's far reduced.
3) manager does its thing to the livefs, and to the vdb.
4) once finished, again, updates the timestamp.
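The steps above can be sketched roughly as follows. This is only an
illustration, not code from any of the managers: the helper names
touch_vdb_marker and vdb_modification are made up, and touching the
marker again when the merge fails is a choice of this sketch (it only
makes caches look stale, which is the safe direction).

```python
import os
from contextlib import contextmanager

# File name taken from the proposal; everything else here is hypothetical.
MARKER_NAME = ".modification_time"


def touch_vdb_marker(vdb_root):
    """Create the marker file if needed and set its mtime to now."""
    path = os.path.join(vdb_root, MARKER_NAME)
    with open(path, "a"):
        pass  # opening in append mode creates the file without truncating
    os.utime(path, None)  # None means "use the current time"
    return path


@contextmanager
def vdb_modification(vdb_root):
    """Wrap any code that modifies the vdb (steps 2 through 4)."""
    touch_vdb_marker(vdb_root)  # step 2: safety touch before modifying
    try:
        yield  # step 3: caller modifies the livefs and the vdb
    finally:
        touch_vdb_marker(vdb_root)  # step 4: touch again once finished
```

A manager would then wrap its merge in "with vdb_modification(root):",
where root is the vdb location (normally /var/db/pkg).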
This isn't an incredibly complex change. What it enables, however, is
for package managers to get serious about optimizing access to the vdb.
For example for the 3 managers:
paludis:
installed-cache currently needs to be manually run by the user;
specifically, the user is responsible for regenerating this cache if
they use a non-paludis manager to modify the VDB. This can be
automated by checking the vdb timestamp against a stored copy of the
vdb timestamp at the time of the cache generation.
portage:
portage maintains a set of denormalized caches of the vdb- it however
has to do validation of those caches on each access, meaning quite a
few stats. Same thing: it can compare the current vdb timestamp to the
one recorded when the cache was generated to identify if the cache is
no longer authoritative.
pkgcore:
pkgcore maintains a denormalized old style virtuals cache- same thing
as w/ portage, it has to do validation (stat'ing) whenever it uses that
cache to ensure the data is accurate. Same thing: it can compare the
current vdb timestamp to the one recorded when the cache was generated
to identify if the cache is no longer authoritative.
The existing vdb caching could all be modified to use this timestamp.
One stat in the best (common) case, instead of having to either scan
the whole vdb each time or do a subset of stats.
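As a rough sketch of that single-stat check (the function names are
invented for illustration, and it assumes each cache stores the marker's
mtime at the moment it was generated):

```python
import os

# File name taken from the proposal; the helpers below are hypothetical.
MARKER_NAME = ".modification_time"


def vdb_mtime_ns(vdb_root):
    """Return the marker's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(os.path.join(vdb_root, MARKER_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_current(vdb_root, recorded_mtime_ns):
    """One stat: the cache is trusted only if the marker is unchanged."""
    current = vdb_mtime_ns(vdb_root)
    # A missing marker or any difference (newer or older) means the
    # cache can no longer be treated as authoritative.
    return current is not None and current == recorded_mtime_ns
```

Comparing for equality rather than "newer than" is a choice made here so
that a clock that jumped backwards still invalidates the cache; when the
check fails, the manager falls back to whatever full validation it does
today and then regenerates its cache.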
This change enables further caching/denormalization of the vdb data
while maintaining the old format- basically, it allows the manager to
build out a helluva lot faster access to the vdb while keeping on
disk compatibility in /var/db/pkg.
Now unfortunately since the vdb is not format versioned in any
fashion, to get this timestamp we have to do the following-
1) nudge everyone who has code poking into the vdb to update their
code to update the timestamp
2) sit on our hands for N months until such time as we've deemed
everyone we care about has upgraded
3) push out a new release, and start pushing out versions of the
managers/vdb consumers that use this timestamp instead of just
updating it.
For anyone who has been around gentoo for a couple of years, this is a
pretty familiar pattern- eapi, profile changes, etc, all go through
this unfortunately.
That's the core of the proposal; there is a ticket open
( http://bugs.gentoo.org/290428 ) regarding this although there is
some debate from ciaran which I'll try to now summarize, along w/ the
counterarguments.
1) do a new vdb.
Counter: this mechanism provides a way to synchronize the new vdb
while maintaining the old during its transition period, so this is
needed anyways. Further, pinning all of our optimization hopes on a
new vdb is daft- it's been discussed for 5+ years now and still
hasn't materialized (pkgcore has been able to have a new vdb for
several years, but without a synchronization mechanism it would
require locking users into the new format and locking out old
consumers of the vdb- an unfriendly choice to push on users, hence
never being implemented).
2) code that hasn't been updated to adjust the timestamp, but is still
in use after the transition period will break things.
Counter: that's the nature of any modification of this sort; frankly,
the gains outweigh the costs of users being ridiculously out of date. Not
saying it's perfect, but until someone comes up with a proposal that
versions every PMS component (meaning PMS has to start documenting
the VDB), it's what we have if we wish to move forward in
refactoring.
3) the correct approach is to require users to tell each manager that
changes have occurred outside its purview (run paludis
--regenerate-installed-cache after every time you invoke pmerge or
emerge).
Counter: that's rather unfriendly to users, and isn't what
pkgcore/portage do. Further, it's historically the opposite of the
norm- consider the ebuild cache (we do validation as we go there,
instead of expecting users to do an emerge --regen every time they
modify an ebuild).
Those are roughly the three points raised; there is some minor quibbling
that mtime cannot be trusted, but that's mostly a variation of #2.
This looks to me like a good idea. I see some of it at least has been
implemented in portage and I would suspect in