Re: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)

2010-01-17 Thread Ciaran McCreesh
2010/1/12 Brian Harring ferri...@gmail.com:
 There's no discussion because Brian refuses to address any comments
 on the proposal and just says we should do it anyway, and if you
 want it done properly instead, do it yourself.

 This is a bit of bullshit, per the norm.  There is plenty of
 discussion- the problem is you don't like the direction it's gone.
 You want a whole new vdb- I don't oppose that.  However I'm not
 interested in trying to standardize a new vdb format into PMS, at
 least not yet.

No, I want a decent cache proposal that lets package managers know
what's changed, not one that sometimes (but not always) might let
package managers know when some things have changed, but not what's
changed and not what they can still assume.

 Your argument can basically be summed up as don't do the minimal
 tweak, do the whole new vdb with defined caches that all can share.

No, I want the well defined caches that all can share.

 The daft thing about this is that you're ignoring one core transition
 issue w/ vdb2- if someone did create a vdb2, they still would need a
 synchronization mechanism (one quite similar to what I'm proposing).

If you replace VDB, you need a well defined cache mechanism. So let's
do that bit now.

 1) portage/pkgcore support the PMS defined vdb2 while paludis doesn't
 2) portage/pkgcore are invoked modifying the livefs; vdb1, vdb2 is
 updated.
 3) paludis is invoked.  vdb1 is updated, vdb2 is not
 4) portage and pkgcore now cannot rely upon vdb2, since vdb1 now
 contains extra modifications due to paludis not supporting vdb2.

No, we'd not do it that way. If we're ditching VDB, the only sane way
to do it is to ditch it with an rm -fr when creating the new layout.
Keeping two sets of data around is going to lead to breakage no matter
how well we do things.

 Summarizing; the synchronization primitive is needed for any future
 vdb2

No. A *proper* cache validation mechanism is needed. What you're
suggesting isn't enough to use for anything at all.

 Summing it up; what ciaran wants is reliant on what I'm proposing,

No, what I want in the long term is reliant upon implementing a decent
cache setup in the short term.

-- 
Ciaran McCreesh




Re: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)

2010-01-17 Thread Ciaran McCreesh
2010/1/17 Tobias Klausmann klaus...@gentoo.org:
 No, we'd not do it that way. If we're ditching VDB, the only sane way
 to do it is to ditch it with an rm -fr when creating the new layout.
 Keeping two sets of data around is going to lead to breakage no matter
 how well we do things.

 Please also provide a downgrade path, i.e. a way to go back from
 the new DB version to the current one should it be necessary (if
 there is no such path, Murphy will see to it that the new format
 breaks in interesting[0] ways).

That probably wouldn't be possible. One of the reasons we want to
ditch VDB is to allow multiple slots of the same cat/pkg-ver to be
installed in parallel (which is in turn necessary to allow some of the
more hideous dynamic slot abuses that people are after). VDB doesn't
support that, so you probably won't be able to go back once you've
started using new features.

*shrug* all of this is years off, anyway. It's at least EAPI 5
territory. We can work all this out later if EAPI 4 ever happens.

-- 
Ciaran McCreesh



Re: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)

2010-01-12 Thread Ciaran McCreesh
On Mon, 11 Jan 2010 15:35:51 -0700
Denis Dupeyron calc...@gentoo.org wrote:
 I'm a bit surprised by the low amount of discussions this topic has
 generated.

There's no discussion because Brian refuses to address any comments on
the proposal and just says we should do it anyway, and if you want it
done properly instead, do it yourself.

-- 
Ciaran McCreesh




Re: [gentoo-dev] adding a modification timestamp to the installed pkgs database (vdb)

2010-01-11 Thread Denis Dupeyron
Brian,

On Sun, Oct 25, 2009 at 6:50 PM, Brian Harring ferri...@gmail.com wrote:
 The proposal is pretty simple; if code modifies the vdb in any
 fashion, it needs to update the mtime on a file named
 '.modification_time' in the root of the vdb.

 For example-

 1) ${PACKAGE_MANAGER} fires up, builds a pkg.  it's now ready to
 install it.
 2) this step isn't strictly required, but is a zero cost safety
 measure- prior to modifying the vdb, it updates the timestamp.  The
 reason for doing this is to protect against the manager blowing up in
 some fashion and not updating the timestamp- there still is a window
 if the manager breaks down during merging but it's far reduced.
 3) manager does its thing to the livefs, and to the vdb.
 4) once finished, again, updates the timestamp.
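
The four steps above could be sketched roughly as follows. This is only a
minimal illustration, not code from portage, pkgcore or paludis: the
'.modification_time' file name comes from the proposal, while the helper
names touch_vdb_stamp and vdb_modification are made up for the example, and
updating the stamp even when the merge fails is an assumption about the
safest behaviour.

```python
import os
from contextlib import contextmanager

STAMP_NAME = ".modification_time"  # file name taken from the proposal


def touch_vdb_stamp(vdb_root):
    """Create the stamp file if it is missing and set its mtime to now."""
    path = os.path.join(vdb_root, STAMP_NAME)
    with open(path, "a"):
        pass  # opening in append mode creates the file without truncating it
    os.utime(path, None)  # None means "use the current time"
    return path


@contextmanager
def vdb_modification(vdb_root):
    """Hypothetical wrapper: bump the stamp before and after touching the vdb."""
    touch_vdb_stamp(vdb_root)  # step 2: safety update before modifying
    try:
        yield  # step 3: caller modifies the livefs and the vdb here
    finally:
        # step 4: update again once finished; done on failure too, so a
        # half-finished merge still invalidates other managers' caches
        touch_vdb_stamp(vdb_root)
```

A manager would wrap its merge in "with vdb_modification('/var/db/pkg'):"
so that both updates happen without every code path having to remember
them.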

 This isn't an incredibly complex change.  What it enables however is
 package managers to get serious about optimizing access to the vdb.
 For example for the 3 managers:

 paludis:
  installed-cache currently needs to be manually run by the user;
 specifically, the user is responsible for regenerating this cache if
 they use a non paludis manager to modify the VDB.  This can be
 automated via checking the vdb timestamp against a stored copy of the
 vdb timestamp at the time of the cache generation.

 portage:
  portage maintains a set of denormalized caches of the vdb- it however
 has to do validation of those caches on each access, meaning quite a
 few stats.  Same thing, can compare timestamp from current vdb to when
 it was generated to identify if it is no longer authoritative.

 pkgcore:
  pkgcore maintains a denormalized old style virtuals cache- same thing
 w/ portage, it has to do validation (stat'ing) whenever it uses that
 cache to ensure the data is accurate.  Same thing, can compare
 timestamp from current vdb to when it was generated to identify if it
 is no longer authoritative.

 The existing vdb caching could all be modified to use this timestamp.
 One stat in the best (common) case, instead of having to either scan
 the whole vdb each time or doing a subset of stats.
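
The single-stat check described above might look something like this. It
is a sketch under assumptions, not any manager's actual cache code: the
function names are invented for the example, and comparing for equality
rather than "newer than" is a choice made here so that a clock that steps
backwards still invalidates the cache.

```python
import os

STAMP_NAME = ".modification_time"  # file name taken from the proposal


def read_vdb_stamp(vdb_root):
    """Return the stamp's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(os.path.join(vdb_root, STAMP_NAME)).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_current(vdb_root, recorded_mtime_ns):
    """One stat: the cache is trusted only if the stamp is unchanged.

    recorded_mtime_ns is the value read_vdb_stamp() returned when the
    cache was generated (hypothetically stored alongside the cache).
    """
    current = read_vdb_stamp(vdb_root)
    return current is not None and current == recorded_mtime_ns
```

When cache_is_current() returns False the manager falls back to whatever
validation or regeneration it does today. A missing stamp is treated as
"not current".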

 This change enables further caching/denormalization of the vdb data
 while maintaining the old format- basically, it allows the manager to
 build out a helluva lot faster access to the vdb while keeping on
 disk compatibility in /var/db/pkg.


 Now unfortunately since the vdb is not format versioned in any
 fashion, to get this timestamp we have to do the following-

 1) nudge everyone who has code poking into the vdb to update their
 code to update the timestamp
 2) sit on our hands for N months until such time we've deemed
 everyone we care about has upgraded
 3) push out a new release, and start pushing out versions of the
 managers/vdb consumers that use this timestamp instead of just
 updating it.

 For anyone who has been around gentoo for a couple of years, this is a
 pretty familiar pattern- eapi, profile changes, etc, all go through
 this unfortunately.


 That's the core of the proposal; there is a ticket open
 ( http://bugs.gentoo.org/290428 ) regarding this although there is
 some debate from ciaran which I'll try to now summarize, along w/ the
 counterarguments.

 1) do a new vdb.
 Counter: this mechanism provides a way to synchronize the new vdb
 while maintaining the old during its transition period, so this is
 needed anyways.  Further, pinning all of our optimization hopes on a
 new vdb is daft- it's been discussed for 5+ years now and still
 hasn't materialized (pkgcore has been able to have a new vdb for
 several years, but without a synchronization mechanism it would
 require locking users into the new format and locking out old
 consumers of the vdb- an unfriendly choice to push on users, hence
 never being implemented).

 2) code that hasn't been updated to adjust the timestamp, but is still
 in use after the transition period will break things.
  Counter: nature of any modification of this sort, frankly the gains
 outweigh the costs of users being ridiculously out of date.  Not
 saying it's perfect, but until someone comes up with a proposal that
 versions every PMS component (meaning PMS has to start documenting
 the VDB), it's what we have if we wish to move forward in
 refactoring.

 3) the correct approach is to require users to tell each manager that
 changes have occurred outside its purview (run paludis
 --regenerate-installed-cache after every time you invoke pmerge or
 emerge).
  Counter: that's rather unfriendly to users, and isn't what
 pkgcore/portage do.  Further, it's historically the opposite of the
 norm- consider the ebuild cache (we do validation as we go there,
 instead of expecting users to do an emerge --regen every time they
 modify an ebuild).


 That's roughly the three points raised; there is some minor quibbling
 that mtime cannot be trusted, but that's mostly a variation of #2.

This looks to me like a good idea. I see some of it at least has been
implemented in portage and I would suspect in