Re: [RFC] Interaction between strip and caches

2021-02-27 Thread Joerg Sonnenberger
On Fri, Feb 26, 2021 at 10:52:52PM -0500, Augie Fackler wrote:
> 
> 
> > On Dec 14, 2020, at 5:03 PM, Joerg Sonnenberger  wrote:
> > 
> > Hello all,
> > while looking at the revbranchcache, I noticed that it is doing quite an
> > expensive probabilistic invalidation dance. It is essentially looking up
> > the revision in the changelog again and comparing the first 32 bits to see
> > if they (still) match. Other caches are doing cheaper checks like
> > remembering the head revision and node and checking it again to match.
> > The goal is in all cases to detect one of two cases:
> > 
> > (1) Repository additions by a hg instance without support for the cache.
> > (2) Repository removals by strip without update support specific to the
> > cache in use.
> > 
> > The first part is generally handled reasonably well and cheaply. Keeping
> > track of the number of revisions and processing all missing changesets
> > is something code has to support anyway. The really difficult problem is
> > the second part. I would like us to adopt a more explicit way of dealing
> > with this and opt-in support via a repository requirement. Given that
> > the strip command has finally become part of core, it looks like a good
> > time to do this now.
> > 
> > The first option is to require strip to nuke all caches that it can't
> > update. This is easy to implement and works reliably by nature with all
> > existing caches. It is also the more blunt option.
> 
> Won’t the caches invalidate themselves if this defect happens today?

Only if the cache implementation hooks into strip and is active at the
time. As mentioned at the start, it is expensive and complex. I'd say
80% of the complexity of the new .hgtags cache version I am working on
is dealing with the current cache invalidation.

> > The second option is to keep a journal of strips. This can be a single
> > monotonically increasing counter and every cache just reads the counter
> > and rebuilds itself. Alternatively it could be a full journal that lists
> > the revisions and associated nodes removed. This requires changes to
> > existing caches but has the advantage that strip can be replayed by the
> > cache logic to avoid a full rebuild.
> 
> Potentially complicated, but could be worthwhile in a large repo with
> strips. Is that something you expect to encounter? For the most part
> we’ve historically considered strip an anti-pattern of sorts and not
> worried super hard about optimizing it.

My hope is that if we can handle additions by non-cache-aware clients as
we do now, it is good enough. Replaying changes is moderately cheap if
we don't have to deal with strip.

There is also the related issue of cache invalidation for obsstore, but
the same concerns apply -- replaying changes is easy as long as we don't
have to handle removal of entries.

Joerg
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [RFC] Interaction between strip and caches

2021-02-26 Thread Augie Fackler


> On Dec 14, 2020, at 5:03 PM, Joerg Sonnenberger  wrote:
> 
> Hello all,
> while looking at the revbranchcache, I noticed that it is doing quite an
> expensive probabilistic invalidation dance. It is essentially looking up
> the revision in the changelog again and comparing the first 32 bits to see
> if they (still) match. Other caches are doing cheaper checks like
> remembering the head revision and node and checking it again to match.
> The goal is in all cases to detect one of two cases:
> 
> (1) Repository additions by a hg instance without support for the cache.
> (2) Repository removals by strip without update support specific to the
> cache in use.
> 
> The first part is generally handled reasonably well and cheaply. Keeping
> track of the number of revisions and processing all missing changesets
> is something code has to support anyway. The really difficult problem is
> the second part. I would like us to adopt a more explicit way of dealing
> with this and opt-in support via a repository requirement. Given that
> the strip command has finally become part of core, it looks like a good
> time to do this now.
> 
> The first option is to require strip to nuke all caches that it can't
> update. This is easy to implement and works reliably by nature with all
> existing caches. It is also the more blunt option.

Won’t the caches invalidate themselves if this defect happens today?

> The second option is to keep a journal of strips. This can be a single
> monotonically increasing counter and every cache just reads the counter
> and rebuilds itself. Alternatively it could be a full journal that lists
> the revisions and associated nodes removed. This requires changes to
> existing caches but has the advantage that strip can be replayed by the
> cache logic to avoid a full rebuild.

Potentially complicated, but could be worthwhile in a large repo with strips. 
Is that something you expect to encounter? For the most part we’ve historically 
considered strip an anti-pattern of sorts and not worried super hard about 
optimizing it.

> 
> Joerg



[RFC] Interaction between strip and caches

2020-12-14 Thread Joerg Sonnenberger
Hello all,
while looking at the revbranchcache, I noticed that it is doing quite an
expensive probabilistic invalidation dance. It is essentially looking up
the revision in the changelog again and comparing the first 32 bits to see
if they (still) match. Other caches are doing cheaper checks like
remembering the head revision and node and checking it again to match.
The goal is in all cases to detect one of two cases:

(1) Repository additions by a hg instance without support for the cache.
(2) Repository removals by strip without update support specific to the
cache in use.
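
To make the two kinds of check concrete, here is a minimal sketch. It
assumes a toy changelog modelled as a list of 20-byte nodes indexed by
revision number; make_node, tip_check and prefix_check are hypothetical
names for illustration and not the actual revbranchcache code.

```python
def make_node(i):
    """Hypothetical stand-in for a 20-byte changeset hash."""
    return i.to_bytes(4, "big") + bytes(16)


def tip_check(changelog, cached_tiprev, cached_tipnode):
    """Cheap check: one lookup of the remembered head revision and node."""
    return (
        cached_tiprev < len(changelog)
        and changelog[cached_tiprev] == cached_tipnode
    )


def prefix_check(changelog, rev, cached_prefix):
    """Per-entry check: compare the first 4 bytes (32 bits) of the node.

    This costs one changelog lookup for every cache entry that is read,
    and a prefix collision would go unnoticed, hence "probabilistic".
    """
    return rev < len(changelog) and changelog[rev][:4] == cached_prefix


changelog = [make_node(i) for i in range(5)]
# Simulate a strip of revisions 3 and 4 followed by two new commits.
after_strip = changelog[:3] + [make_node(100), make_node(101)]

print(tip_check(after_strip, 4, changelog[4]))         # remembered tip is gone
print(prefix_check(after_strip, 2, changelog[2][:4]))  # rev 2 survived
print(prefix_check(after_strip, 3, changelog[3][:4]))  # rev 3 was replaced
```

Note that the revision count alone would not catch this case, since the
repository has five revisions both before and after.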

The first part is generally handled reasonably well and cheaply. Keeping
track of the number of revisions and processing all missing changesets
is something code has to support anyway. The really difficult problem is
the second part. I would like us to adopt a more explicit way of dealing
with this and opt-in support via a repository requirement. Given that
the strip command has finally become part of core, it looks like a good
time to do this now.
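
As a rough illustration of why case (1) is cheap, a cache only needs to
remember how many revisions it has processed and catch up on the rest.
This is a sketch under the assumption that revisions are only ever
appended; CountingCache and its compute callback are hypothetical names,
not an existing Mercurial class.

```python
class CountingCache:
    """Toy cache that catches up on revisions added behind its back."""

    def __init__(self):
        self.nrevs = 0      # number of revisions already processed
        self.entries = []   # one cached value per revision

    def update(self, changelog, compute):
        """Process all revisions missing from the cache.

        Returns how many revisions were processed. This handles
        additions only; it cannot notice that a strip replaced
        revisions below self.nrevs.
        """
        missing = range(self.nrevs, len(changelog))
        for rev in missing:
            self.entries.append(compute(rev, changelog[rev]))
        self.nrevs = len(changelog)
        return len(missing)
```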

The first option is to require strip to nuke all caches that it can't
update. This is easy to implement and works reliably by nature with all
existing caches. It is also the more blunt option.
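
A minimal sketch of what this first option could look like, assuming
the caches are modelled as an in-memory dict instead of files under
.hg/cache. The function name and the updater registry are hypothetical
and only illustrate the policy.

```python
def strip_with_cache_policy(caches, updaters, stripped_revs):
    """Update the caches that know about strip and delete the rest.

    caches:   dict mapping cache name to its data (stand-in for files)
    updaters: dict mapping cache name to a callable(cache, stripped_revs)
              that repairs the cache in place
    Returns the sorted names of the caches that were deleted.
    """
    nuked = []
    for name in list(caches):
        updater = updaters.get(name)
        if updater is None:
            # No strip-specific support: the only safe choice is removal.
            del caches[name]
            nuked.append(name)
        else:
            updater(caches[name], stripped_revs)
    return sorted(nuked)
```

A deleted cache is simply rebuilt from scratch on next use, which is
what makes this option work with all existing caches.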

The second option is to keep a journal of strips. This can be a single
monotonically increasing counter and every cache just reads the counter
and rebuilds itself. Alternatively it could be a full journal that lists
the revisions and associated nodes removed. This requires changes to
existing caches but has the advantage that strip can be replayed by the
cache logic to avoid a full rebuild.
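
Both variants of the second option can be sketched as follows. The
journal format here is an assumption made for illustration (a list of
strip records, each holding the removed (rev, node) pairs); StripJournal
and JournalAwareCache are hypothetical names.

```python
class StripJournal:
    """Hypothetical append-only record of strips."""

    def __init__(self):
        self.records = []  # one list of (rev, node) pairs per strip

    @property
    def counter(self):
        # Monotonically increasing: grows by one with every strip.
        return len(self.records)

    def record(self, removed):
        self.records.append(list(removed))


class JournalAwareCache:
    def __init__(self):
        self.entries = {}  # rev -> (node, cached value)
        self.seen = 0      # journal counter when the cache was last in sync

    def sync_counter_only(self, journal):
        """Counter variant: any unseen strip forces a full rebuild."""
        if journal.counter == self.seen:
            return False
        self.entries.clear()
        self.seen = journal.counter
        return True

    def sync_replay(self, journal):
        """Full-journal variant: drop only the entries that were stripped."""
        dropped = 0
        for removed in journal.records[self.seen:]:
            for rev, node in removed:
                entry = self.entries.get(rev)
                # Compare nodes so a reused revision number is not dropped.
                if entry is not None and entry[0] == node:
                    del self.entries[rev]
                    dropped += 1
        self.seen = journal.counter
        return dropped
```

The counter variant needs only a single integer per cache, while the
replay variant keeps the entries for unaffected revisions at the cost
of a larger journal.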

Joerg