Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On 10/27/2016 07:55 PM, Durham Goode wrote: On 10/27/16 5:22 AM, Yuya Nishihara wrote: On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote: On 10/26/16 5:54 AM, Yuya Nishihara wrote: On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote: On 10/22/16 1:59 AM, Yuya Nishihara wrote: On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: # HG changeset patch # User Durham Goode# Date 1476837866 25200 # Tue Oct 18 17:44:26 2016 -0700 # Branch stable # Node ID 3efece5c59853908d65de53635488361dbf20c49 # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 manifest: make manifestctx store the repo The old manifestctx stored a reference to the revlog. If the inmemory revlog became invalid, the ctx now held an old copy and would be incorrect. To fix this, we need the ctx to go through the manifestlog for each access. This is the same pattern that changectx already uses (it stores the repo, and accesses commit data through self._repo.changelog). diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -1276,7 +1276,7 @@ class manifestlog(object): if self._treeinmem: m = treemanifestctx(self._revlog, '', node) else: -m = manifestctx(self._revlog, node) +m = manifestctx(self._repo, node) if node != revlog.nullid: self._mancache[node] = m This will create a reference cycle, but I don't know if the situation gets better or worse after this patch. repo -> manifestlog -> _mancache -> manifestctx -> repo Is _mancache valid after the manifestlog is recreated? _mancache is not really valid after manifestlog is recreated, since it may contain manifest entries that no longer exist in the file. Even if it didn't contain bad entries, the manifestctx itself needs up-to-date access to the manifest revlog, which is only available through the repo object (since the repo object's property is what handles manifest revlog cache checking). Perhaps I miss the point. Do we have stale copies of manifestlog and manifestctx somewhere? My understanding is we've moved the @storecache to manifestlog() at 3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible only when they are valid, and manifestlog doesn't live longer than repo.manifest(). Ah, yes manifestctx's may be held for longer than the manifestlog lives, by code that is out of our control. I'm not aware of specific cases where this is happening, but it could. Got it, thanks. It seems manifestctx has slightly different guarantee about its lifetime from changectx. changectx is effectively a snapshot when repo[changeid] is called (more precisely, when propertycached function is called.) changectx seems the same to me. From looking at the changectx code, it looks like it doesn't actually load any of the commit data until the first property is accessed (i.e. changectx._changectx looks in the changelog for data). So if someone held on to the ctx for a bit, and accessed ctx._changectx after the changelog had been mutated, it's important that it goes through repo.changelog so it gets the most up-to-date version. The changectx seems to be using the revision index to access data. So I'm not sure in what kind of scenario would make a reloading of the mutated changelog useful. Am I missing something ? Can you elaborate on these case? -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On 10/27/16 5:22 AM, Yuya Nishihara wrote: On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote: On 10/26/16 5:54 AM, Yuya Nishihara wrote: On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote: On 10/22/16 1:59 AM, Yuya Nishihara wrote: On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: # HG changeset patch # User Durham Goode# Date 1476837866 25200 # Tue Oct 18 17:44:26 2016 -0700 # Branch stable # Node ID 3efece5c59853908d65de53635488361dbf20c49 # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 manifest: make manifestctx store the repo The old manifestctx stored a reference to the revlog. If the inmemory revlog became invalid, the ctx now held an old copy and would be incorrect. To fix this, we need the ctx to go through the manifestlog for each access. This is the same pattern that changectx already uses (it stores the repo, and accesses commit data through self._repo.changelog). diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -1276,7 +1276,7 @@ class manifestlog(object): if self._treeinmem: m = treemanifestctx(self._revlog, '', node) else: -m = manifestctx(self._revlog, node) +m = manifestctx(self._repo, node) if node != revlog.nullid: self._mancache[node] = m This will create a reference cycle, but I don't know if the situation gets better or worse after this patch. repo -> manifestlog -> _mancache -> manifestctx -> repo Is _mancache valid after the manifestlog is recreated? _mancache is not really valid after manifestlog is recreated, since it may contain manifest entries that no longer exist in the file. Even if it didn't contain bad entries, the manifestctx itself needs up-to-date access to the manifest revlog, which is only available through the repo object (since the repo object's property is what handles manifest revlog cache checking). Perhaps I miss the point. Do we have stale copies of manifestlog and manifestctx somewhere? My understanding is we've moved the @storecache to manifestlog() at 3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible only when they are valid, and manifestlog doesn't live longer than repo.manifest(). Ah, yes manifestctx's may be held for longer than the manifestlog lives, by code that is out of our control. I'm not aware of specific cases where this is happening, but it could. Got it, thanks. It seems manifestctx has slightly different guarantee about its lifetime from changectx. changectx is effectively a snapshot when repo[changeid] is called (more precisely, when propertycached function is called.) changectx seems the same to me. From looking at the changectx code, it looks like it doesn't actually load any of the commit data until the first property is accessed (i.e. changectx._changectx looks in the changelog for data). So if someone held on to the ctx for a bit, and accessed ctx._changectx after the changelog had been mutated, it's important that it goes through repo.changelog so it gets the most up-to-date version. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote: > On 10/26/16 5:54 AM, Yuya Nishihara wrote: > > On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote: > >> On 10/22/16 1:59 AM, Yuya Nishihara wrote: > >>> On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: > # HG changeset patch > # User Durham Goode> # Date 1476837866 25200 > # Tue Oct 18 17:44:26 2016 -0700 > # Branch stable > # Node ID 3efece5c59853908d65de53635488361dbf20c49 > # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 > manifest: make manifestctx store the repo > > The old manifestctx stored a reference to the revlog. If the inmemory > revlog > became invalid, the ctx now held an old copy and would be incorrect. To > fix > this, we need the ctx to go through the manifestlog for each access. > > This is the same pattern that changectx already uses (it stores the > repo, and > accesses commit data through self._repo.changelog). > > diff --git a/mercurial/manifest.py b/mercurial/manifest.py > --- a/mercurial/manifest.py > +++ b/mercurial/manifest.py > @@ -1276,7 +1276,7 @@ class manifestlog(object): > if self._treeinmem: > m = treemanifestctx(self._revlog, '', node) > else: > -m = manifestctx(self._revlog, node) > +m = manifestctx(self._repo, node) > if node != revlog.nullid: > self._mancache[node] = m > >>> This will create a reference cycle, but I don't know if the situation gets > >>> better or worse after this patch. > >>> > >>> repo -> manifestlog -> _mancache -> manifestctx -> repo > >>> > >>> Is _mancache valid after the manifestlog is recreated? > >> _mancache is not really valid after manifestlog is recreated, since it > >> may contain manifest entries that no longer exist in the file. Even if > >> it didn't contain bad entries, the manifestctx itself needs up-to-date > >> access to the manifest revlog, which is only available through the repo > >> object (since the repo object's property is what handles manifest revlog > >> cache checking). > > Perhaps I miss the point. Do we have stale copies of manifestlog and > > manifestctx > > somewhere? My understanding is we've moved the @storecache to manifestlog() > > at > > 3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible > > only > > when they are valid, and manifestlog doesn't live longer than > > repo.manifest(). > Ah, yes manifestctx's may be held for longer than the manifestlog lives, > by code that is out of our control. I'm not aware of specific cases > where this is happening, but it could. Got it, thanks. It seems manifestctx has slightly different guarantee about its lifetime from changectx. changectx is effectively a snapshot when repo[changeid] is called (more precisely, when propertycached function is called.) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote: > On 10/22/16 1:59 AM, Yuya Nishihara wrote: > > On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: > >> # HG changeset patch > >> # User Durham Goode> >> # Date 1476837866 25200 > >> # Tue Oct 18 17:44:26 2016 -0700 > >> # Branch stable > >> # Node ID 3efece5c59853908d65de53635488361dbf20c49 > >> # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 > >> manifest: make manifestctx store the repo > >> > >> The old manifestctx stored a reference to the revlog. If the inmemory > >> revlog > >> became invalid, the ctx now held an old copy and would be incorrect. To fix > >> this, we need the ctx to go through the manifestlog for each access. > >> > >> This is the same pattern that changectx already uses (it stores the repo, > >> and > >> accesses commit data through self._repo.changelog). > >> > >> diff --git a/mercurial/manifest.py b/mercurial/manifest.py > >> --- a/mercurial/manifest.py > >> +++ b/mercurial/manifest.py > >> @@ -1276,7 +1276,7 @@ class manifestlog(object): > >> if self._treeinmem: > >> m = treemanifestctx(self._revlog, '', node) > >> else: > >> -m = manifestctx(self._revlog, node) > >> +m = manifestctx(self._repo, node) > >> if node != revlog.nullid: > >> self._mancache[node] = m > > This will create a reference cycle, but I don't know if the situation gets > > better or worse after this patch. > > > >repo -> manifestlog -> _mancache -> manifestctx -> repo > > > > Is _mancache valid after the manifestlog is recreated? > _mancache is not really valid after manifestlog is recreated, since it > may contain manifest entries that no longer exist in the file. Even if > it didn't contain bad entries, the manifestctx itself needs up-to-date > access to the manifest revlog, which is only available through the repo > object (since the repo object's property is what handles manifest revlog > cache checking). Perhaps I miss the point. Do we have stale copies of manifestlog and manifestctx somewhere? My understanding is we've moved the @storecache to manifestlog() at 3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible only when they are valid, and manifestlog doesn't live longer than repo.manifest(). ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On 10/22/16 1:59 AM, Yuya Nishihara wrote: On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: # HG changeset patch # User Durham Goode# Date 1476837866 25200 # Tue Oct 18 17:44:26 2016 -0700 # Branch stable # Node ID 3efece5c59853908d65de53635488361dbf20c49 # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 manifest: make manifestctx store the repo The old manifestctx stored a reference to the revlog. If the inmemory revlog became invalid, the ctx now held an old copy and would be incorrect. To fix this, we need the ctx to go through the manifestlog for each access. This is the same pattern that changectx already uses (it stores the repo, and accesses commit data through self._repo.changelog). diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -1276,7 +1276,7 @@ class manifestlog(object): if self._treeinmem: m = treemanifestctx(self._revlog, '', node) else: -m = manifestctx(self._revlog, node) +m = manifestctx(self._repo, node) if node != revlog.nullid: self._mancache[node] = m This will create a reference cycle, but I don't know if the situation gets better or worse after this patch. repo -> manifestlog -> _mancache -> manifestctx -> repo Is _mancache valid after the manifestlog is recreated? _mancache is not really valid after manifestlog is recreated, since it may contain manifest entries that no longer exist in the file. Even if it didn't contain bad entries, the manifestctx itself needs up-to-date access to the manifest revlog, which is only available through the repo object (since the repo object's property is what handles manifest revlog cache checking). ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote: > # HG changeset patch > # User Durham Goode> # Date 1476837866 25200 > # Tue Oct 18 17:44:26 2016 -0700 > # Branch stable > # Node ID 3efece5c59853908d65de53635488361dbf20c49 > # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 > manifest: make manifestctx store the repo > > The old manifestctx stored a reference to the revlog. If the inmemory revlog > became invalid, the ctx now held an old copy and would be incorrect. To fix > this, we need the ctx to go through the manifestlog for each access. > > This is the same pattern that changectx already uses (it stores the repo, and > accesses commit data through self._repo.changelog). > > diff --git a/mercurial/manifest.py b/mercurial/manifest.py > --- a/mercurial/manifest.py > +++ b/mercurial/manifest.py > @@ -1276,7 +1276,7 @@ class manifestlog(object): > if self._treeinmem: > m = treemanifestctx(self._revlog, '', node) > else: > -m = manifestctx(self._revlog, node) > +m = manifestctx(self._repo, node) > if node != revlog.nullid: > self._mancache[node] = m This will create a reference cycle, but I don't know if the situation gets better or worse after this patch. repo -> manifestlog -> _mancache -> manifestctx -> repo Is _mancache valid after the manifestlog is recreated? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo
# HG changeset patch # User Durham Goode# Date 1476837866 25200 # Tue Oct 18 17:44:26 2016 -0700 # Branch stable # Node ID 3efece5c59853908d65de53635488361dbf20c49 # Parent ed607426a3ff4deda8c7f2de8b86d5b6ca976d67 manifest: make manifestctx store the repo The old manifestctx stored a reference to the revlog. If the inmemory revlog became invalid, the ctx now held an old copy and would be incorrect. To fix this, we need the ctx to go through the manifestlog for each access. This is the same pattern that changectx already uses (it stores the repo, and accesses commit data through self._repo.changelog). diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -1276,7 +1276,7 @@ class manifestlog(object): if self._treeinmem: m = treemanifestctx(self._revlog, '', node) else: -m = manifestctx(self._revlog, node) +m = manifestctx(self._repo, node) if node != revlog.nullid: self._mancache[node] = m return m @@ -1288,8 +1288,8 @@ class manifestctx(object): """A class representing a single revision of a manifest, including its contents, its parent revs, and its linkrev. """ -def __init__(self, revlog, node): -self._revlog = revlog +def __init__(self, repo, node): +self._repo = repo self._data = None self._node = node @@ -1309,14 +1309,15 @@ class manifestctx(object): if self._node == revlog.nullid: self._data = manifestdict() else: -text = self._revlog.revision(self._node) +rl = self._repo.manifestlog._revlog +text = rl.revision(self._node) arraytext = array.array('c', text) -self._revlog._fulltextcache[self._node] = arraytext +rl._fulltextcache[self._node] = arraytext self._data = manifestdict(text) return self._data def readfast(self): -rl = self._revlog +rl = self._repo.manifestlog._revlog r = rl.rev(self._node) deltaparent = rl.deltaparent(r) if deltaparent != revlog.nullrev and deltaparent in rl.parentrevs(r): @@ -1324,11 +1325,11 @@ class manifestctx(object): return self.read() def readdelta(self): -revlog = self._revlog +revlog = self._repo.manifestlog._revlog if revlog._usemanifestv2: # Need to perform a slow delta r0 = revlog.deltaparent(revlog.rev(self._node)) -m0 = manifestctx(revlog, revlog.node(r0)).read() +m0 = manifestctx(self._repo, revlog.node(r0)).read() m1 = self.read() md = manifestdict() for f, ((n0, fl0), (n1, fl1)) in m0.diff(m1).iteritems(): ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel