Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-28 Thread Pierre-Yves David



On 10/27/2016 07:55 PM, Durham Goode wrote:



On 10/27/16 5:22 AM, Yuya Nishihara wrote:

On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote:

On 10/26/16 5:54 AM, Yuya Nishihara wrote:

On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote:

On 10/22/16 1:59 AM, Yuya Nishihara wrote:

On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode 
# Date 1476837866 25200
#  Tue Oct 18 17:44:26 2016 -0700
# Branch stable
# Node ID 3efece5c59853908d65de53635488361dbf20c49
# Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
manifest: make manifestctx store the repo

The old manifestctx stored a reference to the revlog. If the
inmemory revlog
became invalid, the ctx now held an old copy and would be
incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.

This is the same pattern that changectx already uses (it stores
the repo, and
accesses commit data through self._repo.changelog).

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1276,7 +1276,7 @@ class manifestlog(object):
if self._treeinmem:
m = treemanifestctx(self._revlog, '', node)
else:
-m = manifestctx(self._revlog, node)
+m = manifestctx(self._repo, node)
if node != revlog.nullid:
self._mancache[node] = m

This will create a reference cycle, but I don't know if the
situation gets
better or worse after this patch.

 repo -> manifestlog -> _mancache -> manifestctx -> repo

Is _mancache valid after the manifestlog is recreated?

_mancache is not really valid after manifestlog is recreated, since it
may contain manifest entries that no longer exist in the file. Even if
it didn't contain bad entries, the manifestctx itself needs up-to-date
access to the manifest revlog, which is only available through the
repo
object (since the repo object's property is what handles manifest
revlog
cache checking).

Perhaps I miss the point. Do we have stale copies of manifestlog and
manifestctx
somewhere? My understanding is we've moved the @storecache to
manifestlog() at
3c8811efdddc, so manifestlog, _revlog (and _mancache) should be
accessible only
when they are valid, and manifestlog doesn't live longer than
repo.manifest().

Ah, yes manifestctx's may be held for longer than the manifestlog lives,
by code that is out of our control.  I'm not aware of specific cases
where this is happening, but it could.

Got it, thanks.

It seems manifestctx has slightly different guarantee about its
lifetime from
changectx. changectx is effectively a snapshot when repo[changeid] is
called
(more precisely, when propertycached function is called.)

changectx seems the same to me.  From looking at the changectx code, it
looks like it doesn't actually load any of the commit data until the
first property is accessed (i.e. changectx._changectx looks in the
changelog for data).  So if someone held on to the ctx for a bit, and
accessed ctx._changectx after the changelog had been mutated, it's
important that it goes through repo.changelog so it gets the most
up-to-date version.


The changectx seems to be using the revision index to access data. So 
I'm not sure in what kind of scenario would make a reloading of the 
mutated changelog useful. Am I missing something ? Can you elaborate on 
these case?



--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-27 Thread Durham Goode



On 10/27/16 5:22 AM, Yuya Nishihara wrote:

On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote:

On 10/26/16 5:54 AM, Yuya Nishihara wrote:

On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote:

On 10/22/16 1:59 AM, Yuya Nishihara wrote:

On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode 
# Date 1476837866 25200
#  Tue Oct 18 17:44:26 2016 -0700
# Branch stable
# Node ID 3efece5c59853908d65de53635488361dbf20c49
# Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
manifest: make manifestctx store the repo

The old manifestctx stored a reference to the revlog. If the inmemory revlog
became invalid, the ctx now held an old copy and would be incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.

This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1276,7 +1276,7 @@ class manifestlog(object):
if self._treeinmem:
m = treemanifestctx(self._revlog, '', node)
else:
-m = manifestctx(self._revlog, node)
+m = manifestctx(self._repo, node)
if node != revlog.nullid:
self._mancache[node] = m

This will create a reference cycle, but I don't know if the situation gets
better or worse after this patch.

 repo -> manifestlog -> _mancache -> manifestctx -> repo

Is _mancache valid after the manifestlog is recreated?

_mancache is not really valid after manifestlog is recreated, since it
may contain manifest entries that no longer exist in the file. Even if
it didn't contain bad entries, the manifestctx itself needs up-to-date
access to the manifest revlog, which is only available through the repo
object (since the repo object's property is what handles manifest revlog
cache checking).

Perhaps I miss the point. Do we have stale copies of manifestlog and manifestctx
somewhere? My understanding is we've moved the @storecache to manifestlog() at
3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible only
when they are valid, and manifestlog doesn't live longer than repo.manifest().

Ah, yes manifestctx's may be held for longer than the manifestlog lives,
by code that is out of our control.  I'm not aware of specific cases
where this is happening, but it could.

Got it, thanks.

It seems manifestctx has slightly different guarantee about its lifetime from
changectx. changectx is effectively a snapshot when repo[changeid] is called
(more precisely, when propertycached function is called.)
changectx seems the same to me.  From looking at the changectx code, it 
looks like it doesn't actually load any of the commit data until the 
first property is accessed (i.e. changectx._changectx looks in the 
changelog for data).  So if someone held on to the ctx for a bit, and 
accessed ctx._changectx after the changelog had been mutated, it's 
important that it goes through repo.changelog so it gets the most 
up-to-date version.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-27 Thread Yuya Nishihara
On Wed, 26 Oct 2016 09:42:09 -0700, Durham Goode wrote:
> On 10/26/16 5:54 AM, Yuya Nishihara wrote:
> > On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote:
> >> On 10/22/16 1:59 AM, Yuya Nishihara wrote:
> >>> On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:
>  # HG changeset patch
>  # User Durham Goode 
>  # Date 1476837866 25200
>  #  Tue Oct 18 17:44:26 2016 -0700
>  # Branch stable
>  # Node ID 3efece5c59853908d65de53635488361dbf20c49
>  # Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
>  manifest: make manifestctx store the repo
> 
>  The old manifestctx stored a reference to the revlog. If the inmemory 
>  revlog
>  became invalid, the ctx now held an old copy and would be incorrect. To 
>  fix
>  this, we need the ctx to go through the manifestlog for each access.
> 
>  This is the same pattern that changectx already uses (it stores the 
>  repo, and
>  accesses commit data through self._repo.changelog).
> 
>  diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>  --- a/mercurial/manifest.py
>  +++ b/mercurial/manifest.py
>  @@ -1276,7 +1276,7 @@ class manifestlog(object):
> if self._treeinmem:
> m = treemanifestctx(self._revlog, '', node)
> else:
>  -m = manifestctx(self._revlog, node)
>  +m = manifestctx(self._repo, node)
> if node != revlog.nullid:
> self._mancache[node] = m
> >>> This will create a reference cycle, but I don't know if the situation gets
> >>> better or worse after this patch.
> >>>
> >>> repo -> manifestlog -> _mancache -> manifestctx -> repo
> >>>
> >>> Is _mancache valid after the manifestlog is recreated?
> >> _mancache is not really valid after manifestlog is recreated, since it
> >> may contain manifest entries that no longer exist in the file. Even if
> >> it didn't contain bad entries, the manifestctx itself needs up-to-date
> >> access to the manifest revlog, which is only available through the repo
> >> object (since the repo object's property is what handles manifest revlog
> >> cache checking).
> > Perhaps I miss the point. Do we have stale copies of manifestlog and 
> > manifestctx
> > somewhere? My understanding is we've moved the @storecache to manifestlog() 
> > at
> > 3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible 
> > only
> > when they are valid, and manifestlog doesn't live longer than 
> > repo.manifest().
> Ah, yes manifestctx's may be held for longer than the manifestlog lives, 
> by code that is out of our control.  I'm not aware of specific cases 
> where this is happening, but it could.

Got it, thanks.

It seems manifestctx has slightly different guarantee about its lifetime from
changectx. changectx is effectively a snapshot when repo[changeid] is called
(more precisely, when propertycached function is called.)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-26 Thread Yuya Nishihara
On Tue, 25 Oct 2016 12:46:58 -0700, Durham Goode wrote:
> On 10/22/16 1:59 AM, Yuya Nishihara wrote:
> > On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode 
> >> # Date 1476837866 25200
> >> #  Tue Oct 18 17:44:26 2016 -0700
> >> # Branch stable
> >> # Node ID 3efece5c59853908d65de53635488361dbf20c49
> >> # Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
> >> manifest: make manifestctx store the repo
> >>
> >> The old manifestctx stored a reference to the revlog. If the inmemory 
> >> revlog
> >> became invalid, the ctx now held an old copy and would be incorrect. To fix
> >> this, we need the ctx to go through the manifestlog for each access.
> >>
> >> This is the same pattern that changectx already uses (it stores the repo, 
> >> and
> >> accesses commit data through self._repo.changelog).
> >>
> >> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> >> --- a/mercurial/manifest.py
> >> +++ b/mercurial/manifest.py
> >> @@ -1276,7 +1276,7 @@ class manifestlog(object):
> >>   if self._treeinmem:
> >>   m = treemanifestctx(self._revlog, '', node)
> >>   else:
> >> -m = manifestctx(self._revlog, node)
> >> +m = manifestctx(self._repo, node)
> >>   if node != revlog.nullid:
> >>   self._mancache[node] = m
> > This will create a reference cycle, but I don't know if the situation gets
> > better or worse after this patch.
> >
> >repo -> manifestlog -> _mancache -> manifestctx -> repo
> >
> > Is _mancache valid after the manifestlog is recreated?
> _mancache is not really valid after manifestlog is recreated, since it 
> may contain manifest entries that no longer exist in the file. Even if 
> it didn't contain bad entries, the manifestctx itself needs up-to-date 
> access to the manifest revlog, which is only available through the repo 
> object (since the repo object's property is what handles manifest revlog 
> cache checking).

Perhaps I miss the point. Do we have stale copies of manifestlog and manifestctx
somewhere? My understanding is we've moved the @storecache to manifestlog() at
3c8811efdddc, so manifestlog, _revlog (and _mancache) should be accessible only
when they are valid, and manifestlog doesn't live longer than repo.manifest().
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-25 Thread Durham Goode



On 10/22/16 1:59 AM, Yuya Nishihara wrote:

On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode 
# Date 1476837866 25200
#  Tue Oct 18 17:44:26 2016 -0700
# Branch stable
# Node ID 3efece5c59853908d65de53635488361dbf20c49
# Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
manifest: make manifestctx store the repo

The old manifestctx stored a reference to the revlog. If the inmemory revlog
became invalid, the ctx now held an old copy and would be incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.

This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1276,7 +1276,7 @@ class manifestlog(object):
  if self._treeinmem:
  m = treemanifestctx(self._revlog, '', node)
  else:
-m = manifestctx(self._revlog, node)
+m = manifestctx(self._repo, node)
  if node != revlog.nullid:
  self._mancache[node] = m

This will create a reference cycle, but I don't know if the situation gets
better or worse after this patch.

   repo -> manifestlog -> _mancache -> manifestctx -> repo

Is _mancache valid after the manifestlog is recreated?
_mancache is not really valid after manifestlog is recreated, since it 
may contain manifest entries that no longer exist in the file. Even if 
it didn't contain bad entries, the manifestctx itself needs up-to-date 
access to the manifest revlog, which is only available through the repo 
object (since the repo object's property is what handles manifest revlog 
cache checking).

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-22 Thread Yuya Nishihara
On Tue, 18 Oct 2016 17:50:16 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode 
> # Date 1476837866 25200
> #  Tue Oct 18 17:44:26 2016 -0700
> # Branch stable
> # Node ID 3efece5c59853908d65de53635488361dbf20c49
> # Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
> manifest: make manifestctx store the repo
> 
> The old manifestctx stored a reference to the revlog. If the inmemory revlog
> became invalid, the ctx now held an old copy and would be incorrect. To fix
> this, we need the ctx to go through the manifestlog for each access.
> 
> This is the same pattern that changectx already uses (it stores the repo, and
> accesses commit data through self._repo.changelog).
> 
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -1276,7 +1276,7 @@ class manifestlog(object):
>  if self._treeinmem:
>  m = treemanifestctx(self._revlog, '', node)
>  else:
> -m = manifestctx(self._revlog, node)
> +m = manifestctx(self._repo, node)
>  if node != revlog.nullid:
>  self._mancache[node] = m

This will create a reference cycle, but I don't know if the situation gets
better or worse after this patch.

  repo -> manifestlog -> _mancache -> manifestctx -> repo

Is _mancache valid after the manifestlog is recreated?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 3 of 4 STABLE] manifest: make manifestctx store the repo

2016-10-18 Thread Durham Goode
# HG changeset patch
# User Durham Goode 
# Date 1476837866 25200
#  Tue Oct 18 17:44:26 2016 -0700
# Branch stable
# Node ID 3efece5c59853908d65de53635488361dbf20c49
# Parent  ed607426a3ff4deda8c7f2de8b86d5b6ca976d67
manifest: make manifestctx store the repo

The old manifestctx stored a reference to the revlog. If the inmemory revlog
became invalid, the ctx now held an old copy and would be incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.

This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1276,7 +1276,7 @@ class manifestlog(object):
 if self._treeinmem:
 m = treemanifestctx(self._revlog, '', node)
 else:
-m = manifestctx(self._revlog, node)
+m = manifestctx(self._repo, node)
 if node != revlog.nullid:
 self._mancache[node] = m
 return m
@@ -1288,8 +1288,8 @@ class manifestctx(object):
 """A class representing a single revision of a manifest, including its
 contents, its parent revs, and its linkrev.
 """
-def __init__(self, revlog, node):
-self._revlog = revlog
+def __init__(self, repo, node):
+self._repo = repo
 self._data = None
 
 self._node = node
@@ -1309,14 +1309,15 @@ class manifestctx(object):
 if self._node == revlog.nullid:
 self._data = manifestdict()
 else:
-text = self._revlog.revision(self._node)
+rl = self._repo.manifestlog._revlog
+text = rl.revision(self._node)
 arraytext = array.array('c', text)
-self._revlog._fulltextcache[self._node] = arraytext
+rl._fulltextcache[self._node] = arraytext
 self._data = manifestdict(text)
 return self._data
 
 def readfast(self):
-rl = self._revlog
+rl = self._repo.manifestlog._revlog
 r = rl.rev(self._node)
 deltaparent = rl.deltaparent(r)
 if deltaparent != revlog.nullrev and deltaparent in rl.parentrevs(r):
@@ -1324,11 +1325,11 @@ class manifestctx(object):
 return self.read()
 
 def readdelta(self):
-revlog = self._revlog
+revlog = self._repo.manifestlog._revlog
 if revlog._usemanifestv2:
 # Need to perform a slow delta
 r0 = revlog.deltaparent(revlog.rev(self._node))
-m0 = manifestctx(revlog, revlog.node(r0)).read()
+m0 = manifestctx(self._repo, revlog.node(r0)).read()
 m1 = self.read()
 md = manifestdict()
 for f, ((n0, fl0), (n1, fl1)) in m0.diff(m1).iteritems():
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel