Re: [PATCH 4 of 7] manifest: use specific opener to avoid file stat ambiguity around truncation
On 09/21/2016 08:57 AM, FUJIWARA Katsunori wrote: At Tue, 20 Sep 2016 18:14:32 +0200, Pierre-Yves David wrote: On 09/16/2016 10:51 PM, FUJIWARA Katsunori wrote: # HG changeset patch # User FUJIWARA Katsunori# Date 1474057496 -32400 # Sat Sep 17 05:24:56 2016 +0900 # Node ID 55bd90d63ad3a76fd4d27d5eabb3fe9a7fa0642b # Parent 71b6b49f8a7ab6c894028b9153290f4bbf0f54f6 manifest: use specific opener to avoid file stat ambiguity around truncation If steps below occurs at "the same time in sec", all of mtime, ctime and size are same between (1) and (3). 1. append data to 00manifest.i (and close transaction) 2. discard appended data by truncation (strip or rollback) 3. append same size but different data to 00manifest.i again Therefore, cache validation doesn't work after (3) as expected. To avoid such file stat ambiguity around truncation, this patch uses specific opener, which forces checkambig=True at writing 00manifest.i changes out. Hooking vfs.__call__() is the only way to centralize changes into manifest.py, because all logic to actually write in-memory changes out is implemented in revlog layer. In fact, reusing already wrapped self.opener for dirlogcache entries like as below is a little redundant, because storecache-ed properties of localrepository doesn't refer file stat of non-root 00manifest.i files (= ambiguity of such files doesn't cause any issue). if dir not in self._dirlogcache: self._dirlogcache[dir] = manifestrevlog(self.opener, dir, self._dirlogcache) But wrapped opener never checks ambiguity of file stat for non-root 00manifest.i files, because "path == '00manifest.i'" condition in wrapper.__call__() is always False for such files. Therefore, this patch simply reuses wrapped self.opener for non-root 00manifest.i files. Even after this patch, avoiding file stat ambiguity of 00manifest.i around truncation isn't yet completed, because truncation side isn't aware of this issue. This is a part of ExactCacheValidationPlan. https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -892,6 +892,28 @@ class treemanifest(object): subp1, subp2 = subp2, subp1 writesubtree(subm, subp1, subp2) +def _checkambigopener(opener): +"""build an opener to add checkambig=True at changing 00manifest.i +""" +class wrapper(opener.__class__): +def __call__(self, path, mode="r", text=False, atomictemp=False, + notindexed=False, backgroundclose=False, checkambig=False, + **kwargs): +if path == '00manifest.i' and mode not in ('r', 'rb'): +# check file stat ambiguity at closing forcibly +checkambig = True +supercall = super(wrapper, self).__call__ +return supercall(path, mode, + text=text, + atomictemp=atomictemp, + notindexed=notindexed, + backgroundclose=backgroundclose, + checkambig=checkambig, + **kwargs) +opener.__class__ = wrapper This wrapper logic is a bit scary here. Could we had the necessary minimal hooks in revlog to allow manifest and changelog to hooks in ? That would seems more robust/clean than wrapping core from core. How about adding optional argument "checkambig" to revlog.__init__(), of which default value is False ?\ Given that multiple subclass uses it it seems to make sense to go this way. It seems simple enough. The revlog class has other small 'extension point' for subclass so there is precedence. Cheers, -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 4 of 7] manifest: use specific opener to avoid file stat ambiguity around truncation
At Tue, 20 Sep 2016 18:14:32 +0200, Pierre-Yves David wrote: > > On 09/16/2016 10:51 PM, FUJIWARA Katsunori wrote: > > # HG changeset patch > > # User FUJIWARA Katsunori> > # Date 1474057496 -32400 > > # Sat Sep 17 05:24:56 2016 +0900 > > # Node ID 55bd90d63ad3a76fd4d27d5eabb3fe9a7fa0642b > > # Parent 71b6b49f8a7ab6c894028b9153290f4bbf0f54f6 > > manifest: use specific opener to avoid file stat ambiguity around truncation > > > > If steps below occurs at "the same time in sec", all of mtime, ctime > > and size are same between (1) and (3). > > > > 1. append data to 00manifest.i (and close transaction) > > 2. discard appended data by truncation (strip or rollback) > > 3. append same size but different data to 00manifest.i again > > > > Therefore, cache validation doesn't work after (3) as expected. > > > > To avoid such file stat ambiguity around truncation, this patch uses > > specific opener, which forces checkambig=True at writing 00manifest.i > > changes out. > > > > Hooking vfs.__call__() is the only way to centralize changes into > > manifest.py, because all logic to actually write in-memory changes out > > is implemented in revlog layer. > > > > In fact, reusing already wrapped self.opener for dirlogcache entries > > like as below is a little redundant, because storecache-ed properties > > of localrepository doesn't refer file stat of non-root 00manifest.i > > files (= ambiguity of such files doesn't cause any issue). > > > > if dir not in self._dirlogcache: > > self._dirlogcache[dir] = manifestrevlog(self.opener, dir, > > self._dirlogcache) > > > > But wrapped opener never checks ambiguity of file stat for non-root > > 00manifest.i files, because "path == '00manifest.i'" condition in > > wrapper.__call__() is always False for such files. Therefore, this > > patch simply reuses wrapped self.opener for non-root 00manifest.i > > files. > > > > Even after this patch, avoiding file stat ambiguity of 00manifest.i > > around truncation isn't yet completed, because truncation side isn't > > aware of this issue. > > > > This is a part of ExactCacheValidationPlan. > > > > https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan > > > > diff --git a/mercurial/manifest.py b/mercurial/manifest.py > > --- a/mercurial/manifest.py > > +++ b/mercurial/manifest.py > > @@ -892,6 +892,28 @@ class treemanifest(object): > > subp1, subp2 = subp2, subp1 > > writesubtree(subm, subp1, subp2) > > > > +def _checkambigopener(opener): > > +"""build an opener to add checkambig=True at changing 00manifest.i > > +""" > > +class wrapper(opener.__class__): > > +def __call__(self, path, mode="r", text=False, atomictemp=False, > > + notindexed=False, backgroundclose=False, > > checkambig=False, > > + **kwargs): > > +if path == '00manifest.i' and mode not in ('r', 'rb'): > > +# check file stat ambiguity at closing forcibly > > +checkambig = True > > +supercall = super(wrapper, self).__call__ > > +return supercall(path, mode, > > + text=text, > > + atomictemp=atomictemp, > > + notindexed=notindexed, > > + backgroundclose=backgroundclose, > > + checkambig=checkambig, > > + **kwargs) > > +opener.__class__ = wrapper > > This wrapper logic is a bit scary here. Could we had the necessary > minimal hooks in revlog to allow manifest and changelog to hooks in ? > That would seems more robust/clean than wrapping core from core. How about adding optional argument "checkambig" to revlog.__init__(), of which default value is False ? > > + > > +return opener > > + > > class manifestrevlog(revlog.revlog): > > '''A revlog that stores manifest texts. This is responsible for > > caching the > > full-text manifest contents. > > @@ -927,6 +949,13 @@ class manifestrevlog(revlog.revlog): > > else: > > self._dirlogcache = {'': self} > > > > +# If "dir" isn't empty, this "opener" is reused one via > > +# "self.opener" of another manifestrevlog. Therefore, > > +# wrapping opener is needed only at once for root > > +# 00manifest.i file, to avoid multiple (= redundant) > > +# wrapping. > > +opener = _checkambigopener(opener) > > + > > super(manifestrevlog, self).__init__(opener, indexfile) > > > > @property > > ___ > > Mercurial-devel mailing list > > Mercurial-devel@mercurial-scm.org > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > > > > -- > Pierre-Yves David >
Re: [PATCH 4 of 7] manifest: use specific opener to avoid file stat ambiguity around truncation
On 09/16/2016 10:51 PM, FUJIWARA Katsunori wrote: # HG changeset patch # User FUJIWARA Katsunori# Date 1474057496 -32400 # Sat Sep 17 05:24:56 2016 +0900 # Node ID 55bd90d63ad3a76fd4d27d5eabb3fe9a7fa0642b # Parent 71b6b49f8a7ab6c894028b9153290f4bbf0f54f6 manifest: use specific opener to avoid file stat ambiguity around truncation If steps below occurs at "the same time in sec", all of mtime, ctime and size are same between (1) and (3). 1. append data to 00manifest.i (and close transaction) 2. discard appended data by truncation (strip or rollback) 3. append same size but different data to 00manifest.i again Therefore, cache validation doesn't work after (3) as expected. To avoid such file stat ambiguity around truncation, this patch uses specific opener, which forces checkambig=True at writing 00manifest.i changes out. Hooking vfs.__call__() is the only way to centralize changes into manifest.py, because all logic to actually write in-memory changes out is implemented in revlog layer. In fact, reusing already wrapped self.opener for dirlogcache entries like as below is a little redundant, because storecache-ed properties of localrepository doesn't refer file stat of non-root 00manifest.i files (= ambiguity of such files doesn't cause any issue). if dir not in self._dirlogcache: self._dirlogcache[dir] = manifestrevlog(self.opener, dir, self._dirlogcache) But wrapped opener never checks ambiguity of file stat for non-root 00manifest.i files, because "path == '00manifest.i'" condition in wrapper.__call__() is always False for such files. Therefore, this patch simply reuses wrapped self.opener for non-root 00manifest.i files. Even after this patch, avoiding file stat ambiguity of 00manifest.i around truncation isn't yet completed, because truncation side isn't aware of this issue. This is a part of ExactCacheValidationPlan. https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -892,6 +892,28 @@ class treemanifest(object): subp1, subp2 = subp2, subp1 writesubtree(subm, subp1, subp2) +def _checkambigopener(opener): +"""build an opener to add checkambig=True at changing 00manifest.i +""" +class wrapper(opener.__class__): +def __call__(self, path, mode="r", text=False, atomictemp=False, + notindexed=False, backgroundclose=False, checkambig=False, + **kwargs): +if path == '00manifest.i' and mode not in ('r', 'rb'): +# check file stat ambiguity at closing forcibly +checkambig = True +supercall = super(wrapper, self).__call__ +return supercall(path, mode, + text=text, + atomictemp=atomictemp, + notindexed=notindexed, + backgroundclose=backgroundclose, + checkambig=checkambig, + **kwargs) +opener.__class__ = wrapper This wrapper logic is a bit scary here. Could we had the necessary minimal hooks in revlog to allow manifest and changelog to hooks in ? That would seems more robust/clean than wrapping core from core. + +return opener + class manifestrevlog(revlog.revlog): '''A revlog that stores manifest texts. This is responsible for caching the full-text manifest contents. @@ -927,6 +949,13 @@ class manifestrevlog(revlog.revlog): else: self._dirlogcache = {'': self} +# If "dir" isn't empty, this "opener" is reused one via +# "self.opener" of another manifestrevlog. Therefore, +# wrapping opener is needed only at once for root +# 00manifest.i file, to avoid multiple (= redundant) +# wrapping. +opener = _checkambigopener(opener) + super(manifestrevlog, self).__init__(opener, indexfile) @property ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel -- Pierre-Yves David ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel