Re: Fixing .hg file open ordering
Yep, your reasoning here makes sense. While I'm glad that you discovered after some investigation that we're doing the right thing already, it's a little troubling that this wasn't easy to understand from reading the code, and that (from my interpretation of what you wrote) you had to think and rethink your reading a couple of times to figure out what was going on. Sounds a bit like a latent future bug, doesn't it? On Tue, Nov 22, 2016 at 12:19 PM, Ryan McElroywrote: > I think that fixing the read open ordering is not sufficient. > > I was just chatting with Jun Wu (quark) about this, and after that > discussion, it seems that we also have to ensure the write order is > correct. Specifically, bookmarks must be written only after the changelog > is written, otherwise, regardless of the order that we open files for read, > we can have a bookmarks file that points to commits that no reader can see > yet. > > Based on a very quick look at the transaction code, it looks like > filegenerators in transactions are stored in a dict, so we have no > guarantees about what order they will be called in, so it's very likely > that we often call the write of the bookmarks file before the write of the > changelog, making it so that no read-side fix alone can be sufficient to > solve this problem. > ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On 11/22/16 8:19 PM, Ryan McElroy wrote: On 11/10/16 5:11 PM, Bryan O'Sullivan wrote: On Thu, Nov 10, 2016 at 2:29 AM, Durham Goode> wrote: We worked out a solution to Bryan’s main concern, but there hasn’t been any discussion of my main proposal. Internally we worked around it by forcing bookmarks and the changelog to load in the right order very early during repo setup, but that’s a hack and doesn’t address the greater problem. For the Real Problem (TM), I'd rather try the simpler solution first, of reading the bookmarks file in the changelog constructor, then splitting it into lines in the bmstore constructor. I'd hope that could be both lightweight enough that it could avoid a performance hit, and unlikely enough to introduce bugs later on that we wouldn't have to worry about a more heavyweight "make bugs impossible" solution. If I turn out to be wrong and we need something more formal (which we've gotten away without for years with revlogs, FWIW), we haven't really lost much. I think that fixing the read open ordering is not sufficient. I was just chatting with Jun Wu (quark) about this, and after that discussion, it seems that we also have to ensure the write order is correct. Specifically, bookmarks must be written only after the changelog is written, otherwise, regardless of the order that we open files for read, we can have a bookmarks file that points to commits that no reader can see yet. Based on a very quick look at the transaction code, it looks like filegenerators in transactions are stored in a dict, so we have no guarantees about what order they will be called in, so it's very likely that we often call the write of the bookmarks file before the write of the changelog, making it so that no read-side fix alone can be sufficient to solve this problem Well, I looked a little more deeply and actually filegenerators have an order parameter (sorry I missed that the first time). However, the order parameter is unset by the bmstore when it calls addfilegenerator. However (and now I'm remembering more of this), I see that we have a set called "postfinalizegenerators" which are forced to be after the finalize steps which is where the changelog is written (introduced in a5009789960c by durham). So it seems we're already okay here, assuming all that code it working as intended. Sorry for forgetting about this. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On 11/9/16, 4:51 PM, "Augie Fackler"wrote: On Wed, Nov 02, 2016 at 03:39:50PM -0700, Bryan O'Sullivan wrote: > On Wed, Nov 2, 2016 at 3:30 PM, Durham Goode wrote: > > > Bookmark writes are now within the wlock, and any inmemory read you did > > prior to taking the lock will be invalidated (during the next > > repo._bookmarks access) and thrown away when you take the lock. So once > > you're in the lock you are guaranteed to get the latest version off disk, > > and guaranteed that no one else will modify it until you release the lock. > > > > OK, great, that was the step I was missing when reading the code. Based on my quick scan of this thread, it sounds like you've worked out a solution for this? We worked out a solution to Bryan’s main concern, but there hasn’t been any discussion of my main proposal. Internally we worked around it by forcing bookmarks and the changelog to load in the right order very early during repo setup, but that’s a hack and doesn’t address the greater problem. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On Wed, Nov 02, 2016 at 03:39:50PM -0700, Bryan O'Sullivan wrote: > On Wed, Nov 2, 2016 at 3:30 PM, Durham Goodewrote: > > > Bookmark writes are now within the wlock, and any inmemory read you did > > prior to taking the lock will be invalidated (during the next > > repo._bookmarks access) and thrown away when you take the lock. So once > > you're in the lock you are guaranteed to get the latest version off disk, > > and guaranteed that no one else will modify it until you release the lock. > > > > OK, great, that was the step I was missing when reading the code. Based on my quick scan of this thread, it sounds like you've worked out a solution for this? > ___ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On Wed, Nov 2, 2016 at 3:30 PM, Durham Goodewrote: > Bookmark writes are now within the wlock, and any inmemory read you did > prior to taking the lock will be invalidated (during the next > repo._bookmarks access) and thrown away when you take the lock. So once > you're in the lock you are guaranteed to get the latest version off disk, > and guaranteed that no one else will modify it until you release the lock. > OK, great, that was the step I was missing when reading the code. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On 11/2/16 3:20 PM, Bryan O'Sullivan wrote: There's long been a well-defined order for accessing historical data: changelog first, then manifest, then revlog, and the reverse for writes. Yea, I guess I'm proposing formalizing these dependencies into a single location in code so we cannot mess it up on accident. I think that what has happened with bookmarks is that we literally forgot about the next necessary ordering constraint: you must read bookmarks before the changelog, otherwise you run into the race we see here. This same constraint should apply to any other file that contains commit hashes. It's unfortunate that the bmstore API is "backwards" in this way, but what's the sequence of operations by which bookmarks seem to disappear? There's something missing from your description to help piece the sequence of events together, because a bookmark that points to a commit that is not yet known will be explicitly ignored in the bmstore constructor. Here's what I *think* must be the sequence: I read the changelog. I'm asked to look up the foo bookmark. Someone else writes out some commits and a new bookmarks file. I open the bookmarks file, which was written *after* the last changelog write. The newly written bookmarks file contains a version of foo that points at a commit that I haven't yet seen, and because the bmstore constructor ignores it, it appears to have vanished. Yes, I tried to say this in my first paragraph, though not as clearly as I could have. I think this will be a little harder to fix than you may be anticipating. You can fix the read path by reading the bookmarks before the changelog, but the write path is subtle. Here's why. Bookmarks are stored in memory. If I'm being asked to write out a new bookmarks file as part of a transaction, what happens in this interleaving? I read bookmarks You write bookmarks I write bookmarks, based off my stale understanding In other words, I'll potentially cause your write to go missing unless I validate during the transaction that the bookmarks haven't changed since I read them before I write them. Bookmark writes are now within the wlock, and any inmemory read you did prior to taking the lock will be invalidated (during the next repo._bookmarks access) and thrown away when you take the lock. So once you're in the lock you are guaranteed to get the latest version off disk, and guaranteed that no one else will modify it until you release the lock. So your sequence becomes: I read bookmarks You write bookmarks You release the lock I take the lock I read bookmarks again I write bookmarks I release the lock ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
There's long been a well-defined order for accessing historical data: changelog first, then manifest, then revlog, and the reverse for writes. I think that what has happened with bookmarks is that we literally forgot about the next necessary ordering constraint: you must read bookmarks before the changelog, otherwise you run into the race we see here. This same constraint should apply to any other file that contains commit hashes. It's unfortunate that the bmstore API is "backwards" in this way, but what's the sequence of operations by which bookmarks seem to disappear? There's something missing from your description to help piece the sequence of events together, because a bookmark that points to a commit that is not yet known will be explicitly ignored in the bmstore constructor. Here's what I *think* must be the sequence: I read the changelog. I'm asked to look up the foo bookmark. Someone else writes out some commits and a new bookmarks file. I open the bookmarks file, which was written *after* the last changelog write. The newly written bookmarks file contains a version of foo that points at a commit that I haven't yet seen, and because the bmstore constructor ignores it, it appears to have vanished. I think this will be a little harder to fix than you may be anticipating. You can fix the read path by reading the bookmarks before the changelog, but the write path is subtle. Here's why. Bookmarks are stored in memory. If I'm being asked to write out a new bookmarks file as part of a transaction, what happens in this interleaving? I read bookmarks You write bookmarks I write bookmarks, based off my stale understanding In other words, I'll potentially cause your write to go missing unless I validate during the transaction that the bookmarks haven't changed since I read them before I write them. On Wed, Nov 2, 2016 at 2:38 PM, Durham Goodewrote: > On 11/2/16 2:09 PM, Durham Goode wrote: > >> There's currently no defined order in which Mercurial should open files >> in the .hg directory. For instance, it's possible to read the changelog >> first, then several seconds later read the bookmark file. If during those >> several seconds the repo receives new commits and a bookmark moves, then it >> will read a bookmark that points at a node that doesn't exist in the >> inmemory changelog. >> >> We're seeing this on our server, which causes certain bookmarks to >> disappear for users in some situations (until they pull again). >> > To clarify (as Jun has brought up outside of email), we could also solve > this by special casing our various dependencies. Like make > localrepo.changelog manually load the bookmarks first (we can't do it right > now because the bookmarkstore constructor requires the changelog so it can > check if each bookmark exists). So we wouldn't necessarily need a new > layer for it, but it would require more diligence to make sure all reads > happen in the right order across the code base. > > ___ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: Fixing .hg file open ordering
On 11/2/16 2:09 PM, Durham Goode wrote: There's currently no defined order in which Mercurial should open files in the .hg directory. For instance, it's possible to read the changelog first, then several seconds later read the bookmark file. If during those several seconds the repo receives new commits and a bookmark moves, then it will read a bookmark that points at a node that doesn't exist in the inmemory changelog. We're seeing this on our server, which causes certain bookmarks to disappear for users in some situations (until they pull again). To clarify (as Jun has brought up outside of email), we could also solve this by special casing our various dependencies. Like make localrepo.changelog manually load the bookmarks first (we can't do it right now because the bookmarkstore constructor requires the changelog so it can check if each bookmark exists). So we wouldn't necessarily need a new layer for it, but it would require more diligence to make sure all reads happen in the right order across the code base. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Fixing .hg file open ordering
There's currently no defined order in which Mercurial should open files in the .hg directory. For instance, it's possible to read the changelog first, then several seconds later read the bookmark file. If during those several seconds the repo receives new commits and a bookmark moves, then it will read a bookmark that points at a node that doesn't exist in the inmemory changelog. We're seeing this on our server, which causes certain bookmarks to disappear for users in some situations (until they pull again). Has anyone thought about how to solve this? I know we talked about a better storage model that allowed for more snapshot views of the store, but that seems farther off. I've been thinking about two possible ideas: 1. Introduce a store accessor class that all reads go through. An instance of it would live on the localrepo and it is responsible for providing a snapshotted view of the store. It would contain all the inmemory instances of store structures (the revlogs, the bookmark store, etc). All reads of those structures would go through that class, and it would be responsible for invalidation (like @storecache) and loading them in the right order (so reading the changelog would force the bookmarks to be read first). This might also be useful for letting other classes hold a reference to the store without holding a reference to the repo. For instance, changectx and manifestctx could hold a reference to the store accessor (since they need access to the changelog and manifest revlogs for reads), instead of the repo (which is a source of circular dependencies). Having the store be separate from the repo instance could also be useful for having multiple store snapshots inmemory at once. I'm not sure why we would want to, but we could diff the originally loaded store state with a future store state by instantiating two. 2. The option above would be a pretty significant change. Alternatively, I was think about teaching the opener class about the concept of dependencies. We could tell it ".hg/bookmark" should always be opened before ".hg/store/00changelog.i". Then, if anyone tries to read .hg/store/00changelog.i, we will automatically open .hg/bookmark first, if it hasn't already been opened. Then when the bookmark reader comes along, we hand it the already opened file handle. It's somewhat weird to do it at the opener level, but it would mean we could avoid refactoring the rest of the code base. Thoughts? Better ideas? Durham ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel