Re: Fixing .hg file open ordering

2016-11-23 Thread Bryan O'Sullivan
Yep, your reasoning here makes sense. While I'm glad that you discovered
after some investigation that we're doing the right thing already, it's a
little troubling that this wasn't easy to understand from reading the code,
and that (from my interpretation of what you wrote) you had to think and
rethink your reading a couple of times to figure out what was going on.
Sounds a bit like a latent future bug, doesn't it?

On Tue, Nov 22, 2016 at 12:19 PM, Ryan McElroy  wrote:

> I think that fixing the read open ordering is not sufficient.
>
> I was just chatting with Jun Wu (quark) about this, and after that
> discussion, it seems that we also have to ensure the write order is
> correct. Specifically, bookmarks must be written only after the changelog
> is written, otherwise, regardless of the order that we open files for read,
> we can have a bookmarks file that points to commits that no reader can see
> yet.
>
> Based on a very quick look at the transaction code, it looks like
> filegenerators in transactions are stored in a dict, so we have no
> guarantees about what order they will be called in, so it's very likely
> that we often call the write of the bookmarks file before the write of the
> changelog, making it so that no read-side fix alone can be sufficient to
> solve this problem.
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-22 Thread Ryan McElroy



On 11/22/16 8:19 PM, Ryan McElroy wrote:

On 11/10/16 5:11 PM, Bryan O'Sullivan wrote:


On Thu, Nov 10, 2016 at 2:29 AM, Durham Goode > wrote:


We worked out a solution to Bryan’s main concern, but there
hasn’t been any discussion of my main proposal. Internally we
worked around it by forcing bookmarks and the changelog to load
in the right order very early during repo setup, but that’s a
hack and doesn’t address the greater problem.


For the Real Problem (TM), I'd rather try the simpler solution first, 
of reading the bookmarks file in the changelog constructor, then 
splitting it into lines in the bmstore constructor. I'd hope that 
could be both lightweight enough that it could avoid a performance 
hit, and unlikely enough to introduce bugs later on that we wouldn't 
have to worry about a more heavyweight "make bugs impossible" 
solution. If I turn out to be wrong and we need something more formal 
(which we've gotten away without for years with revlogs, FWIW), we 
haven't really lost much.





I think that fixing the read open ordering is not sufficient.

I was just chatting with Jun Wu (quark) about this, and after that 
discussion, it seems that we also have to ensure the write order is 
correct. Specifically, bookmarks must be written only after the 
changelog is written, otherwise, regardless of the order that we open 
files for read, we can have a bookmarks file that points to commits 
that no reader can see yet.


Based on a very quick look at the transaction code, it looks like 
filegenerators in transactions are stored in a dict, so we have no 
guarantees about what order they will be called in, so it's very 
likely that we often call the write of the bookmarks file before the 
write of the changelog, making it so that no read-side fix alone can 
be sufficient to solve this problem


Well, I looked a little more deeply and actually filegenerators have an 
order parameter (sorry I missed that the first time).


However, the order parameter is unset by the bmstore when it calls 
addfilegenerator.


However (and now I'm remembering more of this), I see that we have a set 
called "postfinalizegenerators" which are forced to be after the 
finalize steps which is where the changelog is written (introduced in 
a5009789960c by durham). So it seems we're already okay here, assuming 
all that code it working as intended.


Sorry for forgetting about this.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-10 Thread Durham Goode


On 11/9/16, 4:51 PM, "Augie Fackler"  wrote:

On Wed, Nov 02, 2016 at 03:39:50PM -0700, Bryan O'Sullivan wrote:
> On Wed, Nov 2, 2016 at 3:30 PM, Durham Goode  wrote:
>
> > Bookmark writes are now within the wlock, and any inmemory read you did
> > prior to taking the lock will be invalidated (during the next
> > repo._bookmarks access) and thrown away when you take the lock. So once
> > you're in the lock you are guaranteed to get the latest version off 
disk,
> > and guaranteed that no one else will modify it until you release the 
lock.
> >
>
> OK, great, that was the step I was missing when reading the code.

Based on my quick scan of this thread, it sounds like you've worked
out a solution for this?
 
We worked out a solution to Bryan’s main concern, but there hasn’t been any 
discussion of my main proposal.  Internally we worked around it by forcing 
bookmarks and the changelog to load in the right order very early during repo 
setup, but that’s a hack and doesn’t address the greater problem.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-09 Thread Augie Fackler
On Wed, Nov 02, 2016 at 03:39:50PM -0700, Bryan O'Sullivan wrote:
> On Wed, Nov 2, 2016 at 3:30 PM, Durham Goode  wrote:
>
> > Bookmark writes are now within the wlock, and any inmemory read you did
> > prior to taking the lock will be invalidated (during the next
> > repo._bookmarks access) and thrown away when you take the lock. So once
> > you're in the lock you are guaranteed to get the latest version off disk,
> > and guaranteed that no one else will modify it until you release the lock.
> >
>
> OK, great, that was the step I was missing when reading the code.

Based on my quick scan of this thread, it sounds like you've worked
out a solution for this?

> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-02 Thread Bryan O'Sullivan
On Wed, Nov 2, 2016 at 3:30 PM, Durham Goode  wrote:

> Bookmark writes are now within the wlock, and any inmemory read you did
> prior to taking the lock will be invalidated (during the next
> repo._bookmarks access) and thrown away when you take the lock. So once
> you're in the lock you are guaranteed to get the latest version off disk,
> and guaranteed that no one else will modify it until you release the lock.
>

OK, great, that was the step I was missing when reading the code.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-02 Thread Durham Goode



On 11/2/16 3:20 PM, Bryan O'Sullivan wrote:
There's long been a well-defined order for accessing historical data: 
changelog first, then manifest, then revlog, and the reverse for writes.
Yea, I guess I'm proposing formalizing these dependencies into a single 
location in code so we cannot mess it up on accident.


I think that what has happened with bookmarks is that we literally 
forgot about the next necessary ordering constraint: you must read 
bookmarks before the changelog, otherwise you run into the race we see 
here. This same constraint should apply to any other file that 
contains commit hashes.


It's unfortunate that the bmstore API is "backwards" in this way, but 
what's the sequence of operations by which bookmarks seem to 
disappear? There's something missing from your description to help 
piece the sequence of events together, because a bookmark that points 
to a commit that is not yet known will be explicitly ignored in the 
bmstore constructor.


Here's what I *think* must be the sequence:

I read the changelog.
I'm asked to look up the foo bookmark.
Someone else writes out some commits and a new bookmarks file.
I open the bookmarks file, which was written *after* the last 
changelog write.
The newly written bookmarks file contains a version of foo that points 
at a commit that I haven't yet seen, and because the bmstore 
constructor ignores it, it appears to have vanished.
Yes, I tried to say this in my first paragraph, though not as clearly as 
I could have.


I think this will be a little harder to fix than you may be anticipating.

You can fix the read path by reading the bookmarks before the 
changelog, but the write path is subtle. Here's why.


Bookmarks are stored in memory. If I'm being asked to write out a new 
bookmarks file as part of a transaction, what happens in this 
interleaving?


I read bookmarks
You write bookmarks
I write bookmarks, based off my stale understanding

In other words, I'll potentially cause your write to go missing unless 
I validate during the transaction that the bookmarks haven't changed 
since I read them before I write them.
Bookmark writes are now within the wlock, and any inmemory read you did 
prior to taking the lock will be invalidated (during the next 
repo._bookmarks access) and thrown away when you take the lock. So once 
you're in the lock you are guaranteed to get the latest version off 
disk, and guaranteed that no one else will modify it until you release 
the lock.


So your sequence becomes:

I read bookmarks
You write bookmarks
You release the lock
I take the lock
I read bookmarks again
I write bookmarks
I release the lock
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-02 Thread Bryan O'Sullivan
There's long been a well-defined order for accessing historical data:
changelog first, then manifest, then revlog, and the reverse for writes.

I think that what has happened with bookmarks is that we literally forgot
about the next necessary ordering constraint: you must read bookmarks
before the changelog, otherwise you run into the race we see here. This
same constraint should apply to any other file that contains commit hashes.

It's unfortunate that the bmstore API is "backwards" in this way, but
what's the sequence of operations by which bookmarks seem to disappear?
There's something missing from your description to help piece the sequence
of events together, because a bookmark that points to a commit that is not
yet known will be explicitly ignored in the bmstore constructor.

Here's what I *think* must be the sequence:

I read the changelog.
I'm asked to look up the foo bookmark.
Someone else writes out some commits and a new bookmarks file.
I open the bookmarks file, which was written *after* the last changelog
write.
The newly written bookmarks file contains a version of foo that points at a
commit that I haven't yet seen, and because the bmstore constructor ignores
it, it appears to have vanished.

I think this will be a little harder to fix than you may be anticipating.

You can fix the read path by reading the bookmarks before the changelog,
but the write path is subtle. Here's why.

Bookmarks are stored in memory. If I'm being asked to write out a new
bookmarks file as part of a transaction, what happens in this interleaving?

I read bookmarks
You write bookmarks
I write bookmarks, based off my stale understanding

In other words, I'll potentially cause your write to go missing unless I
validate during the transaction that the bookmarks haven't changed since I
read them before I write them.


On Wed, Nov 2, 2016 at 2:38 PM, Durham Goode  wrote:

> On 11/2/16 2:09 PM, Durham Goode wrote:
>
>> There's currently no defined order in which Mercurial should open files
>> in the .hg directory.  For instance, it's possible to read the changelog
>> first, then several seconds later read the bookmark file. If during those
>> several seconds the repo receives new commits and a bookmark moves, then it
>> will read a bookmark that points at a node that doesn't exist in the
>> inmemory changelog.
>>
>> We're seeing this on our server, which causes certain bookmarks to
>> disappear for users in some situations (until they pull again).
>>
> To clarify (as Jun has brought up outside of email), we could also solve
> this by special casing our various dependencies.  Like make
> localrepo.changelog manually load the bookmarks first (we can't do it right
> now because the bookmarkstore constructor requires the changelog so it can
> check if each bookmark exists).  So we wouldn't necessarily need a new
> layer for it, but it would require more diligence to make sure all reads
> happen in the right order across the code base.
>
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Fixing .hg file open ordering

2016-11-02 Thread Durham Goode

On 11/2/16 2:09 PM, Durham Goode wrote:
There's currently no defined order in which Mercurial should open 
files in the .hg directory.  For instance, it's possible to read the 
changelog first, then several seconds later read the bookmark file. If 
during those several seconds the repo receives new commits and a 
bookmark moves, then it will read a bookmark that points at a node 
that doesn't exist in the inmemory changelog.


We're seeing this on our server, which causes certain bookmarks to 
disappear for users in some situations (until they pull again).
To clarify (as Jun has brought up outside of email), we could also solve 
this by special casing our various dependencies.  Like make 
localrepo.changelog manually load the bookmarks first (we can't do it 
right now because the bookmarkstore constructor requires the changelog 
so it can check if each bookmark exists).  So we wouldn't necessarily 
need a new layer for it, but it would require more diligence to make 
sure all reads happen in the right order across the code base.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Fixing .hg file open ordering

2016-11-02 Thread Durham Goode
There's currently no defined order in which Mercurial should open files 
in the .hg directory.  For instance, it's possible to read the changelog 
first, then several seconds later read the bookmark file. If during 
those several seconds the repo receives new commits and a bookmark 
moves, then it will read a bookmark that points at a node that doesn't 
exist in the inmemory changelog.


We're seeing this on our server, which causes certain bookmarks to 
disappear for users in some situations (until they pull again).


Has anyone thought about how to solve this?  I know we talked about a 
better storage model that allowed for more snapshot views of the store, 
but that seems farther off.


I've been thinking about two possible ideas:

1. Introduce a store accessor class that all reads go through. An 
instance of it would live on the localrepo and it is responsible for 
providing a snapshotted view of the store. It would contain all the 
inmemory instances of store structures (the revlogs, the bookmark store, 
etc). All reads of those structures would go through that class, and it 
would be responsible for invalidation (like @storecache) and loading 
them in the right order (so reading the changelog would force the 
bookmarks to be read first).


This might also be useful for letting other classes hold a reference to 
the store without holding a reference to the repo. For instance, 
changectx and manifestctx could hold a reference to the store accessor 
(since they need access to the changelog and manifest revlogs for 
reads), instead of the repo (which is a source of circular dependencies).


Having the store be separate from the repo instance could also be useful 
for having multiple store snapshots inmemory at once.  I'm not sure why 
we would want to, but we could diff the originally loaded store state 
with a future store state by instantiating two.


2. The option above would be a pretty significant change. Alternatively, 
I was think about teaching the opener class about the concept of 
dependencies.  We could tell it ".hg/bookmark" should always be opened 
before ".hg/store/00changelog.i". Then, if anyone tries to read 
.hg/store/00changelog.i, we will automatically open .hg/bookmark first, 
if it hasn't already been opened.  Then when the bookmark reader comes 
along, we hand it the already opened file handle.


It's somewhat weird to do it at the opener level, but it would mean we 
could avoid refactoring the rest of the code base.



Thoughts? Better ideas?

Durham

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel