Re: [PATCH 00/12] Fix some reference-related races

2013-06-18 Thread Michael Haggerty
On 06/18/2013 08:13 PM, Ramsay Jones wrote:
> Michael Haggerty wrote:
>> On 06/15/2013 10:13 PM, Ramsay Jones wrote:
>>> Now, turning to the new code, t3211-peel-ref.sh test #7 now works, but
>>> test #8 still fails...
> 
> [ ... ]
> 
>> It should be impossible, because the current process is
>> holding packed-refs.lock, and therefore other git processes should
>> refuse to change the packed-refs file.
> 
> :-P You are assuming that a single process can't lie to itself ...
> 
> [ ... ]
> 
> I should not have assumed that you knew what I meant by "schizophrenic
> stat() functions" above; sorry about that! If you are interested, then
> the following commits may be useful reading: adbc0b6, 7faee6b, 7974843,
> 05bab3ea, 924aaf3e and b8a97333.

Thanks, that helps.

>>> I haven't checked the remaining test failures to see if they are
>>> caused by this code (I don't think so, but ...), but this failure
>>> is clearly a cygwin specific issue.
>>
>> Thanks again for the testing and analysis,
> 
> So, unless you feel the need to fix this yourself, you can probably
> ignore this issue for now. I will hopefully find time to fix it up
> before this topic progresses to next. (Although I don't have any
> feeling for the time-frame of this topic).

Despite reading the commits that you referenced, I still don't feel
competent to fix this myself so I gratefully accept your offer.
Ideally, whatever complexity is needed would be hidden in the functions
stat_validity_check() and stat_validity_update() added by patch 09/12 of
my series, and possibly match_stat_data() from 08/12.

Let me know if I can help.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/12] Fix some reference-related races

2013-06-18 Thread Ramsay Jones
Michael Haggerty wrote:
> Thanks for all of the information.
> 
> On 06/15/2013 10:13 PM, Ramsay Jones wrote:
>> Michael Haggerty wrote:
>>> *This patch series must be built on top of mh/reflife.*

[ ... ]

>> You may be wondering why clear_packed_ref_cache() is called? Well, that
>> is because stat_validity_check() *incorrectly* indicates that the
>> packed-refs file has changed. Why does it do that? Well, this is one
>> more example of the problems caused by the cygwin schizophrenic stat()
>> functions. :( [ARGH]
>>

[ ... ]

> So if I understand correctly, all of the above is *without* the
> refcounting changes introduced in mh/ref-races?  Is so, then it is not
> surprising, as this is exactly the sort of problem that the reference
> counting is meant to solve.

Yes, as I said, this describes the old (non refcounted) series.
This particular problem (the segmentation fault) is fixed by the new
series (as noted below). [Note, however, that the packed-refs file
will still be re-read more often than needed.]

>> Now, turning to the new code, t3211-peel-ref.sh test #7 now works, but
>> test #8 still fails...

[ ... ]

> These "internal error: packed-ref cache cleared while locked" failures
> result from an internal consistency check that clear_packed_ref_cache()
> is not called while the write lock is held on the packed-refs file.  A
> call to c_p_r_c() could result from
> 
> * a programming error
> 
> * a determination based on the packed-refs file stat that the file needs
> to be re-read
> 
> Judging from what you said about cygwin, I assume that the latter is
> happening.

Indeed.

> It should be impossible, because the current process is
> holding packed-refs.lock, and therefore other git processes should
> refuse to change the packed-refs file.

:-P You are assuming that a single process can't lie to itself ...

[ ... ]

> Yikes!  ECYGWINFAIL.

Ah, NO, this should read ECYGWINGITFAIL.
This is a self-inflicted wound; it has nothing much to do with cygwin.

I should not have assumed that you knew what I meant by "schizophrenic
stat() functions" above; sorry about that! If you are interested, then
the following commits may be useful reading: adbc0b6, 7faee6b, 7974843,
05bab3ea, 924aaf3e and b8a97333.

[ ... ]

>> I haven't checked the remaining test failures to see if they are
>> caused by this code (I don't think so, but ...), but this failure
>> is clearly a cygwin specific issue.
> 
> Thanks again for the testing and analysis,

So, unless you feel the need to fix this yourself, you can probably
ignore this issue for now. I will hopefully find time to fix it up
before this topic progresses to next. (Although I don't have any
feeling for the time-frame of this topic).

HTH

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/12] Fix some reference-related races

2013-06-15 Thread Michael Haggerty
Thanks for all of the information.

On 06/15/2013 10:13 PM, Ramsay Jones wrote:
> Michael Haggerty wrote:
>> *This patch series must be built on top of mh/reflife.*
>>
> 
> [...]
> 
>> The other problem was that the for_each_ref() functions will die if
>> the ref cache that they are iterating over is freed out from under
>> them.  This problem is solved by using reference counts to avoid
>> freeing the old packed ref cache (even if it is no longer valid) until
>> all users are done with it.
> 
> Yes, I found exactly this happened to me on cygwin, earlier this week,
> with the previous version of this code. After seeing this mail, I had
> decided not to describe the failure on the old version, but wait and
> test this version instead.
> 
> This version is a great improvement, but it still has some failures on
> cygwin. So, it may be worth (briefly) describing the old failure anyway!
> Note that several tests failed, but I will only mention t3211-peel-ref.sh
> tests #7-8.
> 
>   $ pwd
>   /home/ramsay/git/t/trash directory.t3211-peel-ref
>   $
> 
>   $ ../../bin-wrappers/git show-ref -d
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
>   eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
> 5 [main] git 3540 _cygtls::handle_exceptions: Error while dumping 
> state (p
>   robably corrupted stack)
>   Segmentation fault (core dumped)
>   $
> 
> The stack-trace for the faulting code looks something like:
> 
>   cmd_show_ref()
> for_each_ref(show_ref, NULL)
>   do_for_each_ref(&ref_cache, "", show_ref, 0, 0, NULL)
> do_for_each_entry(&ref_cache, "", do_one_ref, &data)
>   do_for_each_entry_in_dirs(packed_dir, loose_dir, do_one_ref, &data)
> *dfeeid() recursive calls*
>   do_one_ref(entry, &data)
> show_ref("refs/outside/foo", sha1, NULL) [2nd match]
>   peel_ref("refs/outside/foo", sha1)
> peel_entry(entry, 0)
>   peel_object(name, sha1)
> deref_tag_noverify(o)
>   parse_object(sha1 )
> lookup_replace_object(sha1)
>   do_lookup_replace_object(sha1)
> prepare_replace_object()
> 
>   [un-indent here!]
>   for_each_replace_ref(register_replace_ref, NULL)
> do_for_each_ref(&ref_cache, "refs/replace", fn, 13, 0, NULL)
>   do_for_each_entry(&ref_cache, "refs/replace", fn, &data)
> get_packed_refs(&ref_cache)
>   clear_packed_ref_cache(&ref_cache) *free_ref_entries etc*
>   ** return to show_ref() [2nd match] above **
>   ** return to recursive dfeeid() call in original iteration
>   ** dir1->entries has been free()-ed and reused => segmentation fault
>   [dir1->entries == 0x64633263 => dc2c => part of sha1 for refs/outside/foo]
> 
> So, the nested "replace-reference-iteration" causes the ref_cache to be
> freed out from under the initial show-ref iteration, so this works:
> 
>   $ GIT_NO_REPLACE_OBJECTS=1 ../../bin-wrappers/git show-ref -d
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
>   eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/base
>   eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/tags/foo
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/foo^{}
>   $
> 
> You may be wondering why clear_packed_ref_cache() is called? Well, that
> is because stat_validity_check() *incorrectly* indicates that the
> packed-refs file has changed. Why does it do that? Well, this is one
> more example of the problems caused by the cygwin schizophrenic stat()
> functions. :( [ARGH]
> 
> At this point, I tried running 'git show-ref' with core.checkstat set
> on the command line; but that didn't work! I had to fix show-ref and
> re-build git, and then, this works:
> 
>   $ ../../bin-wrappers/git -c core.checkstat=minimal -c core.trustctime=f
>   alse show-ref -d
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
>   eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/base
>   eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/tags/foo
>   d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/foo^{}
>   $

So if I understand correctly, all of the above is *without* the
refcounting changes introduced in mh/ref-races?  Is so, then it is not
surprising, as this is exactly the sort of problem that the reference
counting is meant to solve.

> Now, turning to the new code, t3211-peel-ref.sh test #7 now works, but
> test #8 still fails...
> 
>   $ ./t3211-peel-ref.sh -i -v
> 
>   ...
> 
>   ok 7 - refs are peeled outside of refs/tags (old packed)
> 
>   expecting success:

Re: [PATCH 00/12] Fix some reference-related races

2013-06-15 Thread Ramsay Jones
Michael Haggerty wrote:
> *This patch series must be built on top of mh/reflife.*
> 

[...]

> The other problem was that the for_each_ref() functions will die if
> the ref cache that they are iterating over is freed out from under
> them.  This problem is solved by using reference counts to avoid
> freeing the old packed ref cache (even if it is no longer valid) until
> all users are done with it.

Yes, I found exactly this happened to me on cygwin, earlier this week,
with the previous version of this code. After seeing this mail, I had
decided not to describe the failure on the old version, but wait and
test this version instead.

This version is a great improvement, but it still has some failures on
cygwin. So, it may be worth (briefly) describing the old failure anyway!
Note that several tests failed, but I will only mention t3211-peel-ref.sh
tests #7-8.

  $ pwd
  /home/ramsay/git/t/trash directory.t3211-peel-ref
  $

  $ ../../bin-wrappers/git show-ref -d
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
  eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
5 [main] git 3540 _cygtls::handle_exceptions: Error while dumping state 
(p
  robably corrupted stack)
  Segmentation fault (core dumped)
  $

The stack-trace for the faulting code looks something like:

  cmd_show_ref()
for_each_ref(show_ref, NULL)
  do_for_each_ref(&ref_cache, "", show_ref, 0, 0, NULL)
do_for_each_entry(&ref_cache, "", do_one_ref, &data)
  do_for_each_entry_in_dirs(packed_dir, loose_dir, do_one_ref, &data)
*dfeeid() recursive calls*
  do_one_ref(entry, &data)
show_ref("refs/outside/foo", sha1, NULL) [2nd match]
  peel_ref("refs/outside/foo", sha1)
peel_entry(entry, 0)
  peel_object(name, sha1)
deref_tag_noverify(o)
  parse_object(sha1 )
lookup_replace_object(sha1)
  do_lookup_replace_object(sha1)
prepare_replace_object()

  [un-indent here!]
  for_each_replace_ref(register_replace_ref, NULL)
do_for_each_ref(&ref_cache, "refs/replace", fn, 13, 0, NULL)
  do_for_each_entry(&ref_cache, "refs/replace", fn, &data)
get_packed_refs(&ref_cache)
  clear_packed_ref_cache(&ref_cache) *free_ref_entries etc*
  ** return to show_ref() [2nd match] above **
  ** return to recursive dfeeid() call in original iteration
  ** dir1->entries has been free()-ed and reused => segmentation fault
  [dir1->entries == 0x64633263 => dc2c => part of sha1 for refs/outside/foo]

So, the nested "replace-reference-iteration" causes the ref_cache to be
freed out from under the initial show-ref iteration, so this works:

  $ GIT_NO_REPLACE_OBJECTS=1 ../../bin-wrappers/git show-ref -d
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
  eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/base
  eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/tags/foo
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/foo^{}
  $

You may be wondering why clear_packed_ref_cache() is called? Well, that
is because stat_validity_check() *incorrectly* indicates that the
packed-refs file has changed. Why does it do that? Well, this is one
more example of the problems caused by the cygwin schizophrenic stat()
functions. :( [ARGH]

At this point, I tried running 'git show-ref' with core.checkstat set
on the command line; but that didn't work! I had to fix show-ref and
re-build git, and then, this works:

  $ ../../bin-wrappers/git -c core.checkstat=minimal -c core.trustctime=f
  alse show-ref -d
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/heads/master
  eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/outside/foo
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/outside/foo^{}
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/base
  eb0e854c2cd2511c7571b1a5e3c8b8146193fb30 refs/tags/foo
  d1ff1c9224ae5e58a7656fb9ecc95865d42ed71e refs/tags/foo^{}
  $

Now, turning to the new code, t3211-peel-ref.sh test #7 now works, but
test #8 still fails...

  $ ./t3211-peel-ref.sh -i -v

  ...

  ok 7 - refs are peeled outside of refs/tags (old packed)

  expecting success:
  git pack-refs --all &&
  cp .git/packed-refs fully-peeled &&
  git branch yadda &&
  git pack-refs --all &&
  git branch -d yadda &&
  test_cmp fully-peeled .git/packed-refs

  fatal: internal error: packed-ref cache cleared while locked
  not ok 8 - peeled refs survive deletion of packed ref
  #
  #   git pack-refs --all &&
  #   cp .git/packed-refs fully-peeled &&
  #   git branch yadda &&
  #   git pack-refs --all &

Re: [PATCH 00/12] Fix some reference-related races

2013-06-12 Thread Jeff King
On Tue, Jun 11, 2013 at 11:48:20PM +0200, Michael Haggerty wrote:

> *This patch series must be built on top of mh/reflife.*

Applying on top of what Junio has in mh/reflife seems to create
conflicts at the first patch. I didn't look into it, though, but just
read the patches and looked at the packed-refs-transactions branch at
git://github.com/mhagger/git.git.

> The other problem was that the for_each_ref() functions will die if
> the ref cache that they are iterating over is freed out from under
> them.  This problem is solved by using reference counts to avoid
> freeing the old packed ref cache (even if it is no longer valid) until
> all users are done with it.

I was worried that the reference-counting would end up invasive and
ugly, but it turned out quite nice.

Overall, I think the series goes in the right direction, and I didn't
see any implementation flaws. The core of the race-handling is the same
as in mine, but your "hyperactive repository" patch shows how badly mine
can break under load.

I sent a few comments to specific patches, but I'd be fine with applying
it as-is. Thanks for working on it.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/12] Fix some reference-related races

2013-06-11 Thread Michael Haggerty
*This patch series must be built on top of mh/reflife.*

This patch series fixes some races reading loose and packed refs.
Most of the problems, and some of the solutions, were pointed out by
Jeff King [1] but some other work was necessary to prevent his fixes
from causing problems elsewhere.

The basic race being addressed is that at any time "pack-refs --prune"
might be run at any time.  It rewrites the packed-refs file and then
deletes the just-packed loose refs.

Readers, then, to get a self-consistent snapshot of references [2],
must be sure to read all of the loose references it will need *before*
reading the packed references (or at least verifying that the packed
references that it read earlier are still valid).  But given the
lazy-loading of the loose references cache, this was not always the
case.

So the core of this patch series is to force loose references for an
iteration to be read all at once into the cache, and *then* to verify
that the packed-refs cache is up-to-date and if not to reload it.
Similarly, when looking up single references, a loose reference is
sought first, and then the validity of the packed-refs cache is
verified, and then the loose reference is sought in the cache.

The problem is that there was a lot of code that assumed that the
lifetime of the reference cache was essentially infinite.  The
mh/reflife patch series (which has made it to next) fixed callers who
retained pointers to refnames in the cache.

The other problem was that the for_each_ref() functions will die if
the ref cache that they are iterating over is freed out from under
them.  This problem is solved by using reference counts to avoid
freeing the old packed ref cache (even if it is no longer valid) until
all users are done with it.

Once those are done, it is possible to invalidate the packed refs
cache when needed.  So (1) we always read all loose references that
will be needed in an iteration before the iteration starts, and (2) we
add a check (based on file metadata) whenever the packed-refs cache is
accessed that it is still up-to-date WRT the packed-refs file, and if
not reread it (but leave the old copy in memory as long as its
refcount is nonzero).

Along the way, this patch series adds simple transactions around the
packed-refs file/cache.  The transaction interface is public.  I think
this is a step in a good direction, because other race conditions not
addressed by this patch series are likely to require transactions
across the whole reference namespace to be made 100% reliable.

As a stress test, the test suite can be run with a simulated
"hyperactive repository" in which the packed-refs file is made to look
like it changes every time it is checked (except when its lock is
held):

 refs.c 

@@ -1075,8 +1075,8 @@ static struct packed_ref_cache 
*get_packed_ref_cache(struct ref_cache *refs)
else
packed_refs_file = git_path("packed-refs");

-   if (refs->packed &&
-   !stat_validity_check(&refs->packed->validity, packed_refs_file))
+   if (refs->packed && !refs->packed->lock
+   /*!stat_validity_check(&refs->packed->validity, packed_refs_file)*/)
clear_packed_ref_cache(refs);

if (!refs->packed) {


It passes the stress test.

[1] http://thread.gmane.org/gmane.comp.version-control.git/223299/focus=223526

Jeff King (2):
  get_packed_ref_cache: reload packed-refs file when it changes
  for_each_ref: load all loose refs before packed refs

Michael Haggerty (10):
  repack_without_ref(): split list curation and entry writing
  pack_refs(): split creation of packed refs and entry writing
  refs: wrap the packed refs cache in a level of indirection
  refs: implement simple transactions for the packed-refs file
  refs: manage lifetime of packed refs cache via reference counting
  do_for_each_entry(): increment the packed refs cache refcount
  packed_ref_cache: increment refcount when locked
  Extract a struct stat_data from cache_entry
  add a stat_validity struct
  refs: do not invalidate the packed-refs cache unnecessarily

 builtin/clone.c|   7 +-
 builtin/ls-files.c |  12 ++-
 cache.h|  60 +--
 read-cache.c   | 181 +++
 refs.c | 308 -
 refs.h |  27 -
 6 files changed, 464 insertions(+), 131 deletions(-)

-- 
1.8.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html