Re: git fast-import crashing on big imports

2017-01-19 Thread Ulrich Spörlein
2017-01-18 22:51 GMT+01:00 Jeff King :
>
> On Wed, Jan 18, 2017 at 03:27:04PM -0500, Jeff King wrote:
>
> > On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote:
> >
> > > I found your commit via bisect in case you were wondering. Thanks for
> > > picking this up.
> >
> > Still downloading. However, just looking at the code, the obvious
> > culprit would be clear_delta_base_cache(), which is called from
> > literally nowhere except fast-import, and then only when checkpointing.

Sorry for the bad URL, pesky last minute changes ...

> Hmm. I haven't reproduced your exact issue, but I was able to produce
> some hijinks in that function.
>
> The problem is that the hashmap_iter interface is unreliable if entries
> are added or removed from the map during the iteration.
>
> I suspect the patch below may fix things for you. It works around it by
> walking over the lru list (either is fine, as they both contain all
> entries, and since we're clearing everything, we don't care about the
> order).

Confirmed. With the patch applied, I can import the whole 55G in one go
without any crashes or aborts. Thanks much!

>
> ---
>  sha1_file.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 1eb47f611..d20714d6b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct 
> delta_base_cache_entry *ent)
>
>  void clear_delta_base_cache(void)
>  {
> -   struct hashmap_iter iter;
> -   struct delta_base_cache_entry *entry;
> -   for (entry = hashmap_iter_first(_base_cache, );
> -entry;
> -entry = hashmap_iter_next()) {
> +   struct list_head *lru, *tmp;
> +   list_for_each_safe(lru, tmp, _base_cache_lru) {
> +   struct delta_base_cache_entry *entry =
> +   list_entry(lru, struct delta_base_cache_entry, lru);
> release_delta_base_cache(entry);
> }
>  }
> --
> 2.11.0.698.gd6b48ab4c
>
>
>
>
> >
> > -Peff


Re: git fast-import crashing on big imports

2017-01-18 Thread Jeff King
On Wed, Jan 18, 2017 at 03:27:04PM -0500, Jeff King wrote:

> On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote:
> 
> > I found your commit via bisect in case you were wondering. Thanks for
> > picking this up.
> 
> Still downloading. However, just looking at the code, the obvious
> culprit would be clear_delta_base_cache(), which is called from
> literally nowhere except fast-import, and then only when checkpointing.

Hmm. I haven't reproduced your exact issue, but I was able to produce
some hijinks in that function.

The problem is that the hashmap_iter interface is unreliable if entries
are added or removed from the map during the iteration.

I suspect the patch below may fix things for you. It works around it by
walking over the lru list (either is fine, as they both contain all
entries, and since we're clearing everything, we don't care about the
order).

---
 sha1_file.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 1eb47f611..d20714d6b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2342,11 +2342,10 @@ static inline void release_delta_base_cache(struct 
delta_base_cache_entry *ent)
 
 void clear_delta_base_cache(void)
 {
-   struct hashmap_iter iter;
-   struct delta_base_cache_entry *entry;
-   for (entry = hashmap_iter_first(_base_cache, );
-entry;
-entry = hashmap_iter_next()) {
+   struct list_head *lru, *tmp;
+   list_for_each_safe(lru, tmp, _base_cache_lru) {
+   struct delta_base_cache_entry *entry =
+   list_entry(lru, struct delta_base_cache_entry, lru);
release_delta_base_cache(entry);
}
 }
-- 
2.11.0.698.gd6b48ab4c




> 
> -Peff


Re: git fast-import crashing on big imports

2017-01-18 Thread Jeff King
On Wed, Jan 18, 2017 at 09:20:07PM +0100, Ulrich Spörlein wrote:

> I found your commit via bisect in case you were wondering. Thanks for
> picking this up.

Still downloading. However, just looking at the code, the obvious
culprit would be clear_delta_base_cache(), which is called from
literally nowhere except fast-import, and then only when checkpointing.

-Peff


Re: git fast-import crashing on big imports

2017-01-18 Thread Jeff King
On Wed, Jan 18, 2017 at 09:38:14AM -0500, Jeff King wrote:

> On Wed, Jan 18, 2017 at 03:01:17PM +0100, Ulrich Spoerlein wrote:
> 
> > Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug
> > 22nd, that changes delta_base_cache to use hashmap.h is the culprit for
> > git fast-import crashing on large imports.
> 
> I actually saw your bug report the other day and tried to download the
> dump file, but got a 404. Can you double check that it is available?

The URL in your original mail was bogus:

> > I have a dump file that you can grab at
> > http://scan.spoerlein.net/pub/freebsd-base.dump.xz (19G, 55G uncompressed)

Via guess-and-check, I found:

  http://www.spoerlein.net/pub/freebsd-base.git.fi.xz

I'll see if I can reproduce the problem.

-Peff


Re: git fast-import crashing on big imports

2017-01-18 Thread Jeff King
On Wed, Jan 18, 2017 at 03:01:17PM +0100, Ulrich Spoerlein wrote:

> Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug
> 22nd, that changes delta_base_cache to use hashmap.h is the culprit for
> git fast-import crashing on large imports.

I actually saw your bug report the other day and tried to download the
dump file, but got a 404. Can you double check that it is available?

-Peff


Re: git fast-import crashing on big imports

2017-01-18 Thread Ulrich Spoerlein
Yo Jeff, your commit 8261e1f139db3f8aa6f9fd7d98c876cbeb0f927c from Aug
22nd, that changes delta_base_cache to use hashmap.h is the culprit for
git fast-import crashing on large imports.

Please read below, you can find a 55G SVN dump that should show the
problem after a couple of minutes to less than an hour. Please also see
similar issues from 2009 and 2011. This seems to be a rather fragile
part of the code, could you add unit tests that make sure this
regression is not re-introduce again once you fix it? Thanks!

I'm happy to test any patches that you can provide.

Cheers,
Uli

On Do., 2017-01-12 at 09:21:38 +0100, Ulrich Spörlein wrote:
> Hey,
> 
> the FreeBSD svn2git conversion is crashing somewhat
> non-deterministically during its long conversion process. From memory,
> this was not as bad is it is with more recent versions of git (but I
> can't be sure, really).
> 
> I have a dump file that you can grab at
> http://scan.spoerlein.net/pub/freebsd-base.dump.xz (19G, 55G uncompressed)
> that shows this problem after a couple of minutes of runtime. The caveat is
> that for another member of the team on a different machine the crashes are on
> different revisions.
> 
> Googling around I found two previous threads that were discussing
> problems just like this (memory corruption, bad caching, etc)
> 
> https://www.spinics.net/lists/git/msg93598.html  from 2009
> and
> http://git.661346.n2.nabble.com/long-fast-import-errors-out-quot-failed-to-apply-delta-quot-td6557884.html
> from 2011
> 
> % git fast-import --stats < ../freebsd-base.dump
> ...
> progress SVN r49318 branch master = :49869
> progress SVN r49319 branch stable/3 = :49870
> progress SVN r49320 branch master = :49871
> error: failed to apply delta
> error: bad offset for revindex
> error: bad offset for revindex
> error: bad offset for revindex
> error: bad offset for revindex
> error: bad offset for revindex
> fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e
> fast-import: dumping crash report to fast_import_crash_29613
> 
> 
> fast-import crash report:
> fast-import process: 29613
> parent process : 29612
> at 2017-01-11 19:33:37 +
> 
> fatal: Can't load tree b35ae4e9c2a41677e84a3f14bed09f584c3ff25e
> 
> 
> git fsck shows a somewhat incomplete pack file (I guess that's expected if the
> process dies mid-stream?)
> 
> % git fsck
> Checking object directories: 100% (256/256), done.
> error: failed to apply delta6/614500)
> error: cannot unpack d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f from 
> ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 
> 122654805
> error: failed to apply delta
> error: failed to read delta base object 
> d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from 
> ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack
> error: cannot unpack 8523bde63ef34bef725347994fdaec996d756510 from 
> ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack at offset 
> 122671596
> error: failed to apply delta0/614500)
> error: failed to read delta base object 
> d1d7ee1f81c6767c5e0f75d14d400d7512a85a0f at offset 122654805 from 
> ./objects/pack/pack-e28fcea43fc221d2ebe92857b484da58bb888237.pack
> ...
> 
> 
> Any comments on whether the original problems from 2009 and 2011 were ever
> fixed and committed?
> 
> Some more facts:
> - git version 2.11.0
> - I don't recall these sorts of crashes with a git from 2-3 years ago
> - adding more checkpoints does help, but not fix the problem, it merely shifts
>   the crashes around to different revisions
> - incremental runs of the conversion *will* complete most of the time, but
>   depending on how often checkpoints are used, I've seen it croak on specific
>   commits and not being able to progress further :(
> 
> Thanks for any pointers or things to try!
> Cheers
> Uli