Re: [BUG] resolved deltas

2014-08-28 Thread Jeff King
On Mon, Aug 25, 2014 at 06:39:45PM +0200, René Scharfe wrote:

 Thanks, that looks good.  But while preparing the patch I noticed that
 the added test sometimes fails.  Helgrind pointed outet a race
 condition.  It is not caused by the patch to turn the asserts into
 regular ifs, however -- here's a Helgrind report for the original code
 with the new test:

Interesting. I couldn't convince Helgrind to catch such a case, but it
makes sense. We split the delta-resolving work by dividing up the base
objects. We then find any deltas that need that base object (which is
read-only). If there's only one instances of the base, then we'll be the
only thread working on those delta. But if there are two such bases,
they're going to look at the same deltas.

So we need some kind of mutual exclusion so that only one thread
proceeds with resolving the delta. The real_type check sort-of
functions in that way (except of course it is not actually thread safe).
So one obvious option is just a coarse-grained global lock to modify or
check real_type fields. That would probably perform badly (lots of
useless lock contention on unrelated objects), but at least it would
work.

If we accept pushing a lock into _each_ object_entry, then we would get
a lot less lock contention (i.e., none at all in the common case of no
duplicates). The cost is storing one lock per object, though. Not great,
but probably OK.

Is there some way we can make real_type itself more atomic? I.e., use it
as the mutex with the rule that we do not claim the object_entry as
ours to work on until we atomically set delta_obj-real_type. I think
doing a compare-and-swap looking for OBJ_REF_DELTA and replacing it with
base-real_type would be enough there (and substitute OBJ_OFS_DELTA for
the second conditional, of course).

However, I'm not sure we can portably rely on having a compare-and-swap
primitive (or that we want to go through the hassle of conditionally
using it).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-28 Thread Jeff King
On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:

 So we need some kind of mutual exclusion so that only one thread
 proceeds with resolving the delta. The real_type check sort-of
 functions in that way (except of course it is not actually thread safe).

Here's a patch which implements that. Since I couldn't replicate the
original problem with helgrind, I am just guessing at whether it
properly fixes it (well, it is more than just a guess; I used my brain
to analyze it, but that is far from foolproof).

It uses a single lock. I did a best-of-five timing of git index-pack
--verify on the kernel repo, both before and after. The results ended
up quite similar (both about 57s), though the run-to-run numbers are all
over the place (up to about 65s in the worst case). So maybe it is not
so bad.

As I implemented, I realized that even with the mutex, I really was just
implementing compare_and_swap (and I wrote it that way to make it more
obvious). So if we wanted to, it would be trivial to replace the
claim_delta function with a true compare-and-swap instruction if the
compiler and processor support it.

---
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f7dc5b0..ed9e253 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -112,6 +112,10 @@ static pthread_mutex_t deepest_delta_mutex;
 #define deepest_delta_lock()   lock_mutex(deepest_delta_mutex)
 #define deepest_delta_unlock() unlock_mutex(deepest_delta_mutex)
 
+static pthread_mutex_t object_entry_mutex;
+#define object_entry_lock()lock_mutex(object_entry_mutex)
+#define object_entry_unlock()  unlock_mutex(object_entry_mutex)
+
 static pthread_key_t key;
 
 static inline void lock_mutex(pthread_mutex_t *mutex)
@@ -135,6 +139,7 @@ static void init_thread(void)
init_recursive_mutex(read_mutex);
pthread_mutex_init(counter_mutex, NULL);
pthread_mutex_init(work_mutex, NULL);
+   pthread_mutex_init(object_entry_mutex, NULL);
if (show_stat)
pthread_mutex_init(deepest_delta_mutex, NULL);
pthread_key_create(key, NULL);
@@ -157,6 +162,7 @@ static void cleanup_thread(void)
pthread_mutex_destroy(read_mutex);
pthread_mutex_destroy(counter_mutex);
pthread_mutex_destroy(work_mutex);
+   pthread_mutex_destroy(object_entry_mutex);
if (show_stat)
pthread_mutex_destroy(deepest_delta_mutex);
for (i = 0; i  nr_threads; i++)
@@ -862,7 +868,6 @@ static void resolve_delta(struct object_entry *delta_obj,
 {
void *base_data, *delta_data;
 
-   delta_obj-real_type = base-obj-real_type;
if (show_stat) {
delta_obj-delta_depth = base-obj-delta_depth + 1;
deepest_delta_lock();
@@ -888,6 +893,21 @@ static void resolve_delta(struct object_entry *delta_obj,
counter_unlock();
 }
 
+static int claim_delta(struct object_entry *delta_obj,
+  enum object_type delta_type,
+  enum object_type base_type)
+{
+   enum object_type old_type;
+
+   object_entry_lock();
+   old_type = delta_obj-real_type;
+   if (old_type == delta_type)
+   delta_obj-real_type = base_type;
+   object_entry_unlock();
+
+   return old_type == delta_type;
+}
+
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
  struct base_data *prev_base)
 {
@@ -914,7 +934,7 @@ static struct base_data *find_unresolved_deltas_1(struct 
base_data *base,
if (base-ref_first = base-ref_last) {
struct object_entry *child = objects + 
deltas[base-ref_first].obj_no;
 
-   if (child-real_type == OBJ_REF_DELTA) {
+   if (claim_delta(child, OBJ_REF_DELTA, base-obj-real_type)) {
struct base_data *result = alloc_base_data();
 
resolve_delta(child, base, result);
@@ -930,7 +950,7 @@ static struct base_data *find_unresolved_deltas_1(struct 
base_data *base,
if (base-ofs_first = base-ofs_last) {
struct object_entry *child = objects + 
deltas[base-ofs_first].obj_no;
 
-   if (child-real_type == OBJ_OFS_DELTA) {
+   if (claim_delta(child, OBJ_OFS_DELTA, base-obj-real_type)) {
struct base_data *result = alloc_base_data();
 
resolve_delta(child, base, result);
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-28 Thread Jeff King
On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:

 Interesting. I couldn't convince Helgrind to catch such a case...

Ugh. It helps if you actually helgrind the git binary, and not the
shell-script from bin-wrappers. I can easily replicate the problem now.
The patch I just posted seems to fix it (at least it has been running in
a loop for over a minute with no failures, whereas before it took only a
few seconds to provoke a failure).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-28 Thread Jeff King
On Thu, Aug 28, 2014 at 06:15:18PM -0400, Jeff King wrote:

 As I implemented, I realized that even with the mutex, I really was just
 implementing compare_and_swap (and I wrote it that way to make it more
 obvious). So if we wanted to, it would be trivial to replace the
 claim_delta function with a true compare-and-swap instruction if the
 compiler and processor support it.

That's something like this on top:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ed9e253..1782a46 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -897,6 +897,10 @@ static int claim_delta(struct object_entry *delta_obj,
   enum object_type delta_type,
   enum object_type base_type)
 {
+#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+   return __sync_bool_compare_and_swap(delta_obj-real_type, delta_type,
+   base_type);
+#else
enum object_type old_type;
 
object_entry_lock();
@@ -906,6 +910,7 @@ static int claim_delta(struct object_entry *delta_obj,
object_entry_unlock();
 
return old_type == delta_type;
+#endif
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,


Though I guess gcc's __sync stuff is old, and the new C11 thing is
stdatomic.h. I guess we could support either if we can find the right
preprocessor macros (I am not even sure if the above is correct; the 4
is for the size of the data type, but this is an enum, so we don't even
know what the right size is...).

However, the above doesn't seem to be any faster for me, so it may not
be worth the hassle.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-28 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Aug 28, 2014 at 06:08:21PM -0400, Jeff King wrote:

 Interesting. I couldn't convince Helgrind to catch such a case...

 Ugh. It helps if you actually helgrind the git binary, and not the
 shell-script from bin-wrappers. I can easily replicate the problem now.
 The patch I just posted seems to fix it (at least it has been running in
 a loop for over a minute with no failures, whereas before it took only a
 few seconds to provoke a failure).

Thanks for digging.  I'll pick it up and may comment on it in
tomorrow's cycle (it is a bit too late for today's cycle,
unfortunately).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-25 Thread René Scharfe
Am 23.08.2014 um 13:18 schrieb Jeff King:
 On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote:
 
 On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:

 So I think your patch is doing the right thing.

 By the way, if you want to add a test to your patch, there is
 infrastructure in t5308 to create packs with duplicate objects. If I
 understand the problem correctly, you could trigger this by having a
 delta object whose base is duplicated, even without the missing object.
 
 This actually turned out to be really easy. The test below fails without
 your patch and passes with it. Please feel free to squash it in.
 
 diff --git a/t/t5308-pack-detect-duplicates.sh 
 b/t/t5308-pack-detect-duplicates.sh
 index 9c5a876..50f7a69 100755
 --- a/t/t5308-pack-detect-duplicates.sh
 +++ b/t/t5308-pack-detect-duplicates.sh
 @@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with 
 duplicates' '
   test_expect_code 1 git cat-file -e $LO_SHA1
   '
   
 +test_expect_success 'duplicated delta base does not trigger assert' '
 + clear_packs 
 + {
 + A=01d7713666f4de822776c7622c10f1b07de280dc 
 + B=e68fe8129b546b101aee9510c5328e7f21ca1d18 
 + pack_header 3 
 + pack_obj $A $B 
 + pack_obj $B 
 + pack_obj $B
 + } dups.pack 
 + pack_trailer dups.pack 
 + git index-pack --stdin dups.pack 
 + test_must_fail git index-pack --stdin --strict dups.pack
 +'
 +
   test_done

Thanks, that looks good.  But while preparing the patch I noticed that
the added test sometimes fails.  Helgrind pointed outet a race
condition.  It is not caused by the patch to turn the asserts into
regular ifs, however -- here's a Helgrind report for the original code
with the new test:

==34949== Helgrind, a thread error detector
==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==34949== Command: /home/lsr/src/git/t/../bin-wrappers/git index-pack --stdin
==34949==
==34949== Helgrind, a thread error detector
==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==34949== Command: /home/lsr/src/git/git index-pack --stdin
==34949==
==34949== ---Thread-Announcement--
==34949==
==34949== Thread #3 was created
==34949==at 0x594DF7E: clone (clone.S:74)
==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75)
==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269)
==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097)
==34949==by 0x405B6A: handle_builtin (git.c:351)
==34949==by 0x404CE8: main (git.c:575)
==34949==
==34949== ---Thread-Announcement--
==34949==
==34949== Thread #2 was created
==34949==at 0x594DF7E: clone (clone.S:74)
==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75)
==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245)
==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269)
==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097)
==34949==by 0x405B6A: handle_builtin (git.c:351)
==34949==by 0x404CE8: main (git.c:575)
==34949==
==34949== 
==34949==
==34949== Possible data race during read of size 4 at 0x5E15910 by thread #3
==34949== Locks held: none
==34949==at 0x439327: find_unresolved_deltas (index-pack.c:918)
==34949==by 0x439666: threaded_second_pass (index-pack.c:1002)
==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233)
==34949==by 0x544B0A3: start_thread (pthread_create.c:309)
==34949==
==34949== This conflicts with a previous write of size 4 by thread #2
==34949== Locks held: none
==34949==at 0x4390E2: resolve_delta (index-pack.c:865)
==34949==by 0x439340: find_unresolved_deltas (index-pack.c:919)
==34949==by 0x439666: threaded_second_pass (index-pack.c:1002)
==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233)
==34949==by 0x544B0A3: start_thread (pthread_create.c:309)
==34949==
==34949== Address 0x5E15910 is 48 bytes inside a block of size 256 alloc'd
==34949==at 0x4C2A7D0: calloc (vg_replace_malloc.c:618)
==34949==by 0x50CA83: xcalloc (wrapper.c:119)
==34949==by 0x439AF6: cmd_index_pack (index-pack.c:1643)
==34949==by 0x405B6A: handle_builtin (git.c:351)
==34949==by 0x404CE8: main (git.c:575)
==34949==
git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion 
`child-real_type == OBJ_REF_DELTA' failed.
==34949==
==34949== For counts of detected and suppressed errors, rerun with: -v
==34949== Use --history-level=approx or =none to gain increased speed, at
==34949== the cost of reduced accuracy of conflicting-access 

Re: [BUG] resolved deltas

2014-08-25 Thread Shawn Pearce
On Sat, Aug 23, 2014 at 3:56 AM, Jeff King p...@peff.net wrote:
 [+cc spearce, as I think this is a bug in code.google.com's sending
  side, and he can probably get the attention of the right folks]
...
 My guess is a bug on the sending side. We have seen duplicate pack
 objects before, but never (AFAIK) combined with a missing object. This
 really seems like the sender just sent the wrong data for one object.

 IIRC, code.google.com is backed by their custom Dulwich implementation
 which runs on BigTable. We'll see if Shawn can get this to the right
 people as a bug report. :)

Thanks. This is a bug in the code.google.com implementation that is
running on Bigtable. I forwarded the report to the team that manages
that service so they can investigate further.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-23 Thread René Scharfe
Am 22.08.2014 um 21:41 schrieb Martin von Gagern:
 On 21.08.2014 13:35, Petr Stodulka wrote:
 Hi guys,
 I wanted post you patch here for this bug, but I can't find primary
 source of this problem [0], because I don't understand some ideas in the
 code.

 […]

 Any next ideas/hints or explanation of these functions? I began study
 source code and mechanisms of the git this week, so don't beat me yet
 please :-)

 Regards,
 Petr

 [0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
 
 Some pointers to related reports and investigations:
 
 https://groups.google.com/forum/#!topic/mapsforge-dev/IF6mgmwvZmY
 https://groups.google.com/forum/#!topic/mapsforge-dev/f2KvFALlkvo
 https://code.google.com/p/support/issues/detail?id=31571
 https://groups.google.com/forum/#!topic/mapsforge-dev/nomzr5dkkqc
 http://thread.gmane.org/gmane.comp.version-control.git/254626
 
 The last is my own bug report to this mailing list here, which
 unfortunately received no reaction yet. In that report, I can confirm
 that the commit introducing the assertion is the same commit which
 causes things to fail:
 https://github.com/git/git/commit/7218a215efc7ae46f7ca8d82442f354e
 
 In this https://code.google.com/p/mapsforge/ repository, resolve_delta
 gets called twice for some delta. The first time, type and real_type are
 identical, but by the second pass in find_unresolved_deltas the real
 type will have chaned to OBJ_TREE. This caused the old code to simply
 scip the object, but the new code aborts instead.
 
 So far my understanding. I'm not sure whether this kind of duplicate
 resolution is something normal or indicates some breakage in the
 repository in question. But aborting seems a bad solution in either case.

Git 1.7.6 clones the repo without reporting an error or warning, but a
check shows that a tree object is missing:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into mapsforge...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.48 MiB/s, done.
  Resolving deltas: 100% (4349/4349), done.
  $ cd mapsforge/
  $ git fsck
  broken link fromtree eb95fb0268c43f512e2ce512e6625072acafbdb5
totree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  missing tree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  dangling tree b6f32087526f43205bf8b5e6539936da787ecb1a

Git 2.1.0 hits an assertion:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.53 MiB/s, done.
  git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion 
`child-real_type == OBJ_REF_DELTA' failed.
  error: index-pack died of signal 6
  fatal: index-pack failed

The patch below, which makes index-pack ignore objects with unexpected
real_type as before, changes the shown error message, but clone still
fails:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.43 MiB/s, done.
  Resolving deltas: 100% (4348/4348), done.
  fatal: did not receive expected object 
a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: index-pack failed

Turning that fatal error into a warning get us a bit further:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.38 MiB/s, done.
  Resolving deltas: 100% (4350/4350), done.
  warning: did not receive expected object 
a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: The same object bc386be34bd4781f71bb68d72a6e0aee3124201e appears twice 
in the pack
  fatal: index-pack failed

Disabling strict checking (WRITE_IDX_STRICT) as well lets clone
succeed, but a check shows that a tree is missing, as wiht git 1.7.6:

  $ git clone https://code.google.com/p/mapsforge/
  Cloning into 'mapsforge'...
  remote: Counting objects: 12879, done.
  Receiving objects: 100% (12879/12879), 10.19 MiB | 2.37 MiB/s, done.
  Resolving deltas: 100% (4349/4349), done.
  warning: did not receive expected object 
a0155d8d5bb58afb9a5d20459be023899c9a1cef
  Checking connectivity... done.
  $ cd mapsforge/
  $ git fsck
  Checking object directories: 100% (256/256), done.
  Checking objects: 100% (12879/12879), done.
  broken link fromtree eb95fb0268c43f512e2ce512e6625072acafbdb5
totree a0155d8d5bb58afb9a5d20459be023899c9a1cef
  missing tree a0155d8d5bb58afb9a5d20459be023899c9a1cef

Cloning the repo with Egit works fine, but git fsck shows the same
results as above.

https://github.com/applantation/mapsforge has that missing tree
object, by the way.

OK, what now?  It's better to show an error message instead of
failing an assertion if a repo is found to be corrupt because the
issue is caused by external input.  I don't know if the patch
below may have any bad side effects, though, so no sign-off at
this 

Re: [BUG] resolved deltas

2014-08-23 Thread Jeff King
[+cc spearce, as I think this is a bug in code.google.com's sending
 side, and he can probably get the attention of the right folks]

On Sat, Aug 23, 2014 at 12:12:08PM +0200, René Scharfe wrote:

 Git 1.7.6 clones the repo without reporting an error or warning, but a
 check shows that a tree object is missing:

Thanks for digging on this. I happened to be looking at it at the exact
same time, so now I can stop. :)

 The patch below, which makes index-pack ignore objects with unexpected
 real_type as before, changes the shown error message, but clone still
 fails:
 
   $ git clone https://code.google.com/p/mapsforge/
   Cloning into 'mapsforge'...
   remote: Counting objects: 12879, done.
   Receiving objects: 100% (12879/12879), 10.19 MiB | 2.43 MiB/s, done.
   Resolving deltas: 100% (4348/4348), done.
   fatal: did not receive expected object 
 a0155d8d5bb58afb9a5d20459be023899c9a1cef
   fatal: index-pack failed

This makes sense. Early versions of index-pack didn't know how to check
for missing objects, but now it does. However, we only trigger that code
if we are using --strict, or if we are cloning (in which case we pass
the --check-self-contained-and-connected option).

If you are doing a regular fetch, we rely on check_everything_connected
after the pack is received. So (with your patch):

  $ git init
  $ git fetch https://code.google.com/p/mapsforge/
  remote: Counting objects: 12298, done.
  Receiving objects: 100% (12298/12298), 9.24 MiB | 959.00 KiB/s, done.
  Resolving deltas: 100% (4107/4107), done.
  error: Could not read a0155d8d5bb58afb9a5d20459be023899c9a1cef
  fatal: bad tree object a0155d8d5bb58afb9a5d20459be023899c9a1cef
  error: https://code.google.com/p/mapsforge/ did not send all necessary objects

That all makes sense.

 Turning that fatal error into a warning get us a bit further:
 
   $ git clone https://code.google.com/p/mapsforge/
   Cloning into 'mapsforge'...
   remote: Counting objects: 12879, done.
   Receiving objects: 100% (12879/12879), 10.19 MiB | 2.38 MiB/s, done.
   Resolving deltas: 100% (4350/4350), done.
   warning: did not receive expected object 
 a0155d8d5bb58afb9a5d20459be023899c9a1cef
   fatal: The same object bc386be34bd4781f71bb68d72a6e0aee3124201e appears 
 twice in the pack
   fatal: index-pack failed
 
 Disabling strict checking (WRITE_IDX_STRICT) as well lets clone
 succeed, but a check shows that a tree is missing, as wiht git 1.7.6:

Interesting that one object is duplicated and another object is missing.
The pack doesn't contain the sha1s of the objects; we compute them on
the fly in index-pack. So it's likely that the real problem is that one
entry in the pack either has the wrong delta data, or mentions the wrong
base object. And does it in such a way that we generate the another
object that does exist (so the packfile data isn't garbled or corrupted;
it's a bug on the sending side that uses the wrong data).

 https://github.com/applantation/mapsforge has that missing tree
 object, by the way.

Unsurprisingly, it's a tree object quite similar to the duplicated one.

 OK, what now?  It's better to show an error message instead of
 failing an assertion if a repo is found to be corrupt because the
 issue is caused by external input.  I don't know if the patch
 below may have any bad side effects, though, so no sign-off at
 this time.

Definitely. Your patch looks like an improvement. The objects in the
delta list must have had their real_types set to REF_DELTA and OFS_DELTA
at some point (since that is how they got on the list). And there is no
way for the type field to change from a delta type to anything else
_except_ by us resolving the delta. So I think the condition triggering
that assert cannot mean anything except that we have a duplicate object
(and it is not actually the delta object that is duplicated, but rather
its base, as seeing it again is what causes us to try to resolve twice).

So we know at this point that we have a duplicate object, and we could
warn or say something. But I think we probably would not want to. If
--strict is in effect, then we will notice and complain later. And if it
is not, then we should allow it without comment (in this case the repo
is broken, but it would not _have_ to be).

So I think your patch is doing the right thing.

 Allowing git to clone a broken repo sounds useful, up to point.
 In this particular case older versions had no problem doing that,
 so it seems worth supporting.

I think with your patch we are fine. Without --strict (which is implied
on a clone), you can still git init and git fetch the broken pack
(fetch will complain, but it leaves the pack in place).

We may want to adjust the fact that --check-self-contained-and-connected
implies strict (it builds on the fsck bits of index-pack, but it does
not have to imply a full fsck, nor strict index writing).

 And how did the tree object went missing in the first place?

My guess is a bug on the sending side. We have seen duplicate pack
objects 

Re: [BUG] resolved deltas

2014-08-23 Thread Jeff King
On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:

 So I think your patch is doing the right thing.

By the way, if you want to add a test to your patch, there is
infrastructure in t5308 to create packs with duplicate objects. If I
understand the problem correctly, you could trigger this by having a
delta object whose base is duplicated, even without the missing object.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-23 Thread Jeff King
On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote:

 On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote:
 
  So I think your patch is doing the right thing.
 
 By the way, if you want to add a test to your patch, there is
 infrastructure in t5308 to create packs with duplicate objects. If I
 understand the problem correctly, you could trigger this by having a
 delta object whose base is duplicated, even without the missing object.

This actually turned out to be really easy. The test below fails without
your patch and passes with it. Please feel free to squash it in.

diff --git a/t/t5308-pack-detect-duplicates.sh 
b/t/t5308-pack-detect-duplicates.sh
index 9c5a876..50f7a69 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with 
duplicates' '
test_expect_code 1 git cat-file -e $LO_SHA1
 '
 
+test_expect_success 'duplicated delta base does not trigger assert' '
+   clear_packs 
+   {
+   A=01d7713666f4de822776c7622c10f1b07de280dc 
+   B=e68fe8129b546b101aee9510c5328e7f21ca1d18 
+   pack_header 3 
+   pack_obj $A $B 
+   pack_obj $B 
+   pack_obj $B
+   } dups.pack 
+   pack_trailer dups.pack 
+   git index-pack --stdin dups.pack 
+   test_must_fail git index-pack --stdin --strict dups.pack
+'
+
 test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-22 Thread Martin von Gagern
On 21.08.2014 13:35, Petr Stodulka wrote:
 Hi guys,
 I wanted post you patch here for this bug, but I can't find primary
 source of this problem [0], because I don't understand some ideas in the
 code.

 […]
 
 Any next ideas/hints or explanation of these functions? I began study
 source code and mechanisms of the git this week, so don't beat me yet
 please :-)
 
 Regards,
 Petr
 
 [0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919

Some pointers to related reports and investigations:

https://groups.google.com/forum/#!topic/mapsforge-dev/IF6mgmwvZmY
https://groups.google.com/forum/#!topic/mapsforge-dev/f2KvFALlkvo
https://code.google.com/p/support/issues/detail?id=31571
https://groups.google.com/forum/#!topic/mapsforge-dev/nomzr5dkkqc
http://thread.gmane.org/gmane.comp.version-control.git/254626

The last is my own bug report to this mailing list here, which
unfortunately received no reaction yet. In that report, I can confirm
that the commit introducing the assertion is the same commit which
causes things to fail:
https://github.com/git/git/commit/7218a215efc7ae46f7ca8d82442f354e

In this https://code.google.com/p/mapsforge/ repository, resolve_delta
gets called twice for some delta. The first time, type and real_type are
identical, but by the second pass in find_unresolved_deltas the real
type will have chaned to OBJ_TREE. This caused the old code to simply
scip the object, but the new code aborts instead.

So far my understanding. I'm not sure whether this kind of duplicate
resolution is something normal or indicates some breakage in the
repository in question. But aborting seems a bad solution in either case.

Greetings,
 Martin von Gagern


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] resolved deltas

2014-08-21 Thread Petr Stodulka

Hi guys,
I wanted post you patch here for this bug, but I can't find primary 
source of this problem [0], because I don't understand some ideas in the 
code. So what I investigate:


Bug is reprodusible since git version 1.8.3.1 (may earlier 1.8.xx, but I 
don't test it) to actual upstream version.
This problem doesn't exists in version 1.7.xx - or more precisely is 
not reproducible. May this is reproducible
since commit 7218a215 - in this commit was added assert in file 
builtin/index-pack.c (actual line is 918), but I didn't test this.


This assert tests if object's (child) real type == OBJ_REF_DELTA. This 
will failure for object with real_type == OBJ_TREE (set as parent's 
type) and type == OBJ_REF_DELTA. Here some prints of important variables 
before failure assert() (from older version but I think that values are 
still actual in this case):

--
(gdb) p base-ref_first
$9 = 3223

(gdb) p deltas[3223]
$10 = {
  base = {
sha1 = \274\070k\343K\324x\037q\273h\327*n\n\356\061$ \036,
offset = 2267795834784135356
  },
  obj_no = 11152
}

(gdb) p *child
$11 = {
  idx = {
sha1 = J\242i\251\261\273\305\067\236%CE\022\257\252\342[;\tD,
crc32 = 2811659028,
offset = 10392153
  },
  size = 30,
  hdr_size = 22,
  type = OBJ_REF_DELTA,
  real_type = OBJ_TREE,
  delta_depth = 0,
  base_object_no = 5458
}

(gdb) p objects[5458]
$13 = {
  idx = {
sha1 = \274\070k\343K\324x\037q\273h\327*n\n\356\061$ \036,
crc32 = 3724458534,
offset = 6879168
  },
  size = 236,
  hdr_size = 2,
  type = OBJ_TREE,
  real_type = OBJ_TREE,
  delta_depth = 0,
  base_object_no = 0
}
---

base_object_no is static 5458. base-ref_first child's object are 
dynamic. If you want stop process in same position my recommendation for 
gdb (if you use gdb) when you will be in file index-pack.c:

br 1093
cont
set variable nr_threads = 1
br 
cond 2 i == 6300
cont
br 916
cont
---
compilated without any optimisation, line numbers modified for commit 
6c4ab27f2378ce67940b4496365043119d72
Condition i == 6300 --- last iteration before failure has dynamic rank 
in range 6304 to 6309 in the most cases (that's weird for me, when count 
of downloaded objects is statically 12806, may wrong search of children?)

---

Here I am lost. I don't know really what I can do next here, because I 
don't understand some ideas in code. e.g. searching of child - functions 
find_delta(), find_delta_children(). Calculation on line 618:


int next = (first+last) / 2;

I still don't understand. I didn't find description of this searching 
algorithm in tech. documentation but I didn't read all yet. However I 
think that source of problems could be somewhere in these two functions. 
When child is found, its real_type is set to parent's type in function 
resolve_delta() on the line 865 and then lasts wait for failure. I don't 
think that problem is in repository itself [1], but it is possible.


Any next ideas/hints or explanation of these functions? I began study 
source code and mechanisms of the git this week, so don't beat me yet 
please :-)


Regards,
Petr

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
[1] git clone https://code.google.com/p/mapsforge/ mapsforge.git
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] resolved deltas

2014-08-21 Thread Petr Stodulka



snip
Bug is reprodusible since git version 1.8.3.1 (may earlier 1.8.xx, but 
I don't test it) to actual upstream version.
This problem doesn't exists in version 1.7.xx - or more precisely is 
not reproducible. May this is reproducible
since commit 7218a215 - in this commit was added assert in file 
builtin/index-pack.c (actual line is 918), but I didn't test this.

Ok so this is reproducible since this commit because of assert().

Here I am lost. I don't know really what I can do next here, because I 
don't understand some ideas in code. e.g. searching of child - 
functions find_delta(), find_delta_children(). Calculation on line 618:


int next = (first+last) / 2;

I still don't understand. I didn't find description of this searching 
algorithm in tech. documentation but I didn't read all yet. However I 
think that source of problems could be somewhere in these two 
functions. When child is found, its real_type is set to parent's type 
in function resolve_delta() on the line 865 and then lasts wait for 
failure. I don't think that problem is in repository itself [1], but 
it is possible.
I read history of commits and my idea seems to be incorrect. It seems 
more like some error in repository itself. But I'd rather get opinion 
from someone who knows this code and ideas better.


Regards,
Petr


[0] https://bugzilla.redhat.com/show_bug.cgi?id=1099919
[1] git clone https://code.google.com/p/mapsforge/ mapsforge.git


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html