Re: [PATCH] fast-import: catch deletion of non-existent file in input

2012-07-15 Thread Jeff King
On Sun, Jul 15, 2012 at 01:11:51PM -0500, Jonathan Nieder wrote:

> > Subject: fast-import: catch deletion of non-existent file in input
> [...]
> > We silently ignored the bogus "D foo" directive, and the
> > resulting tree incorrectly contained "bar". With this patch,
> > we notice the bogus input and die.
> 
> This breaks svn-fe, which relies on the existing semantics when asked
> to copy an empty directory.

Thanks for the report. I had a worry while writing this that somebody
was relying on the behavior. Let's just drop it, then. It's nice to
catch errors in exporters, but not at the expense of compatibility
issues.

We could introduce a new feature bit, but I'm not sure it is really
worthwhile. The older versions of bzr-fast-export would not set the bit
anyway, and newer versions are already fixed, so it is kind of closing
the barn door after the horse has left (we might catch other bugs, but
this one is kind of oddly specific; if somebody wanted to audit
fast-import for other similar cases and introduce a "strict" feature
bit, that might be worthwhile. But for this single change, I don't think
so).

> Let's repeat that for emphasis: API breaks in fast-import not guarded
> with a new "feature" type are not ok.

Totally agree. The question in my mind was whether this was a bug fix or
an API change, and it sounds like it is too far towards the latter.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fast-import: catch deletion of non-existent file in input

2012-07-15 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> Subject: fast-import: catch deletion of non-existent file in input
[...]
> We silently ignored the bogus "D foo" directive, and the
> resulting tree incorrectly contained "bar". With this patch,
> we notice the bogus input and die.

This breaks svn-fe, which relies on the existing semantics when asked
to copy an empty directory.

That's my fault because we never check that in the testsuite, but I
also wouldn't be surprised if other importers were relying on the same
thing.

Any API break this big without a justification along the lines

We can be confident that no existing importer uses this
construct because ...

_needs_ to be guarded by a new "feature" to be safe for existing
importers.

Let's repeat that for emphasis: API breaks in fast-import not guarded
with a new "feature" type are not ok.

Sorry,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fast-import: catch deletion of non-existent file in input

2012-07-15 Thread Jeff King
On Fri, Jul 13, 2012 at 03:39:50PM +0200, Andreas Schwab wrote:

> >> The output contains these lines:
> >> 
> >> R a/b b/b
> >> D a/b
> >> 
> >> Changing the second line to D b/b fixes the bug.
> >
> > Yeah, I agree that is problematic. But I do not think it is a
> > fast-import bug, but rather bogus output generated by bzr fast-export (I
> > am not clear from what you wrote above if you are considering it a bug
> > that fast-import is confused). It seems nonsensical to mention a file
> > both as a rename source and as deleted in the same revision, and
> > certainly I would not expect an importer to deduce a link between the
> > second line and b/b.
> 
> IMHO fast-import should raise an error in this case, like it does when
> you switch the lines.

Here's a patch to do so. It means that fast-import is a little more
strict about deletions than it used to be. I think it's probably a good
idea for fast-import to err on the side of strictness (since erring on
the other side may mean corrupt input). I don't know if the omission of
error checking in the original was a philosophical choice or just
random. :)

-- >8 --
Subject: fast-import: catch deletion of non-existent file in input

Fast-import does not do a lot of input verification; in
particular, you can mark a path as deleted by a commit even
if that files does not exist in the parent at all.

On the one hand, it is nice for fast-import to be liberal in
what it accepts, as it means we are more forgiving of
exporter bugs or of people slicing and dicing the input. On
the other hand, this can mask bugs in exporters, leading to
a subtly wrong import result. In this case, bzr's
fast-export generated a sequence like:

  R foo bar
  D foo

when it meant to say:

  R foo bar
  D bar

We silently ignored the bogus "D foo" directive, and the
resulting tree incorrectly contained "bar". With this patch,
we notice the bogus input and die.

This patch adds a new test script specifically for checking
bogus inputs to fast-import. We also need to tweak t9300,
which appeared to be accidentally deleting a non-existent
path in an otherwise unrelated test.

Signed-off-by: Jeff King 
---
The tweak in t9300 looks like the "dst2" entry was just leftover cruft
from writing the test; there is no dst2 mentioned anywhere else. It
comes from David's 8dc6a37; pleae correct me if my assumptions isn't
correct.

 fast-import.c|  5 ++-
 t/t9300-fast-import.sh   |  1 -
 t/t9302-fast-import-bogus.sh | 77 
 3 files changed, 81 insertions(+), 2 deletions(-)
 create mode 100755 t/t9302-fast-import-bogus.sh

diff --git a/fast-import.c b/fast-import.c
index eed97c8..779adfc 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2367,6 +2367,7 @@ static void file_change_d(struct branch *b)
const char *p = command_buf.buf + 2;
static struct strbuf uq = STRBUF_INIT;
const char *endp;
+   struct tree_entry leaf;
 
strbuf_reset(&uq);
if (!unquote_c_style(&uq, p, &endp)) {
@@ -2374,7 +2375,9 @@ static void file_change_d(struct branch *b)
die("Garbage after path in: %s", command_buf.buf);
p = uq.buf;
}
-   tree_content_remove(&b->branch_tree, p, NULL);
+   tree_content_remove(&b->branch_tree, p, &leaf);
+   if (!leaf.versions[1].mode)
+   die("Path %s not in branch", p);
 }
 
 static void file_change_cr(struct branch *b, int rename)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 2fcf269..7624ba5 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1203,7 +1203,6 @@ test_expect_success PIPE 'N: empty directory reads as 
missing' '
printf "%s\n" "$line" >response &&
cat <<-\EOF
D dst1
-   D dst2
EOF
) |
git fast-import --cat-blob-fd=3 3>backflow &&
diff --git a/t/t9302-fast-import-bogus.sh b/t/t9302-fast-import-bogus.sh
new file mode 100755
index 000..95c5196
--- /dev/null
+++ b/t/t9302-fast-import-bogus.sh
@@ -0,0 +1,77 @@
+#!/bin/sh
+
+test_description='test that fast-import handles bogus input correctly'
+. ./test-lib.sh
+
+# A few shorthands to make writing sample input easier
+counter=0
+mark() {
+   counter=$(( $counter + 1)) &&
+   echo "mark :$counter"
+}
+
+blob() {
+   echo blob &&
+   mark &&
+   cat <<-\EOF
+   data 4
+   foo
+
+   EOF
+}
+
+ident() {
+   test_tick &&
+   echo "author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE"
+   echo "committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 
$GIT_COMMITTER_DATE"
+}
+
+commit() {
+   echo "commit refs/heads/master" &&
+   mark &&
+   ident &&
+   cat <<-\EOF
+   data 8
+   message
+   EOF
+}
+
+root() {
+   blob &&
+   m=$counter &&
+   echo "reset refs/heads/master" &&
+   commit &&
+   echo "M 100644 :$m file" &&
+   echo
+}
+
+test_exp