Re: Verifiable git archives?

2014-01-25 Thread Andy Lutomirski
Here's a rather hackish implementation of the write side.  Any
thoughts on the format?  (Obviously the implementation needs work.
For example, it needs to be optional.)

Thoughts so far:
 - I want to put the value of prefix into an extended header.
 - Should blobs have their sha1 hashes in an extended header?  Pros:
it makes figuring out substitutions easier.  Cons: it adds 512 bytes
per file.
 - I want to support tags as roots.
 - I (or someone) need to write a verifier / verified unpacker.  Does
git accept Python code?
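
For reference while weighing the format questions above: a pax extended header is a sequence of records of the form "<len> <key>=<value>\n", where <len> is the decimal length of the whole record, including its own digits. The sketch below is a minimal Python illustration, not part of the patch. The key names are the GIT-SCM.obj.* names proposed in this message; pax_record and parse_pax_records are hypothetical helper names.

```python
def pax_record(key: str, value: bytes) -> bytes:
    """Build one pax extended header record: b"<len> <key>=<value>\\n".

    <len> counts the whole record, including the digits of <len> itself,
    so it is found by iterating until the guess stops changing.
    """
    body = b" " + key.encode("utf-8") + b"=" + value + b"\n"
    length = len(body) + 1  # first guess: a one-digit length field
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_pax_records(data: bytes) -> dict:
    """Split a pax extended header payload back into {key: value}."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # The record ends with "\n"; slice by length so that values may
        # themselves contain newlines or "=" (raw git objects do).
        record = data[space + 1 : pos + length - 1]
        key, _, value = record.partition(b"=")
        records[key.decode("utf-8")] = value
        pos += length
    return records
```

Because the parser slices by the declared length rather than searching for a newline, binary values such as an embedded tree object survive a round trip.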

This thing is tested in the sense that GNU tar unpacks its output
without any warnings or other fanfare.

--Andy
diff --git a/archive-tar.c b/archive-tar.c
index 719b629..c6bf7e4 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -2,6 +2,8 @@
  * Copyright (c) 2005, 2006 Rene Scharfe
  */
 #include "cache.h"
+#include "tree.h"
+#include "object.h"
 #include "tar.h"
 #include "archive.h"
 #include "streaming.h"
@@ -200,6 +202,74 @@ static int write_extended_header(struct archiver_args *args,
 	return 0;
 }
 
+/*
+ * A GIT-SCM object header is a global extended header that embeds a single
+ * git object.  This object serves a purpose described by the purpose
+ * field.  Valid purposes include:
+ *
+ *  - root -- an object that, by itself, in conjunction with other roots,
+ *    or in conjunction with external data, identifies a root to use to
+ *    verify this archive.
+ *  - vrfy -- an object that can be used to prove that the contents
+ *    of this archive are as described.
+ *
+ * There's one basic rule to observe: every vrfy object must hash to
+ * a SHA-1 that matches something described in a root, another vrfy object,
+ * or something typed in by a user decoding the archive.
+ *
+ * (Of course, if you want the archive to be usefully verifiable, all of the
+ *  non-GIT-SCM contents should also be attributable to an appropriate
+ *  vrfy object.)
+ *
+ * The fields are:
+ *  GIT-SCM.obj.purpose: the purpose of the embedded object
+ *  GIT-SCM.obj.sha1: the sha1 of the embedded object
+ *  GIT-SCM.obj.type: the type of the embedded object
+ *  GIT-SCM.obj.data: the data in the embedded object
+ *
+ * The block header is intentionally unspecified, except that it must
+ * have typeflag 'g'.  (This is to allow some flexibility in trying to
+ * preserve compatibility with old tar implementations.)
+ */
+static int write_gitscm_obj_header(struct archiver_args *args,
+				   const char *purpose,
+				   const unsigned char *sha1)
+{
+	struct strbuf ext_header = STRBUF_INIT;
+	struct ustar_header header;
+	unsigned int mode;
+	enum object_type type;
+	unsigned long size;
+	void *buffer;
+	const char *typestr;
+	int err = 0;
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.purpose",
+				 purpose, strlen(purpose));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.sha1",
+				 sha1_to_hex(sha1), 40);
+
+	buffer = read_sha1_file(sha1, &type, &size);
+	typestr = typename(type);
+
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.type",
+				 typestr, strlen(typestr));
+	strbuf_append_ext_header(&ext_header, "GIT-SCM.obj.data",
+				 buffer, size);
+	free(buffer);
+	buffer = NULL;
+
+	memset(&header, 0, sizeof(header));
+	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
+	mode = 0100666;
+	strcpy(header.name, "pax_global_header");
+	prepare_header(args, &header, mode, ext_header.len);
+	write_blocked(&header, sizeof(header));
+	write_blocked(ext_header.buf, ext_header.len);
+	strbuf_release(&ext_header);
+	return err;
+}
+
 static int write_tar_entry(struct archiver_args *args,
 			   const unsigned char *sha1,
 			   const char *path, size_t pathlen,
@@ -212,6 +282,10 @@ static int write_tar_entry(struct archiver_args *args,
 	void *buffer;
 	int err = 0;
 
+	if (S_ISDIR(mode)) {
+		write_gitscm_obj_header(args, "vrfy", sha1);
+	}
+
 	memset(&header, 0, sizeof(header));
 
 	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
@@ -384,8 +458,11 @@ static int write_tar_archive(const struct archiver *ar,
 
 	if (args->commit_sha1)
 		err = write_global_extended_header(args);
-	if (!err)
+	if (!err) {
+		write_gitscm_obj_header(args, "root", args->commit_sha1);
+		write_gitscm_obj_header(args, "vrfy", args->tree->object.sha1);
 		err = write_archive_entries(args, write_tar_entry);
+	}
 	if (!err)
 		write_trailer();
 	return err;
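
The rule stated in the patch comment above (every vrfy object must hash to something already described by a root, another vrfy object, or user input) could be checked on the read side roughly as follows. This is a minimal Python sketch, not part of the patch. It assumes the embedded objects have already been pulled out of the GIT-SCM.obj.* records as (type, data) pairs in archive order, and that the trusted starting point is a commit id supplied by the user. git_object_id, referenced_ids and verify_chain are hypothetical names.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 of the object as git hashes it: b"<type> <size>\\0" + data."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def referenced_ids(obj_type: str, data: bytes) -> set:
    """Object ids that a commit or tree object vouches for."""
    if obj_type == "commit":
        # A commit body starts with the line b"tree <40 hex digits>".
        first_line = data.split(b"\n", 1)[0]
        if not first_line.startswith(b"tree "):
            raise ValueError("commit object has no tree line")
        return {first_line[5:].decode("ascii")}
    if obj_type == "tree":
        # Each entry is b"<octal mode> <name>\0" + 20 raw SHA-1 bytes.
        ids, pos = set(), 0
        while pos < len(data):
            nul = data.index(b"\0", pos)
            ids.add(data[nul + 1 : nul + 21].hex())
            pos = nul + 21
        return ids
    return set()


def verify_chain(trusted_id: str, objects) -> set:
    """Check embedded objects in order; return every id vouched for.

    Raises ValueError if an object hashes to an id that nothing trusted
    so far has described.
    """
    trusted = {trusted_id}
    for obj_type, data in objects:
        oid = git_object_id(obj_type, data)
        if oid not in trusted:
            raise ValueError(f"unvouched {obj_type} object {oid}")
        trusted |= referenced_ids(obj_type, data)
    return trusted
```

The returned set is what a verified unpacker would then compare file contents against: a file is attributable to the archive's root only if its blob id is in that set.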


Re: Verifiable git archives?

2014-01-21 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

> On 01/09/2014 09:11 PM, Junio C Hamano wrote:
>> Andy Lutomirski l...@amacapital.net writes:
>>
>>> It's possible, in principle, to shove enough metadata into the output
>>> of 'git archive' to allow anyone to verify (without cloning the repo)
>>> that the archive is a correct copy of a given commit.  Would
>>> this be considered a useful feature?
>>>
>>> Presumably there would be a 'git untar' command that would report
>>> failure if it fails to verify the archive contents.
>>>
>>> This could be as simple as including copies of the commit object and
>>> all relevant tree objects and checking all of the hashes when
>>> untarring.
>>
>> You only need the object name of the top-level tree.  After untarring
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>> [...]
>
> This wouldn't work if any files were excluded from the archive using
> gitattribute export-ignore (or export-subst, which you already
> mentioned in a follow-up email).

Correct.  By "and such" below, I meant "any and all futzing that
makes the resulting working tree different from the tree object
being archived" ;-)  That includes the line-ending configuration
and other things as well.

Also, if you used keyword substitution and such when creating an
archive, then the filesystem entities resulting from expanding
it would not match the original.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Verifiable git archives?

2014-01-18 Thread Michael Haggerty
On 01/09/2014 09:11 PM, Junio C Hamano wrote:
> Andy Lutomirski l...@amacapital.net writes:
>
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
>
> You only need the object name of the top-level tree.  After untarring
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.
> [...]

This wouldn't work if any files were excluded from the archive using
gitattribute export-ignore (or export-subst, which you already
mentioned in a follow-up email).

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: Verifiable git archives?

2014-01-09 Thread Stefan Beller
On 09.01.2014 04:10, Andy Lutomirski wrote:
> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
>

Do you know git bundles?


> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
>
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.
>

I thought the purpose of git archive was rather to create plain
archives not polluted with any gitish stuff.

> (Even better: allow subsets of the repository to be archived and
> verified as well.)

Stefan


Re: Verifiable git archives?

2014-01-09 Thread Junio C Hamano
Andy Lutomirski l...@amacapital.net writes:

> It's possible, in principle, to shove enough metadata into the output
> of 'git archive' to allow anyone to verify (without cloning the repo)
> that the archive is a correct copy of a given commit.  Would
> this be considered a useful feature?
>
> Presumably there would be a 'git untar' command that would report
> failure if it fails to verify the archive contents.
>
> This could be as simple as including copies of the commit object and
> all relevant tree objects and checking all of the hashes when
> untarring.

You only need the object name of the top-level tree.  After untarring
the archive into an empty directory, make it a new repository and
"git add . && git write-tree"---the result should match the
top-level tree the archive was supposed to contain.

Of course, you can write "git verify-archive" that does the same
computation all in-core, without actually extracting the archive
into an empty directory.
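
The in-core computation described above amounts to hashing each file as a blob and each directory as a tree, bottom up. The following is a minimal Python sketch of that idea, under stated assumptions: the archive members have already been read into a nested dict (a layout chosen only for this example), and every file is a regular, non-executable file. The helper names are illustrative and are not git APIs. The result matches the original tree only when the archived contents are byte-identical to the blobs in the repository.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> bytes:
    """Raw 20-byte SHA-1 of b"<type> <size>\\0" + data."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()


def tree_id(node: dict) -> bytes:
    """Hash a directory given as {name: bytes (file) or dict (subdirectory)}.

    Simplification: every file is treated as a regular 100644 blob, so
    executables, symlinks and submodules would need real mode handling.
    """
    entries = []
    for name, child in node.items():
        raw_name = name.encode("utf-8")
        if isinstance(child, dict):
            # Git sorts a directory as if its name ended in "/".
            entries.append((raw_name + b"/", b"40000", raw_name, tree_id(child)))
        else:
            blob = git_object_id("blob", child)
            entries.append((raw_name, b"100644", raw_name, blob))
    body = b"".join(
        mode + b" " + raw_name + b"\0" + oid
        for _sort_key, mode, raw_name, oid in sorted(entries)
    )
    return git_object_id("tree", body)
```

A caller would compare tree_id(contents).hex() with the expected top-level tree name obtained from a trusted source.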


Re: Verifiable git archives?

2014-01-09 Thread Andy Lutomirski
On Thu, Jan 9, 2014 at 12:11 PM, Junio C Hamano gits...@pobox.com wrote:
> Andy Lutomirski l...@amacapital.net writes:
>
>> It's possible, in principle, to shove enough metadata into the output
>> of 'git archive' to allow anyone to verify (without cloning the repo)
>> that the archive is a correct copy of a given commit.  Would
>> this be considered a useful feature?
>>
>> Presumably there would be a 'git untar' command that would report
>> failure if it fails to verify the archive contents.
>>
>> This could be as simple as including copies of the commit object and
>> all relevant tree objects and checking all of the hashes when
>> untarring.
>
> You only need the object name of the top-level tree.  After untarring
> the archive into an empty directory, make it a new repository and
> "git add . && git write-tree"---the result should match the
> top-level tree the archive was supposed to contain.

Hmm.  I didn't realize that there was enough metadata in the 'git
archive' output to reproduce the final tree.  If I can make it work,
would you accept a patch to add another extended pax header containing
the commit object and the top-level tree hash to the 'git archive'
tarball output?


> Of course, you can write "git verify-archive" that does the same
> computation all in-core, without actually extracting the archive
> into an empty directory.

Hmm.  I'll play with this.

--Andy


Re: Verifiable git archives?

2014-01-09 Thread Junio C Hamano
Andy Lutomirski l...@amacapital.net writes:

>> You only need the object name of the top-level tree.  After untarring
>> the archive into an empty directory, make it a new repository and
>> "git add . && git write-tree"---the result should match the
>> top-level tree the archive was supposed to contain.
>
> Hmm.  I didn't realize that there was enough metadata in the 'git
> archive' output to reproduce the final tree.

We do record the commit object name in the extended header when
writing a tar archive already, but you have to grab the commit
object from somewhere in order to read the top-level tree object
name, which we do not record.
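
Reading that commit object name back out of an archive can be sketched with Python's standard tarfile module, which exposes pax global header records as TarFile.pax_headers. Git stores the commit id under the "comment" key, the same record that "git get-tar-commit-id" reads. In the sketch below, tar_commit_id and make_demo_archive are hypothetical helper names, and the demo archive only stands in for real 'git archive' output so that the example runs without git.

```python
import io
import tarfile


def tar_commit_id(fileobj):
    """Return the commit id stored in a tar archive's pax global header.

    Returns None when there is no global "comment" record, for example
    when the archive was made from a tree rather than a commit.
    """
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        return tar.pax_headers.get("comment")


def make_demo_archive(commit_id: str) -> io.BytesIO:
    """Build a small stand-in for `git archive` output (no git needed)."""
    buf = io.BytesIO()
    with tarfile.open(
        fileobj=buf,
        mode="w",
        format=tarfile.PAX_FORMAT,
        pax_headers={"comment": commit_id},  # written as a global header
    ) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

As the message above notes, this yields only the commit name; the commit object itself still has to come from somewhere else before the top-level tree name is known.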

Also, if you used keyword substitution and such when creating an
archive, then the filesystem entities resulting from expanding it
would not match the original.



Re: Verifiable git archives?

2014-01-09 Thread Andy Lutomirski
On Thu, Jan 9, 2014 at 2:46 PM, Junio C Hamano gits...@pobox.com wrote:
> Andy Lutomirski l...@amacapital.net writes:
>
>>> You only need the object name of the top-level tree.  After untarring
>>> the archive into an empty directory, make it a new repository and
>>> "git add . && git write-tree"---the result should match the
>>> top-level tree the archive was supposed to contain.
>>
>> Hmm.  I didn't realize that there was enough metadata in the 'git
>> archive' output to reproduce the final tree.
>
> We do record the commit object name in the extended header when
> writing a tar archive already, but you have to grab the commit
> object from somewhere in order to read the top-level tree object
> name, which we do not record.

This could be changed :)


> Also, if you used keyword substitution and such when creating an
> archive, then the filesystem entities resulting from expanding it
> would not match the original.


In the simple case, you'd need to have an archive with no prefix or
funny business (or at least a known prefix).  In the fancy case, you
could at least verify that all the file contents really came from git,
but then you'd really need the tree objects.

The use case I have in mind is for projects to distribute archives but
only need to sign the tagged git commit id.  I think this should be
doable without too much pain.  (This assumes that the release doesn't
contain autogen output and such.)

--Andy