Re: [PATCH v5] Verify index file before we opportunistically update it

2014-04-10 Thread Duy Nguyen
On Thu, Apr 10, 2014 at 12:34 PM, Yiannis Marangos
yiannis.maran...@gmail.com wrote:
 +/*
 + * This function verifies if index_state has the correct sha1 of an index 
 file.
 + * Don't die if we have any other failure, just return 0.
 + */
 +static int verify_index_from(const struct index_state *istate, const char 
 *path)
 +{
 +   int fd;
 +   struct stat st;
 +   struct cache_header *hdr;
 +   void *mmap_addr;
 +   size_t mmap_size;
 +
 +   if (!istate-initialized)
 +   return 0;
 +
 +   fd = open(path, O_RDONLY);
 +   if (fd  0)
 +   return 0;
 +
 +   if (fstat(fd, st))
 +   return 0;
 +
 +   /* file is too big */
 +   if (st.st_size  (size_t)st.st_size)
 +   return 0;
 +
 +   mmap_size = (size_t)st.st_size;
 +   if (mmap_size  sizeof(struct cache_header) + 20)
 +   return 0;
 +
 +   mmap_addr = mmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 +   close(fd);
 +   if (mmap_addr == MAP_FAILED)
 +   return 0;
 +
 +   hdr = mmap_addr;
 +   if (verify_hdr(hdr, mmap_size)  0)
 +   goto unmap;

verify_hdr() is a bit expensive because you need to digest the whole
index file (could big as big as 14MB on webkit). Could we get away
without it? I mean, is it enough that we pick the last 20 bytes and
compare it with istate-sha1? If we only need 20 bytes, pread() may be
better than mmap().

The chance of SHA-1 collision is small enough for us to ignore, I
think. And if a client updates the index without updating the trailing
sha-1, the index is broken and we don't have to worry about
overwriting it.

 +
 +   if (hashcmp(istate-sha1, (unsigned char *)hdr + mmap_size - 20))
 +   goto unmap;
 +
 +   munmap(mmap_addr, mmap_size);
 +   return 1;
 +
 +unmap:
 +   munmap(mmap_addr, mmap_size);
 +   return 0;
 +}
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] Verify index file before we opportunistically update it

2014-04-10 Thread Yiannis Marangos
On Thu, Apr 10, 2014 at 05:40:55PM +0700, Duy Nguyen wrote:
 verify_hdr() is a bit expensive because you need to digest the whole
 index file (could big as big as 14MB on webkit). Could we get away
 without it? I mean, is it enough that we pick the last 20 bytes and
 compare it with istate-sha1? If we only need 20 bytes, pread() may be
 better than mmap().
 
 The chance of SHA-1 collision is small enough for us to ignore, I
 think. And if a client updates the index without updating the trailing
 sha-1, the index is broken and we don't have to worry about
 overwriting it.

That's true, I will commit version 6 in a bit.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] Verify index file before we opportunistically update it

2014-04-10 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 verify_hdr() is a bit expensive because you need to digest the whole
 index file (could big as big as 14MB on webkit). Could we get away
 without it? I mean, is it enough that we pick the last 20 bytes and
 compare it with istate-sha1? If we only need 20 bytes, pread() may be
 better than mmap().

True.

I do not think we offer xread() equivalent for pread() to help
coders to avoid the common mistake of failing to resume interrupted
system calls, though.

The only callsite of pread() we currently have is in index-pack.c,
which just does

n = pread(pack_fd, inbuf, n, from);
if (n  0)
die_errno(_(cannot pread pack file));

without paying attention to EAGAIN/EINTR, that seems wrong and may
want to be fixed, before we introduce another call site of pread().

Thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5] Verify index file before we opportunistically update it

2014-04-09 Thread Yiannis Marangos
Before we proceed to opportunistic update we must verify that the
current index file is the same as the one that we read before. There
is a possible race if we don't do this. In the example below git-status
does opportunistic update and git-rebase updates the index, but the
race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos yiannis.maran...@gmail.com
---

Version 4 contains fixes based on Junio's comments.
Version 5 fixs a typo in commit message (git-show - git-status)

 cache.h  |  1 +
 read-cache.c | 58 +-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 107ac61..0460f06 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 initialized : 1;
struct hashmap name_hash;
struct hashmap dir_hash;
+   unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index ba13353..73a15b1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1477,6 +1477,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
if (verify_hdr(hdr, mmap_size)  0)
goto unmap;
 
+   hashcpy(istate-sha1, (unsigned char *)hdr + mmap_size - 20);
istate-version = ntohl(hdr-hdr_version);
istate-cache_nr = ntohl(hdr-hdr_entries);
istate-cache_alloc = alloc_nr(istate-cache_nr);
@@ -1760,6 +1761,61 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct 
cache_entry *ce,
return result;
 }
 
+/*
+ * This function verifies if index_state has the correct sha1 of an index file.
+ * Don't die if we have any other failure, just return 0.
+ */
+static int verify_index_from(const struct index_state *istate, const char 
*path)
+{
+   int fd;
+   struct stat st;
+   struct cache_header *hdr;
+   void *mmap_addr;
+   size_t mmap_size;
+
+   if (!istate-initialized)
+   return 0;
+
+   fd = open(path, O_RDONLY);
+   if (fd  0)
+   return 0;
+
+   if (fstat(fd, st))
+   return 0;
+
+   /* file is too big */
+   if (st.st_size  (size_t)st.st_size)
+   return 0;
+
+   mmap_size = (size_t)st.st_size;
+   if (mmap_size  sizeof(struct cache_header) + 20)
+   return 0;
+
+   mmap_addr = mmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   close(fd);
+   if (mmap_addr == MAP_FAILED)
+   return 0;
+
+   hdr = mmap_addr;
+   if (verify_hdr(hdr, mmap_size)  0)
+   goto unmap;
+
+   if (hashcmp(istate-sha1, (unsigned char *)hdr + mmap_size - 20))
+   goto unmap;
+
+   munmap(mmap_addr, mmap_size);
+   return 1;
+
+unmap:
+   munmap(mmap_addr, mmap_size);
+   return 0;
+}
+
+static int verify_index(const struct index_state *istate)
+{
+   return verify_index_from(istate, get_index_file());
+}
+
 static int has_racy_timestamp(struct index_state *istate)
 {
int entries = istate-cache_nr;
@@ -1779,7 +1835,7 @@ static int has_racy_timestamp(struct index_state *istate)
 void update_index_if_able(struct index_state *istate, struct lock_file 
*lockfile)
 {
if ((istate-cache_changed || has_racy_timestamp(istate)) 
-   !write_index(istate, lockfile-fd))
+   verify_index(istate)  !write_index(istate, lockfile-fd))
commit_locked_index(lockfile);
else
rollback_lock_file(lockfile);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html