Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Duy Nguyen writes: > On Sat, Apr 12, 2014 at 11:19 AM, Junio C Hamano wrote: > >> In the spectrum between useful and insane, there is a point where we >> should just tell the insane: don't do it then. > > ... A process' dying is a > way of telling people "this is insane" without having to draw a line > between dos and dont's in documents. Fair enough. Instead of "that is insane and you may end up recording what you do not intend to", being able to say "that is insane and you will see some commands failing when you may end up being in such a situation" is a lot more explicit. Thanks for injecting some sanity to me. > Serializing write access to make both competing processes is nice, but > that's a separate step that may or may not be worth pursuing. And I'm > towards not worth pursuing. I think we are in agreement on that point. I would want to see the proposed "read without locking, compute heavily and then take a lock and revalidate before writing it out, or fail" done with a focus on not hurting the "sane" use pattern, i.e. making the "revalidate" part as light as possible. I however cannot seem to find any optimization opportunities other than the "open, read and compare the checksum". Two I thought and discarded are: - checking timestamp of the index file itself, and failing when it has changed (without opening or reading the checksum) is not good; it is optimizing for the wrong case, because we would need to check the checksum anyway when the timestamps match. - checking i-num of the index file itself, and failing when it has changed (without opening or reading the checksum) is not good, either. Creating "index.lock", writing to it and renaming it to "index", keeping the "index" during the whole period, would ensure that the index file that results with this single cycle would get a different i-num from the original, but if that is done twice or more, the same i-num can be reused and defeat the check. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Sat, Apr 12, 2014 at 11:19 AM, Junio C Hamano wrote: > Duy Nguyen writes: > >> On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano wrote: >>> Having said that, nobody sane would be running two simultaneous >>> operations that are clearly write-oriented competing with each other >>> against the same index file. >> >> When it comes to racing, sanity does not matter much. People could >> just do it without thinking what exactly is happening behind the >> scene. > > Well, "a race is a race is a race" would also be my knee-jerk > reaction when anybody utters the word "race", but you need to > realize that we are not talking about races like object creation > while "gc --auto" running in the background or two pushes trying to > update the same ref to different values, which are meaningful use > cases. > > What is the race under discussion about? It is about the index, > which corresponds one-to-one to the working tree, so in order for > the "race" to matter, you need to be racing against another process > that is not cooperating with you (e.g. a continuous and uncontrolled > "git add" updating the index while you are doing a real work), > mucking with the index in the same working tree. How could such a > workflow any useful in the real life? > > In the spectrum between useful and insane, there is a point where we > should just tell the insane: don't do it then. The thing is people do not usually have that level of knowledge of how git works. They could write a cron job to do something in some repos, not aware at all about these non-cooperations. Telling people not to automatically touch a git directory at all is a bit extreme. I think this patch is about guarding the user from shooting themselves. Either a command works correctly, not not work at all. A process' dying is a way of telling people "this is insane" without having to draw a line between dos and dont's in documents. Serializing write access to make both competing processes is nice, but that's a separate step that may or may not be worth pursuing. And I'm towards not worth pursuing. As you metioned in the next mail, serializing helps two competing processes, but not two command sequences. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Junio C Hamano writes: > What is the race under discussion about? It is about the index, > which corresponds one-to-one to the working tree, so in order for > the "race" to matter, you need to be racing against another process > that is not cooperating with you (e.g. a continuous and uncontrolled > "git add" updating the index while you are doing a real work), > mucking with the index in the same working tree. How could such a > workflow any useful in the real life? > > In the spectrum between useful and insane, there is a point where we > should just tell the insane: don't do it then. > > Having said that... > >>> So in that sense that can be done as a less urgent follow-up for this topic. >> >> Yeah if racing at refresh time is a real problem, sure we should solve it >> first. > > ... In order to solve the write-vs-write competition in a useful > way, you must serialize the competing accesses. E.g. "git add" > would first take a write lock on the index before read_cache(), and > then do its operation and finally release the lock by the usual > write-to-lock-close-then-rename dance. Extending on this a bit. Here I didn't mean the traditional ".lock" we create via the lockfile() API. Rather, we would want to use fcntl/flock style lock that lets others _wait_, not _fail_. Because... > The lazy "read and refresh in-core first, hoping that we did not > compete with anybody, and then check just before writing out after > taking the lock" is a very good solution for the opportunistic > update codepath, because it is an option not to write the result out > when there actually was an update by somebody else. But such an > opportunistic locking scheme does not work for write-vs-write > competition. Upon a real conflict, you need to fail the entire > operation. ...having multiple conflicting writers on the single index file is what you thought about worth protecting against. When somebody else is pounding on the index file you are trying to prepare your next commmit in, with his writing that can unexpectedly overwrite what you prepared, you would at least want the accesses serialized, without getting half of your attempt to "git add" fail (and having to redo). For that, you would want your "git add" to wait while the other party is mucking with the index under lock, instead of failing, which is what you would get from the traditional lockfile API based locking. But that still takes us back to the "don't do it then". It is true that, with serialization, you may be able to guarantee that one "git add" you do would start from one state (which may record the state of your previous "git add", or which may have further been modified by the other process) and ends with whatever that "git add" added, without any other change. But even in that case, when you finally do a "git commit", can you say you know what is in the index? I do not think so. After all, the root cause of the "race" issue is that the other process pounds at the same index while you are working on it, without any coordination with you. So... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Duy Nguyen writes: > On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano wrote: >> Having said that, nobody sane would be running two simultaneous >> operations that are clearly write-oriented competing with each other >> against the same index file. > > When it comes to racing, sanity does not matter much. People could > just do it without thinking what exactly is happening behind the > scene. Well, "a race is a race is a race" would also be my knee-jerk reaction when anybody utters the word "race", but you need to realize that we are not talking about races like object creation while "gc --auto" running in the background or two pushes trying to update the same ref to different values, which are meaningful use cases. What is the race under discussion about? It is about the index, which corresponds one-to-one to the working tree, so in order for the "race" to matter, you need to be racing against another process that is not cooperating with you (e.g. a continuous and uncontrolled "git add" updating the index while you are doing a real work), mucking with the index in the same working tree. How could such a workflow any useful in the real life? In the spectrum between useful and insane, there is a point where we should just tell the insane: don't do it then. Having said that... >> So in that sense that can be done as a less urgent follow-up for this topic. > > Yeah if racing at refresh time is a real problem, sure we should solve it > first. ... In order to solve the write-vs-write competition in a useful way, you must serialize the competing accesses. E.g. "git add" would first take a write lock on the index before read_cache(), and then do its operation and finally release the lock by the usual write-to-lock-close-then-rename dance. The lazy "read and refresh in-core first, hoping that we did not compete with anybody, and then check just before writing out after taking the lock" is a very good solution for the opportunistic update codepath, because it is an option not to write the result out when there actually was an update by somebody else. But such an opportunistic locking scheme does not work for write-vs-write competition. Upon a real conflict, you need to fail the entire operation. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano wrote: > Having said that, nobody sane would be running two simultaneous > operations that are clearly write-oriented competing with each other > against the same index file. When it comes to racing, sanity does not matter much. People could just do it without thinking what exactly is happening behind the scene. > So in that sense that can be done as a less urgent follow-up for this topic. Yeah if racing at refresh time is a real problem, sure we should solve it first. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Fri, Apr 11, 2014 at 01:43:47PM -0700, Junio C Hamano wrote: > Having said that, nobody sane would be running two simultaneous > operations that are clearly write-oriented competing with each other > against the same index file. So in that sense that can be done as a > less urgent follow-up for this topic. I'm willing to take this task, I decide to spend some of my free time for git development. But since I'm a newbie with it's architecture I don't know when it will be ready because I want to explore the code a bit. -- Yiannis -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Junio C Hamano writes: > Yeah, I was hoping that the real write codepath (as opposed to "this > is read only and we read the index without holding a lock---now we > noticed that the index needs refreshing, and we know how the > resulting refreshed index should look like, perhaps we can write it > to save cycles for other processes" codepath where we cannot and > should not take a lock early) would take the lock and then read, but > because that is not the way they work, the need the same protection, > I would think. Having said that, nobody sane would be running two simultaneous operations that are clearly write-oriented competing with each other against the same index file. So in that sense that can be done as a less urgent follow-up for this topic. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Duy Nguyen writes: > On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano wrote: >> Yiannis Marangos writes: >> >>> + n = xpread(fd, sha1, 20, st.st_size - 20); >>> + if (n != 20) >>> + goto out; >> >> I think it is possible for pread(2) to give you a short-read. >> >> The existing callers of emulated mmap and index-pack are prepared to >> handle a short-read correctly, but I do not think this code does. >> >> I'll queue this instead in the meantime. > > There are two things to sort out (sorry I can't spend much time on it > right now): should the same sha1 test be done in write_index(), in > addition to update_index_if_able(). And what do we do about > istate->sha1[] in discard_index(), keep it or clear it. Yeah, I was hoping that the real write codepath (as opposed to "this is read only and we read the index without holding a lock---now we noticed that the index needs refreshing, and we know how the resulting refreshed index should look like, perhaps we can write it to save cycles for other processes" codepath where we cannot and should not take a lock early) would take the lock and then read, but because that is not the way they work, the need the same protection, I would think. As discard_index() is not "we are removing all the entries because the user told us to 'rm -r .'" but is "for the purpose of our internal processing we do not need the old contents of the index", my knee-jerk reaction is that we should not clear it. Any operation that makes the resulting index empty (i.e. no path being tracked) should still write an empty index (with the usual header and the most importantly the trailing file checksum), but that goes without saying. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Fri, Apr 11, 2014 at 09:47:09AM +0200, Torsten Bögershausen wrote: > 6. process A applies a commit: > - read the index into memory > - take the lock > - update the index file on disc > - release the lock So here we can have race condition. Maybe we should implement Duy's idea of verifying the sha1 in write_index()? > The new code works like this: > > 8. process B applies a commit: > - take the lock > - verifies tha the index file on disc has the same sha as the one read > before > # And if not: What do we do? die() or retry() ? > - update the index file on disc > - release the lock IMO, we must never die() if we want to do opportunistic update and we should never retry because it's a bit expensive if we do opportunistic update. -- Yiannis -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano wrote: > +static int verify_index_from(const struct index_state *istate, const char > *path) > +{ > + int fd; > + ssize_t n; > + struct stat st; > + unsigned char sha1[20]; > + > + if (!istate->initialized) > + return 0; > + > + fd = open(path, O_RDONLY); > + if (fd < 0) > + return 0; Suppose another process "git rm --cached :/" is racing with us and this imaginary git is so smart that it figures if nothing valueable is left in the index, there's no need to write the index down at all. So it removes $GIT_DIR/index, then $GIT_DIR/index.lock. When we come here, we see ENOENT, but we should return 1 instead because the file removal in this case is a form of change. That opens a question about writing a new index. I think we could use all-zero sha-1 as the indicator that this is a fresh new index. If istate->sha1[] is all-zero and no index file exists, then we do not need to verify (i.e. return 0). However if istate->sha1[] is all-zero, but $GIT_DIR/index exists, then return 1. I'm still not sure if elsewhere in the code base we read $GIT_DIR/index to active_index, create a new index_state, copy entries over (but not active_index.sha1[]), then write the _new_ index_state down. That would hit the "however" statement above and incorrectly return 1. I suppose that all other errors except ENOENT could be safely ignored (i.e. return 0 regardless of istate->sha1[]). > + > + if (fstat(fd, &st)) > + goto out; > + > + if (st.st_size < sizeof(struct cache_header) + 20) > + goto out; > + > + n = pread_in_full(fd, sha1, 20, st.st_size - 20); > + if (n != 20) > + goto out; > + > + if (hashcmp(istate->sha1, sha1)) > + goto out; > + > + close(fd); > + return 1; > + > +out: > + close(fd); > + return 0; > +} > + > +static int verify_index(const struct index_state *istate) > +{ > + return verify_index_from(istate, get_index_file()); > +} > + > static int has_racy_timestamp(struct index_state *istate) > { > int entries = istate->cache_nr; > @@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state > *istate) > void update_index_if_able(struct index_state *istate, struct lock_file > *lockfile) > { > if ((istate->cache_changed || has_racy_timestamp(istate)) && > - !write_index(istate, lockfile->fd)) > + verify_index(istate) && !write_index(istate, lockfile->fd)) > commit_locked_index(lockfile); > else > rollback_lock_file(lockfile); > diff --git a/wrapper.c b/wrapper.c > index 5b3c7fc..bc1bfb8 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t > count) > return total; > } > > +ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset) > +{ > + char *p = buf; > + ssize_t total = 0; > + > + while (count > 0) { > + ssize_t loaded = xpread(fd, p, count, offset); > + if (loaded < 0) > + return -1; > + if (loaded == 0) > + return total; > + count -= loaded; > + p += loaded; > + total += loaded; > + offset += loaded; > + } > + > + return total; > +} > + > int xdup(int fd) > { > int ret = dup(fd); > -- > 1.9.2-590-g468068b > > > > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On 2014-04-10 21.28, Junio C Hamano wrote: > Yiannis Marangos writes: > >> +n = xpread(fd, sha1, 20, st.st_size - 20); >> +if (n != 20) >> +goto out; > > I think it is possible for pread(2) to give you a short-read. > > The existing callers of emulated mmap and index-pack are prepared to > handle a short-read correctly, but I do not think this code does. > > I'll queue this instead in the meantime. > > -- >8 -- > From: Yiannis Marangos > Date: Thu, 10 Apr 2014 21:31:21 +0300 > Subject: [PATCH] read-cache.c: verify index file before we opportunistically > update it > > Before we proceed to opportunistically update the index (often done > by an otherwise read-only operation like "git status" and "git diff" > that internally refreshes the index), we must verify that the > current index file is the same as the one that we read earlier > before we took the lock on it, in order to avoid a possible race. > > In the example below git-status does "opportunistic update" and > git-rebase updates the index, but the race can happen in general. I'm not sure if we need the second or third commit of process A at all. My understanding is that the simplified version will have problems as well: 1. process A calls git-rebase (or does anything that updates the index) 2. process change 3. process B calls git-status (or does anything that updates the index) 4. process B reads the index file into memory 5. process change 6. process A applies a commit: - read the index into memory - take the lock - update the index file on disc - release the lock 7. process change 8. process B applies a commit: - take the lock - update the index in memory and write the index file to disc - release the lock Now process B has overwritten the commit from process A, which is wrong. The new code works like this: 8. process B applies a commit: - take the lock - verifies tha the index file on disc has the same sha as the one read before # And if not: What do we do? die() or retry() ? - update the index file on disc - release the lock [] > > +static int verify_index_from(const struct index_state *istate, const char > *path) > +{ > + int fd; > + ssize_t n; > + struct stat st; > + unsigned char sha1[20]; > + > + if (!istate->initialized) > + return 0; > + > + fd = open(path, O_RDONLY); > + if (fd < 0) > + return 0; > + > + if (fstat(fd, &st)) > + goto out; > + > + if (st.st_size < sizeof(struct cache_header) + 20) > + goto out; > + > + n = pread_in_full(fd, sha1, 20, st.st_size - 20); Minor : What is the advantage of pread() against a lseek()/read_in_full() combo? fd is opened and used only in one thread. Introducing pread()/ pread_in_full() could be done in an other commit, or do I miss something ? > + if (n != 20) > + goto out; > +static int verify_index(const struct index_state *istate) > +{ > + return verify_index_from(istate, get_index_file()); > +} > + Minor: Do we really need the wrapper function verify_index_from()? It seems as if the whole code from verify_index_from() could go into verify_index(), which will call get_index_file() > static int has_racy_timestamp(struct index_state *istate) > { > int entries = istate->cache_nr; > @@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state > *istate) > void update_index_if_able(struct index_state *istate, struct lock_file > *lockfile) > { > if ((istate->cache_changed || has_racy_timestamp(istate)) && > - !write_index(istate, lockfile->fd)) > + verify_index(istate) && !write_index(istate, lockfile->fd)) > commit_locked_index(lockfile); > else > rollback_lock_file(lockfile); > diff --git a/wrapper.c b/wrapper.c > index 5b3c7fc..bc1bfb8 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t > count) > return total; > } > > +ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset) > +{ > + char *p = buf; > + ssize_t total = 0; > + > + while (count > 0) { > + ssize_t loaded = xpread(fd, p, count, offset); > + if (loaded < 0) > + return -1; > + if (loaded == 0) > + return total; > + count -= loaded; > + p += loaded; > + total += loaded; > + offset += loaded; > + } > + > + return total; > +} > + > int xdup(int fd) > { > int ret = dup(fd); > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano wrote: > Yiannis Marangos writes: > >> + n = xpread(fd, sha1, 20, st.st_size - 20); >> + if (n != 20) >> + goto out; > > I think it is possible for pread(2) to give you a short-read. > > The existing callers of emulated mmap and index-pack are prepared to > handle a short-read correctly, but I do not think this code does. > > I'll queue this instead in the meantime. There are two things to sort out (sorry I can't spend much time on it right now): should the same sha1 test be done in write_index(), in addition to update_index_if_able(). And what do we do about istate->sha1[] in discard_index(), keep it or clear it. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 2/2] Verify index file before we opportunistically update it
Yiannis Marangos writes: > + n = xpread(fd, sha1, 20, st.st_size - 20); > + if (n != 20) > + goto out; I think it is possible for pread(2) to give you a short-read. The existing callers of emulated mmap and index-pack are prepared to handle a short-read correctly, but I do not think this code does. I'll queue this instead in the meantime. -- >8 -- From: Yiannis Marangos Date: Thu, 10 Apr 2014 21:31:21 +0300 Subject: [PATCH] read-cache.c: verify index file before we opportunistically update it Before we proceed to opportunistically update the index (often done by an otherwise read-only operation like "git status" and "git diff" that internally refreshes the index), we must verify that the current index file is the same as the one that we read earlier before we took the lock on it, in order to avoid a possible race. In the example below git-status does "opportunistic update" and git-rebase updates the index, but the race can happen in general. 1. process A calls git-rebase (or does anything that uses the index) 2. process A applies 1st commit 3. process B calls git-status (or does anything that updates the index) 4. process B reads index 5. process A applies 2nd commit 6. process B takes the lock, then overwrites process A's changes. 7. process A applies 3rd commit As an end result the 3rd commit will have a revert of the 2nd commit. When process B takes the lock, it needs to make sure that the index hasn't changed since step 4. Signed-off-by: Yiannis Marangos Signed-off-by: Junio C Hamano --- cache.h | 3 +++ read-cache.c | 47 ++- wrapper.c| 20 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/cache.h b/cache.h index ce377e1..9244c38 100644 --- a/cache.h +++ b/cache.h @@ -279,6 +279,7 @@ struct index_state { initialized : 1; struct hash_table name_hash; struct hash_table dir_hash; + unsigned char sha1[20]; }; extern struct index_state the_index; @@ -1199,6 +1200,8 @@ extern void fsync_or_die(int fd, const char *); extern ssize_t read_in_full(int fd, void *buf, size_t count); extern ssize_t write_in_full(int fd, const void *buf, size_t count); +extern ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset); + static inline ssize_t write_str_in_full(int fd, const char *str) { return write_in_full(fd, str, strlen(str)); diff --git a/read-cache.c b/read-cache.c index 33dd676..f4a0d61 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1464,6 +1464,7 @@ int read_index_from(struct index_state *istate, const char *path) if (verify_hdr(hdr, mmap_size) < 0) goto unmap; + hashcpy(istate->sha1, (unsigned char *)hdr + mmap_size - 20); istate->version = ntohl(hdr->hdr_version); istate->cache_nr = ntohl(hdr->hdr_entries); istate->cache_alloc = alloc_nr(istate->cache_nr); @@ -1747,6 +1748,50 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, return result; } +/* + * This function verifies if index_state has the correct sha1 of the + * index file. Don't die if we have any other failure, just return 0. + */ +static int verify_index_from(const struct index_state *istate, const char *path) +{ + int fd; + ssize_t n; + struct stat st; + unsigned char sha1[20]; + + if (!istate->initialized) + return 0; + + fd = open(path, O_RDONLY); + if (fd < 0) + return 0; + + if (fstat(fd, &st)) + goto out; + + if (st.st_size < sizeof(struct cache_header) + 20) + goto out; + + n = pread_in_full(fd, sha1, 20, st.st_size - 20); + if (n != 20) + goto out; + + if (hashcmp(istate->sha1, sha1)) + goto out; + + close(fd); + return 1; + +out: + close(fd); + return 0; +} + +static int verify_index(const struct index_state *istate) +{ + return verify_index_from(istate, get_index_file()); +} + static int has_racy_timestamp(struct index_state *istate) { int entries = istate->cache_nr; @@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state *istate) void update_index_if_able(struct index_state *istate, struct lock_file *lockfile) { if ((istate->cache_changed || has_racy_timestamp(istate)) && - !write_index(istate, lockfile->fd)) + verify_index(istate) && !write_index(istate, lockfile->fd)) commit_locked_index(lockfile); else rollback_lock_file(lockfile); diff --git a/wrapper.c b/wrapper.c index 5b3c7fc..bc1bfb8 100644 --- a/wrapper.c +++ b/wrapper.c @@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t count) return total; } +ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset) +{ + char *p = buf; + ssize_t total =
[PATCH v7 2/2] Verify index file before we opportunistically update it
Before we proceed to "opportunistic update" we must verify that the current index file is the same as the one that we read before. There is a possible race if we don't do this. In the example below git-status does "opportunistic update" and git-rebase updates the index, but the race can happen in general. 1. process A calls git-rebase (or does anything that uses the index) 2. process A applies 1st commit 3. process B calls git-status (or does anything that updates the index) 4. process B reads index 5. process A applies 2nd commit 6. process B takes the lock, then overwrites process A's changes. 7. process A applies 3rd commit As an end result the 3rd commit will have a revert of the 2nd commit. When process B takes the lock, it needs to make sure that the index hasn't changed since step 4. Signed-off-by: Yiannis Marangos --- Version 4 contains fixes based on Junio's comments. Version 5 fixs a typo in commit message (git-show -> git-status). Version 6 removes verify_hdr() and use pread() instead of mmap(). Version 7 use xpread() (added with [PATCH v7 1/2]) instead of pread(). cache.h | 1 + read-cache.c | 47 ++- 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/cache.h b/cache.h index 107ac61..0460f06 100644 --- a/cache.h +++ b/cache.h @@ -279,6 +279,7 @@ struct index_state { initialized : 1; struct hashmap name_hash; struct hashmap dir_hash; + unsigned char sha1[20]; }; extern struct index_state the_index; diff --git a/read-cache.c b/read-cache.c index ba13353..28de1a6 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1477,6 +1477,7 @@ int read_index_from(struct index_state *istate, const char *path) if (verify_hdr(hdr, mmap_size) < 0) goto unmap; + hashcpy(istate->sha1, (unsigned char *)hdr + mmap_size - 20); istate->version = ntohl(hdr->hdr_version); istate->cache_nr = ntohl(hdr->hdr_entries); istate->cache_alloc = alloc_nr(istate->cache_nr); @@ -1760,6 +1761,50 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, return result; } +/* + * This function verifies if index_state has the correct sha1 of an index file. + * Don't die if we have any other failure, just return 0. + */ +static int verify_index_from(const struct index_state *istate, const char *path) +{ + int fd; + ssize_t n; + struct stat st; + unsigned char sha1[20]; + + if (!istate->initialized) + return 0; + + fd = open(path, O_RDONLY); + if (fd < 0) + return 0; + + if (fstat(fd, &st)) + goto out; + + if (st.st_size < sizeof(struct cache_header) + 20) + goto out; + + n = xpread(fd, sha1, 20, st.st_size - 20); + if (n != 20) + goto out; + + if (hashcmp(istate->sha1, sha1)) + goto out; + + close(fd); + return 1; + +out: + close(fd); + return 0; +} + +static int verify_index(const struct index_state *istate) +{ + return verify_index_from(istate, get_index_file()); +} + static int has_racy_timestamp(struct index_state *istate) { int entries = istate->cache_nr; @@ -1779,7 +1824,7 @@ static int has_racy_timestamp(struct index_state *istate) void update_index_if_able(struct index_state *istate, struct lock_file *lockfile) { if ((istate->cache_changed || has_racy_timestamp(istate)) && - !write_index(istate, lockfile->fd)) + verify_index(istate) && !write_index(istate, lockfile->fd)) commit_locked_index(lockfile); else rollback_lock_file(lockfile); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html