Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-14 Thread Junio C Hamano
Duy Nguyen  writes:

> On Sat, Apr 12, 2014 at 11:19 AM, Junio C Hamano  wrote:
>
>> In the spectrum between useful and insane, there is a point where we
>> should just tell the insane: don't do it then.
>
> ... A process' dying is a
> way of telling people "this is insane" without having to draw a line
> between dos and dont's in documents.

Fair enough.  Instead of "that is insane and you may end up
recording what you do not intend to", being able to say "that is
insane and you will see some commands failing when you may end up
being in such a situation" is a lot more explicit.

Thanks for injecting some sanity to me.

> Serializing write access to make both competing processes is nice, but
> that's a separate step that may or may not be worth pursuing. And I'm
> towards not worth pursuing.

I think we are in agreement on that point.

I would want to see the proposed "read without locking, compute
heavily and then take a lock and revalidate before writing it out,
or fail" done with a focus on not hurting the "sane" use pattern,
i.e. making the "revalidate" part as light as possible.  I however
cannot seem to find any optimization opportunities other than the
"open, read and compare the checksum".  Two I thought and discarded
are:

 - checking timestamp of the index file itself, and failing when it
   has changed (without opening or reading the checksum) is not
   good; it is optimizing for the wrong case, because we would need
   to check the checksum anyway when the timestamps match.

 - checking i-num of the index file itself, and failing when it has
   changed (without opening or reading the checksum) is not good,
   either.  Creating "index.lock", writing to it and renaming it to
   "index", keeping the "index" during the whole period, would
   ensure that the index file that results with this single cycle
   would get a different i-num from the original, but if that is
   done twice or more, the same i-num can be reused and defeat the
   check.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-12 Thread Duy Nguyen
On Sat, Apr 12, 2014 at 11:19 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano  wrote:
>>> Having said that, nobody sane would be running two simultaneous
>>> operations that are clearly write-oriented competing with each other
>>> against the same index file.
>>
>> When it comes to racing, sanity does not matter much. People could
>> just do it without thinking what exactly is happening behind the
>> scene.
>
> Well, "a race is a race is a race" would also be my knee-jerk
> reaction when anybody utters the word "race", but you need to
> realize that we are not talking about races like object creation
> while "gc --auto" running in the background or two pushes trying to
> update the same ref to different values, which are meaningful use
> cases.
>
> What is the race under discussion about?  It is about the index,
> which corresponds one-to-one to the working tree, so in order for
> the "race" to matter, you need to be racing against another process
> that is not cooperating with you (e.g. a continuous and uncontrolled
> "git add" updating the index while you are doing a real work),
> mucking with the index in the same working tree.  How could such a
> workflow any useful in the real life?
>
> In the spectrum between useful and insane, there is a point where we
> should just tell the insane: don't do it then.

The thing is people do not usually have that level of knowledge of how
git works. They could write a cron job to do something in some repos,
not aware at all about these non-cooperations. Telling people not to
automatically touch a git directory at all is a bit extreme. I think
this patch is about guarding the user from shooting themselves. Either
a command works correctly, not not work at all. A process' dying is a
way of telling people "this is insane" without having to draw a line
between dos and dont's in documents.

Serializing write access to make both competing processes is nice, but
that's a separate step that may or may not be worth pursuing. And I'm
towards not worth pursuing. As you metioned in the next mail,
serializing helps two competing processes, but not two command
sequences.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-12 Thread Junio C Hamano
Junio C Hamano  writes:

> What is the race under discussion about?  It is about the index,
> which corresponds one-to-one to the working tree, so in order for
> the "race" to matter, you need to be racing against another process
> that is not cooperating with you (e.g. a continuous and uncontrolled
> "git add" updating the index while you are doing a real work),
> mucking with the index in the same working tree.  How could such a
> workflow any useful in the real life?
>
> In the spectrum between useful and insane, there is a point where we
> should just tell the insane: don't do it then.
>
> Having said that...
>
>>> So in that sense that can be done as a less urgent follow-up for this topic.
>>
>> Yeah if racing at refresh time is a real problem, sure we should solve it 
>> first.
>
> ... In order to solve the write-vs-write competition in a useful
> way, you must serialize the competing accesses.  E.g. "git add"
> would first take a write lock on the index before read_cache(), and
> then do its operation and finally release the lock by the usual
> write-to-lock-close-then-rename dance.

Extending on this a bit.

Here I didn't mean the traditional ".lock" we create via the
lockfile() API.  Rather, we would want to use fcntl/flock style lock
that lets others _wait_, not _fail_.  Because...

> The lazy "read and refresh in-core first, hoping that we did not
> compete with anybody, and then check just before writing out after
> taking the lock" is a very good solution for the opportunistic
> update codepath, because it is an option not to write the result out
> when there actually was an update by somebody else.  But such an
> opportunistic locking scheme does not work for write-vs-write
> competition.  Upon a real conflict, you need to fail the entire
> operation.

...having multiple conflicting writers on the single index file is
what you thought about worth protecting against.  When somebody else
is pounding on the index file you are trying to prepare your next
commmit in, with his writing that can unexpectedly overwrite what
you prepared, you would at least want the accesses serialized,
without getting half of your attempt to "git add" fail (and having
to redo).  For that, you would want your "git add" to wait while the
other party is mucking with the index under lock, instead of
failing, which is what you would get from the traditional lockfile
API based locking.

But that still takes us back to the "don't do it then".  It is true
that, with serialization, you may be able to guarantee that one "git
add" you do would start from one state (which may record the state
of your previous "git add", or which may have further been modified
by the other process) and ends with whatever that "git add" added,
without any other change.  But even in that case, when you finally
do a "git commit", can you say you know what is in the index?  I do
not think so.  After all, the root cause of the "race" issue is that
the other process pounds at the same index while you are working on
it, without any coordination with you.

So...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Junio C Hamano
Duy Nguyen  writes:

> On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano  wrote:
>> Having said that, nobody sane would be running two simultaneous
>> operations that are clearly write-oriented competing with each other
>> against the same index file.
>
> When it comes to racing, sanity does not matter much. People could
> just do it without thinking what exactly is happening behind the
> scene.

Well, "a race is a race is a race" would also be my knee-jerk
reaction when anybody utters the word "race", but you need to
realize that we are not talking about races like object creation
while "gc --auto" running in the background or two pushes trying to
update the same ref to different values, which are meaningful use
cases.

What is the race under discussion about?  It is about the index,
which corresponds one-to-one to the working tree, so in order for
the "race" to matter, you need to be racing against another process
that is not cooperating with you (e.g. a continuous and uncontrolled
"git add" updating the index while you are doing a real work),
mucking with the index in the same working tree.  How could such a
workflow any useful in the real life?

In the spectrum between useful and insane, there is a point where we
should just tell the insane: don't do it then.

Having said that...

>> So in that sense that can be done as a less urgent follow-up for this topic.
>
> Yeah if racing at refresh time is a real problem, sure we should solve it 
> first.

... In order to solve the write-vs-write competition in a useful
way, you must serialize the competing accesses.  E.g. "git add"
would first take a write lock on the index before read_cache(), and
then do its operation and finally release the lock by the usual
write-to-lock-close-then-rename dance.

The lazy "read and refresh in-core first, hoping that we did not
compete with anybody, and then check just before writing out after
taking the lock" is a very good solution for the opportunistic
update codepath, because it is an option not to write the result out
when there actually was an update by somebody else.  But such an
opportunistic locking scheme does not work for write-vs-write
competition.  Upon a real conflict, you need to fail the entire
operation.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Duy Nguyen
On Sat, Apr 12, 2014 at 3:43 AM, Junio C Hamano  wrote:
> Having said that, nobody sane would be running two simultaneous
> operations that are clearly write-oriented competing with each other
> against the same index file.

When it comes to racing, sanity does not matter much. People could
just do it without thinking what exactly is happening behind the
scene.

> So in that sense that can be done as a less urgent follow-up for this topic.

Yeah if racing at refresh time is a real problem, sure we should solve it first.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Yiannis Marangos
On Fri, Apr 11, 2014 at 01:43:47PM -0700, Junio C Hamano wrote:
> Having said that, nobody sane would be running two simultaneous
> operations that are clearly write-oriented competing with each other
> against the same index file.  So in that sense that can be done as a
> less urgent follow-up for this topic.

I'm willing to take this task, I decide to spend some of my free time
for git development. But since I'm a newbie with it's architecture I
don't know when it will be ready because I want to explore the code a
bit.

--
Yiannis
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Junio C Hamano
Junio C Hamano  writes:

> Yeah, I was hoping that the real write codepath (as opposed to "this
> is read only and we read the index without holding a lock---now we
> noticed that the index needs refreshing, and we know how the
> resulting refreshed index should look like, perhaps we can write it
> to save cycles for other processes" codepath where we cannot and
> should not take a lock early) would take the lock and then read, but
> because that is not the way they work, the need the same protection,
> I would think.

Having said that, nobody sane would be running two simultaneous
operations that are clearly write-oriented competing with each other
against the same index file.  So in that sense that can be done as a
less urgent follow-up for this topic.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano  wrote:
>> Yiannis Marangos  writes:
>>
>>> + n = xpread(fd, sha1, 20, st.st_size - 20);
>>> + if (n != 20)
>>> + goto out;
>>
>> I think it is possible for pread(2) to give you a short-read.
>>
>> The existing callers of emulated mmap and index-pack are prepared to
>> handle a short-read correctly, but I do not think this code does.
>>
>> I'll queue this instead in the meantime.
>
> There are two things to sort out (sorry I can't spend much time on it
> right now): should the same sha1 test be done in write_index(), in
> addition to update_index_if_able(). And what do we do about
> istate->sha1[] in discard_index(), keep it or clear it.

Yeah, I was hoping that the real write codepath (as opposed to "this
is read only and we read the index without holding a lock---now we
noticed that the index needs refreshing, and we know how the
resulting refreshed index should look like, perhaps we can write it
to save cycles for other processes" codepath where we cannot and
should not take a lock early) would take the lock and then read, but
because that is not the way they work, the need the same protection,
I would think.  As discard_index() is not "we are removing all the
entries because the user told us to 'rm -r .'" but is "for the
purpose of our internal processing we do not need the old contents
of the index", my knee-jerk reaction is that we should not clear it.

Any operation that makes the resulting index empty (i.e. no path
being tracked) should still write an empty index (with the usual
header and the most importantly the trailing file checksum), but
that goes without saying.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Yiannis Marangos
On Fri, Apr 11, 2014 at 09:47:09AM +0200, Torsten Bögershausen wrote:
>   6. process A applies a commit:
>   - read the index into memory
>   - take the lock
>   - update the index file on disc
>   - release the lock

So here we can have race condition. Maybe we should implement Duy's
idea of verifying the sha1 in write_index()?


> The new code works like this:
>
>   8. process B applies a commit:
>   - take the lock
>   - verifies tha the index file on disc has the same sha as the one read 
> before
>   # And if not: What do we do? die() or retry() ?
>   - update the index file on disc
>   - release the lock

IMO, we must never die() if we want to do opportunistic update and we
should never retry because it's a bit expensive if we do opportunistic update.


--
Yiannis
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Duy Nguyen
On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano  wrote:
> +static int verify_index_from(const struct index_state *istate, const char 
> *path)
> +{
> +   int fd;
> +   ssize_t n;
> +   struct stat st;
> +   unsigned char sha1[20];
> +
> +   if (!istate->initialized)
> +   return 0;
> +
> +   fd = open(path, O_RDONLY);
> +   if (fd < 0)
> +   return 0;

Suppose another process "git rm --cached :/" is racing with us and
this imaginary git is so smart that it figures if nothing valueable is
left in the index, there's no need to write the index down at all. So
it removes $GIT_DIR/index, then $GIT_DIR/index.lock. When we come
here, we see ENOENT, but we should return 1 instead because the file
removal in this case is a form of change.

That opens a question about writing a new index. I think we could use
all-zero sha-1 as the indicator that this is a fresh new index. If
istate->sha1[] is all-zero and no index file exists, then we do not
need to verify (i.e. return 0). However if istate->sha1[] is all-zero,
but $GIT_DIR/index exists, then return 1.

I'm still not sure if elsewhere in the code base we read
$GIT_DIR/index to active_index, create a new index_state, copy entries
over (but not active_index.sha1[]), then write the _new_ index_state
down. That would hit the "however" statement above and incorrectly
return 1.

I suppose that all other errors except ENOENT could be safely ignored
(i.e. return 0 regardless of istate->sha1[]).

> +
> +   if (fstat(fd, &st))
> +   goto out;
> +
> +   if (st.st_size < sizeof(struct cache_header) + 20)
> +   goto out;
> +
> +   n = pread_in_full(fd, sha1, 20, st.st_size - 20);
> +   if (n != 20)
> +   goto out;
> +
> +   if (hashcmp(istate->sha1, sha1))
> +   goto out;
> +
> +   close(fd);
> +   return 1;
> +
> +out:
> +   close(fd);
> +   return 0;
> +}
> +
> +static int verify_index(const struct index_state *istate)
> +{
> +   return verify_index_from(istate, get_index_file());
> +}
> +
>  static int has_racy_timestamp(struct index_state *istate)
>  {
> int entries = istate->cache_nr;
> @@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state 
> *istate)
>  void update_index_if_able(struct index_state *istate, struct lock_file 
> *lockfile)
>  {
> if ((istate->cache_changed || has_racy_timestamp(istate)) &&
> -   !write_index(istate, lockfile->fd))
> +   verify_index(istate) && !write_index(istate, lockfile->fd))
> commit_locked_index(lockfile);
> else
> rollback_lock_file(lockfile);
> diff --git a/wrapper.c b/wrapper.c
> index 5b3c7fc..bc1bfb8 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t 
> count)
> return total;
>  }
>
> +ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
> +{
> +   char *p = buf;
> +   ssize_t total = 0;
> +
> +   while (count > 0) {
> +   ssize_t loaded = xpread(fd, p, count, offset);
> +   if (loaded < 0)
> +   return -1;
> +   if (loaded == 0)
> +   return total;
> +   count -= loaded;
> +   p += loaded;
> +   total += loaded;
> +   offset += loaded;
> +   }
> +
> +   return total;
> +}
> +
>  int xdup(int fd)
>  {
> int ret = dup(fd);
> --
> 1.9.2-590-g468068b
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-11 Thread Torsten Bögershausen
On 2014-04-10 21.28, Junio C Hamano wrote:
> Yiannis Marangos  writes:
> 
>> +n = xpread(fd, sha1, 20, st.st_size - 20);
>> +if (n != 20)
>> +goto out;
> 
> I think it is possible for pread(2) to give you a short-read.
> 
> The existing callers of emulated mmap and index-pack are prepared to
> handle a short-read correctly, but I do not think this code does.
> 
> I'll queue this instead in the meantime.
> 
> -- >8 --
> From: Yiannis Marangos 
> Date: Thu, 10 Apr 2014 21:31:21 +0300
> Subject: [PATCH] read-cache.c: verify index file before we opportunistically 
> update it
> 
> Before we proceed to opportunistically update the index (often done
> by an otherwise read-only operation like "git status" and "git diff"
> that internally refreshes the index), we must verify that the
> current index file is the same as the one that we read earlier
> before we took the lock on it, in order to avoid a possible race.
> 
> In the example below git-status does "opportunistic update" and
> git-rebase updates the index, but the race can happen in general.
I'm not sure if we need the second or third commit of process A at all.
My understanding is that the simplified version will have problems as well:

  1. process A calls git-rebase (or does anything that updates the index)
  2. process change
  3. process B calls git-status (or does anything that updates the index)
  4. process B reads the index file into memory
  5. process change
  6. process A applies a commit:
  - read the index into memory
  - take the lock
  - update the index file on disc
  - release the lock
  7. process change
  8. process B applies a commit:
  - take the lock
  - update the index in memory and write the index file to disc
  - release the lock

   Now process B has overwritten the commit from process A, which is wrong.

The new code works like this:

  8. process B applies a commit:
  - take the lock
  - verifies tha the index file on disc has the same sha as the one read 
before
  # And if not: What do we do? die() or retry() ?
  - update the index file on disc
  - release the lock
[]
> 
> +static int verify_index_from(const struct index_state *istate, const char 
> *path)
> +{
> + int fd;
> + ssize_t n;
> + struct stat st;
> + unsigned char sha1[20];
> +
> + if (!istate->initialized)
> + return 0;
> +
> + fd = open(path, O_RDONLY);
> + if (fd < 0)
> + return 0;
> +
> + if (fstat(fd, &st))
> + goto out;
> +
> + if (st.st_size < sizeof(struct cache_header) + 20)
> + goto out;
> +
> + n = pread_in_full(fd, sha1, 20, st.st_size - 20);
Minor :
What is the advantage of pread() against a lseek()/read_in_full() combo?
fd is opened and used only in one thread.
Introducing pread()/ pread_in_full() could be done in an other commit,
or do I miss something ?
> + if (n != 20)
> + goto out;

> +static int verify_index(const struct index_state *istate)
> +{
> + return verify_index_from(istate, get_index_file());
> +}
> +
Minor:
Do we really need the wrapper function verify_index_from()?
It seems as if the whole code from verify_index_from() could go into
verify_index(), which will call get_index_file()
  
 
>  static int has_racy_timestamp(struct index_state *istate)
>  {
>   int entries = istate->cache_nr;
> @@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state 
> *istate)
>  void update_index_if_able(struct index_state *istate, struct lock_file 
> *lockfile)
>  {
>   if ((istate->cache_changed || has_racy_timestamp(istate)) &&
> - !write_index(istate, lockfile->fd))
> + verify_index(istate) && !write_index(istate, lockfile->fd))
>   commit_locked_index(lockfile);
>   else
>   rollback_lock_file(lockfile);
> diff --git a/wrapper.c b/wrapper.c
> index 5b3c7fc..bc1bfb8 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t 
> count)
>   return total;
>  }
>  
> +ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
> +{
> + char *p = buf;
> + ssize_t total = 0;
> +
> + while (count > 0) {
> + ssize_t loaded = xpread(fd, p, count, offset);
> + if (loaded < 0)
> + return -1;
> + if (loaded == 0)
> + return total;
> + count -= loaded;
> + p += loaded;
> + total += loaded;
> + offset += loaded;
> + }
> +
> + return total;
> +}
> +
>  int xdup(int fd)
>  {
>   int ret = dup(fd);
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-10 Thread Duy Nguyen
On Fri, Apr 11, 2014 at 2:28 AM, Junio C Hamano  wrote:
> Yiannis Marangos  writes:
>
>> + n = xpread(fd, sha1, 20, st.st_size - 20);
>> + if (n != 20)
>> + goto out;
>
> I think it is possible for pread(2) to give you a short-read.
>
> The existing callers of emulated mmap and index-pack are prepared to
> handle a short-read correctly, but I do not think this code does.
>
> I'll queue this instead in the meantime.

There are two things to sort out (sorry I can't spend much time on it
right now): should the same sha1 test be done in write_index(), in
addition to update_index_if_able(). And what do we do about
istate->sha1[] in discard_index(), keep it or clear it.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-10 Thread Junio C Hamano
Yiannis Marangos  writes:

> + n = xpread(fd, sha1, 20, st.st_size - 20);
> + if (n != 20)
> + goto out;

I think it is possible for pread(2) to give you a short-read.

The existing callers of emulated mmap and index-pack are prepared to
handle a short-read correctly, but I do not think this code does.

I'll queue this instead in the meantime.

-- >8 --
From: Yiannis Marangos 
Date: Thu, 10 Apr 2014 21:31:21 +0300
Subject: [PATCH] read-cache.c: verify index file before we opportunistically 
update it

Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.

In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos 
Signed-off-by: Junio C Hamano 
---
 cache.h  |  3 +++
 read-cache.c | 47 ++-
 wrapper.c| 20 
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index ce377e1..9244c38 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 initialized : 1;
struct hash_table name_hash;
struct hash_table dir_hash;
+   unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
@@ -1199,6 +1200,8 @@ extern void fsync_or_die(int fd, const char *);
 
 extern ssize_t read_in_full(int fd, void *buf, size_t count);
 extern ssize_t write_in_full(int fd, const void *buf, size_t count);
+extern ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
+
 static inline ssize_t write_str_in_full(int fd, const char *str)
 {
return write_in_full(fd, str, strlen(str));
diff --git a/read-cache.c b/read-cache.c
index 33dd676..f4a0d61 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1464,6 +1464,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
if (verify_hdr(hdr, mmap_size) < 0)
goto unmap;
 
+   hashcpy(istate->sha1, (unsigned char *)hdr + mmap_size - 20);
istate->version = ntohl(hdr->hdr_version);
istate->cache_nr = ntohl(hdr->hdr_entries);
istate->cache_alloc = alloc_nr(istate->cache_nr);
@@ -1747,6 +1748,50 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct 
cache_entry *ce,
return result;
 }
 
+/*
+ * This function verifies if index_state has the correct sha1 of the
+ * index file.  Don't die if we have any other failure, just return 0.
+ */
+static int verify_index_from(const struct index_state *istate, const char 
*path)
+{
+   int fd;
+   ssize_t n;
+   struct stat st;
+   unsigned char sha1[20];
+
+   if (!istate->initialized)
+   return 0;
+
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return 0;
+
+   if (fstat(fd, &st))
+   goto out;
+
+   if (st.st_size < sizeof(struct cache_header) + 20)
+   goto out;
+
+   n = pread_in_full(fd, sha1, 20, st.st_size - 20);
+   if (n != 20)
+   goto out;
+
+   if (hashcmp(istate->sha1, sha1))
+   goto out;
+
+   close(fd);
+   return 1;
+
+out:
+   close(fd);
+   return 0;
+}
+
+static int verify_index(const struct index_state *istate)
+{
+   return verify_index_from(istate, get_index_file());
+}
+
 static int has_racy_timestamp(struct index_state *istate)
 {
int entries = istate->cache_nr;
@@ -1766,7 +1811,7 @@ static int has_racy_timestamp(struct index_state *istate)
 void update_index_if_able(struct index_state *istate, struct lock_file 
*lockfile)
 {
if ((istate->cache_changed || has_racy_timestamp(istate)) &&
-   !write_index(istate, lockfile->fd))
+   verify_index(istate) && !write_index(istate, lockfile->fd))
commit_locked_index(lockfile);
else
rollback_lock_file(lockfile);
diff --git a/wrapper.c b/wrapper.c
index 5b3c7fc..bc1bfb8 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -232,6 +232,26 @@ ssize_t write_in_full(int fd, const void *buf, size_t 
count)
return total;
 }
 
+ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
+{
+   char *p = buf;
+   ssize_t total =

[PATCH v7 2/2] Verify index file before we opportunistically update it

2014-04-10 Thread Yiannis Marangos
Before we proceed to "opportunistic update" we must verify that the
current index file is the same as the one that we read before. There
is a possible race if we don't do this. In the example below git-status
does "opportunistic update" and git-rebase updates the index, but the
race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos 
---

Version 4 contains fixes based on Junio's comments.
Version 5 fixs a typo in commit message (git-show -> git-status).
Version 6 removes verify_hdr() and use pread() instead of mmap().
Version 7 use xpread() (added with [PATCH v7 1/2]) instead of pread().

 cache.h  |  1 +
 read-cache.c | 47 ++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 107ac61..0460f06 100644
--- a/cache.h
+++ b/cache.h
@@ -279,6 +279,7 @@ struct index_state {
 initialized : 1;
struct hashmap name_hash;
struct hashmap dir_hash;
+   unsigned char sha1[20];
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index ba13353..28de1a6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1477,6 +1477,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
if (verify_hdr(hdr, mmap_size) < 0)
goto unmap;
 
+   hashcpy(istate->sha1, (unsigned char *)hdr + mmap_size - 20);
istate->version = ntohl(hdr->hdr_version);
istate->cache_nr = ntohl(hdr->hdr_entries);
istate->cache_alloc = alloc_nr(istate->cache_nr);
@@ -1760,6 +1761,50 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct 
cache_entry *ce,
return result;
 }
 
+/*
+ * This function verifies if index_state has the correct sha1 of an index file.
+ * Don't die if we have any other failure, just return 0.
+ */
+static int verify_index_from(const struct index_state *istate, const char 
*path)
+{
+   int fd;
+   ssize_t n;
+   struct stat st;
+   unsigned char sha1[20];
+
+   if (!istate->initialized)
+   return 0;
+
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return 0;
+
+   if (fstat(fd, &st))
+   goto out;
+
+   if (st.st_size < sizeof(struct cache_header) + 20)
+   goto out;
+
+   n = xpread(fd, sha1, 20, st.st_size - 20);
+   if (n != 20)
+   goto out;
+
+   if (hashcmp(istate->sha1, sha1))
+   goto out;
+
+   close(fd);
+   return 1;
+
+out:
+   close(fd);
+   return 0;
+}
+
+static int verify_index(const struct index_state *istate)
+{
+   return verify_index_from(istate, get_index_file());
+}
+
 static int has_racy_timestamp(struct index_state *istate)
 {
int entries = istate->cache_nr;
@@ -1779,7 +1824,7 @@ static int has_racy_timestamp(struct index_state *istate)
 void update_index_if_able(struct index_state *istate, struct lock_file 
*lockfile)
 {
if ((istate->cache_changed || has_racy_timestamp(istate)) &&
-   !write_index(istate, lockfile->fd))
+   verify_index(istate) && !write_index(istate, lockfile->fd))
commit_locked_index(lockfile);
else
rollback_lock_file(lockfile);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html