Re: [PATCH v5 23/35] lockfile: avoid transitory invalid states

2014-09-23 Thread Michael Haggerty
On 09/17/2014 12:45 AM, Jonathan Nieder wrote:
 Michael Haggerty wrote:
 
 We could probably continue to use the filename field to encode the
 state by being careful to write characters 1..N-1 of the filename
 first, and then overwrite the NUL at filename[0] with the first
 character of the filename, but that would be awkward and error-prone.

 So, instead of using the filename field to determine whether the
 lock_file object is active, add a new field lock_file::active for
 this purpose.
 
 Nice.
 
 [...]
 --- a/cache.h
 +++ b/cache.h
 @@ -576,6 +576,7 @@ extern int refresh_index(struct index_state *, unsigned 
 int flags, const struct
  
  struct lock_file {
  struct lock_file *next;
 +volatile sig_atomic_t active;
  int fd;
  pid_t owner;
  char on_list;
 [...]
 +++ b/lockfile.c
 @@ -27,16 +27,19 @@
 [...]
 @@ -189,9 +198,14 @@ static int lock_file(struct lock_file *lk, const char 
 *path, int flags)
  atexit(remove_lock_file);
  }
  
 +if (lk-active)
 +die(BUG: lock_file(\%s\) called with an active lock_file 
 object,
 +path);
 
 The error message doesn't make it entirely obvious to me what went
 wrong.
 
 Maybe something like
 
   die(BUG: cannot lock_file(\%s\) on active struct lock_file,
   path);

This is an internal sanity check that users should never see, and if
they do the first thing a developer will do is grep the source code for
the error message in the bug report and then he/she will see exactly
what went wrong. So I don't think it is very important that this error
message be super self-explanatory.

But it doesn't hurt, so I'll make the change you suggest.

 lock_file already assumed on_list was initialized to zero but it
 wasn't particularly obvious since everything else is blindly
 scribbled over.  Probably worth mentioning in the API docs that
 the lock_file should be zeroed before calling hold_lock_file_...

Good point. I will document this better.

 [...]
 @@ -326,6 +341,7 @@ int commit_lock_file(struct lock_file *lk)
  if (rename(lk-filename, result_file))
  goto rollback;
  
 +lk-active = 0;
  lk-filename[0] = 0;
 
 Is it useful to set filename[0] any more?
 
 It seems potentially fragile to set both, since new code can appear
 that only sets or checks one or the other.  Would it make sense to
 rename the filename field to make sure no new callers relying on the
 filename[0] convention sneak in (with the new convention being that
 the filename doesn't get cleared, to avoid problems)?

I admit that nobody should be relying on filename being cleared anymore.
And I can see your point that somebody might come to rely on this
implementation detail. But I don't like leaving valid filenames around
where a bug might cause them to be accessed accidentally. I would rather
set the filename to the empty string so that any attempt to use it
causes an immediate error message from the OS rather than accessing the
wrong file.

I will note in the lock_file docstring that client code should not rely
on the filename being empty when in the 'unlocked' state.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 23/35] lockfile: avoid transitory invalid states

2014-09-16 Thread Michael Haggerty
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state.  And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.

This was formerly not the case, because part of the state was encoded
by setting lk-filename to the empty string vs. a valid filename.  It
is wrong to assume that this string can be updated atomically; for
example, even

strcpy(lk-filename, value)

is unsafe.  But the old code was even more reckless; for example,

strcpy(lk-filename, path);
if (!(flags  LOCK_NODEREF))
resolve_symlink(lk-filename, max_path_len);
strcat(lk-filename, .lock);

During the call to resolve_symlink(), lk-filename contained the name
of the file that was being locked, not the name of the lockfile.  If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!

We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.

So, instead of using the filename field to determine whether the
lock_file object is active, add a new field lock_file::active for
this purpose.  Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.

Helped-by: Johannes Sixt j...@kdbg.org
Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 cache.h  |  1 +
 lockfile.c   | 47 ---
 read-cache.c |  1 +
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index e4e7c56..a367220 100644
--- a/cache.h
+++ b/cache.h
@@ -576,6 +576,7 @@ extern int refresh_index(struct index_state *, unsigned int 
flags, const struct
 
 struct lock_file {
struct lock_file *next;
+   volatile sig_atomic_t active;
int fd;
pid_t owner;
char on_list;
diff --git a/lockfile.c b/lockfile.c
index 6ae5c84..55fbb41 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -27,16 +27,19 @@
  * Instead of (3), the change can be rolled back by deleting lockfile.
  *
  * This module keeps track of all locked files in lock_file_list.
- * When the first file is locked, it registers an atexit(3) handler;
- * when the program exits, the handler rolls back any files that have
- * been locked but were never committed or rolled back.
+ * When the first file is locked, it registers an atexit(3) handler
+ * and a signal handler; when the program exits, the handler rolls
+ * back any files that have been locked but were never committed or
+ * rolled back.
  *
  * A lock_file is owned by the process that created it. The lock_file
  * object has an owner field that records its owner. This field is
  * used to prevent a forked process from closing a lock_file of its
  * parent.
  *
- * A lock_file object can be in several states:
+ * Because the signal handler can be called at any time, a lock_file
+ * object must always be in a well-defined state.  The possible states
+ * are as follows:
  *
  * - Uninitialized.  In this state the object's on_list field must be
  *   zero but the rest of its contents need not be initialized.  As
@@ -44,19 +47,25 @@
  *   registered in the lock_file_list, and on_list is set.
  *
  * - Locked, lockfile open (after hold_lock_file_for_update(),
- *   hold_lock_file_for_append(), or reopen_lock_file()). In this
- *   state, the lockfile exists, filename holds the filename of the
- *   lockfile, fd holds a file descriptor open for writing to the
- *   lockfile, and owner holds the PID of the process that locked the
- *   file.
+ *   hold_lock_file_for_append(), or reopen_lock_file()).  In this
+ *   state:
+ *   - the lockfile exists
+ *   - active is set
+ *   - filename holds the filename of the lockfile
+ *   - fd holds a file descriptor open for writing to the lockfile
+ *   - owner holds the PID of the process that locked the file
  *
  * - Locked, lockfile closed (after close_lock_file()).  Same as the
  *   previous state, except that the lockfile is closed and fd is -1.
  *
  * - Unlocked (after commit_lock_file(), rollback_lock_file(), or a
- *   failed attempt to lock).  In this state, filename[0] == '\0' and
- *   fd is -1.  The object is left registered in the lock_file_list,
- *   and on_list is set.
+ *   failed attempt to lock).  In this state:
+ *   - active is unset
+ *   - filename[0] == '\0' (usually, though there are transitory states
+ * in which this condition doesn't hold)
+ *   - fd is -1
+ *   - the object is left registered in the lock_file_list, and
+ * on_list is set.
  *
  * See Documentation/api-lockfile.txt for more information.
  */

Re: [PATCH v5 23/35] lockfile: avoid transitory invalid states

2014-09-16 Thread Jonathan Nieder
Michael Haggerty wrote:

 We could probably continue to use the filename field to encode the
 state by being careful to write characters 1..N-1 of the filename
 first, and then overwrite the NUL at filename[0] with the first
 character of the filename, but that would be awkward and error-prone.
 
 So, instead of using the filename field to determine whether the
 lock_file object is active, add a new field lock_file::active for
 this purpose.

Nice.

[...]
 --- a/cache.h
 +++ b/cache.h
 @@ -576,6 +576,7 @@ extern int refresh_index(struct index_state *, unsigned 
 int flags, const struct
  
  struct lock_file {
   struct lock_file *next;
 + volatile sig_atomic_t active;
   int fd;
   pid_t owner;
   char on_list;
[...]
 +++ b/lockfile.c
 @@ -27,16 +27,19 @@
[...]
 @@ -189,9 +198,14 @@ static int lock_file(struct lock_file *lk, const char 
 *path, int flags)
   atexit(remove_lock_file);
   }
  
 + if (lk-active)
 + die(BUG: lock_file(\%s\) called with an active lock_file 
 object,
 + path);

The error message doesn't make it entirely obvious to me what went
wrong.

Maybe something like

die(BUG: cannot lock_file(\%s\) on active struct lock_file,
path);

lock_file already assumed on_list was initialized to zero but it
wasn't particularly obvious since everything else is blindly
scribbled over.  Probably worth mentioning in the API docs that
the lock_file should be zeroed before calling hold_lock_file_...

[...]
 @@ -326,6 +341,7 @@ int commit_lock_file(struct lock_file *lk)
   if (rename(lk-filename, result_file))
   goto rollback;
  
 + lk-active = 0;
   lk-filename[0] = 0;

Is it useful to set filename[0] any more?

It seems potentially fragile to set both, since new code can appear
that only sets or checks one or the other.  Would it make sense to
rename the filename field to make sure no new callers relying on the
filename[0] convention sneak in (with the new convention being that
the filename doesn't get cleared, to avoid problems)?

Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html