Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs

2015-06-23 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

> On 06/22/2015 11:06 PM, Junio C Hamano wrote:
>> ...
>> What I am wondering is if we could turn the safety logic that appears
>> here (i.e. "no existing refs must be assumed by the set of updates",
>> etc.) into an optimization cue and implement this as a special-case
>> helper to ref_transaction_commit(), i.e.
>>
>> 	ref_transaction_commit(...)
>> 	{
>> 		if (updates are all initial creation &&
>> 		    no existing refs in repository)
>> 			return initial_ref_transaction_commit(...);
>> 		/* otherwise we do the usual thing */
>> 		...
>> 	}
>>
>> and have clone call ref_transaction_commit() as usual.

> The safety logic in this function is (approximately) necessary, but not
> sufficient, to guarantee safety.

Oh, no question about it, and you do not even have to bring up an
"insane user runs random commands while Git is hard at work on it"
non use-case ;-)

> One of the shortcuts that it takes is
> not locking the references while they are being created. Therefore, it
> would be unsafe for one process to call ref_transaction_commit() while
> another is calling initial_ref_transaction_commit(). So the caller has
> to know somehow that no other processes are working in the repository
> for this optimization to be safe. It conveys that knowledge by calling
> initial_ref_transaction_commit() rather than ref_transaction_commit().

OK.  So the answer to my first question, "is the initial creation
logic too fragile?", is a resounding "yes"; the caller should know
that it is too crazy for the user to be competing with what it is
doing before deciding to call initial_ref_transaction_commit(),
hence we cannot automatically detect from within
ref_transaction_commit() whether it is safe to use this logic as an
optimization.

> But I think if anything it would make more sense to go the other direction:
>
> * Teach ref_transaction_commit() an option that asks it to write
>   reference updates to packed-refs instead of loose refs (but
>   locking the references as usual).
>
> * Change clone to use ref_transaction_commit() like everybody
>   else, passing it the new REFS_WRITE_TO_PACKED_REFS option.
>
> Then clone would participate in the normal locking protocol, and it
> wouldn't *matter* if another process runs before the clone is finished.

Yeah, I thought that was actually what I was driving at, and doing so
without that write-to-packed-refs option, which I'd prefer to leave
as an optimization inside ref_transaction_commit().

Except that I missed that the initial_* variant is even more
aggressive (i.e. not locking at all), so no such optimization is safe.

> There would also be some consistency benefits. For example, if
> core.logallrefupdates is set globally or on the command line, the
> initial reference creations would be reflogged. And other operations
> that write references in bulk could use the new
> REFS_WRITE_TO_PACKED_REFS option to prevent loose reference proliferation.
>
> But I don't think any of this is a problem in practice, and I think we
> can live with using the optimized-but-not-100%-safe
> initial_ref_transaction_commit() for cloning.

OK.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs

2015-06-23 Thread Michael Haggerty
On 06/22/2015 11:06 PM, Junio C Hamano wrote:
> Michael Haggerty mhag...@alum.mit.edu writes:
>
>> Error out if the ref_transaction includes more than one update for any
>> refname.
>>
>> Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
>> ---
>>  refs.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>
> This somehow feels like "ehh, I now know better and this function
> should have been like this from the beginning" to me.
>
> But that is OK.
>
> Is the initial creation logic too fragile to deserve its own
> function to force callers to think about it, by the way?
>
> What I am wondering is if we could turn the safety logic that appears
> here (i.e. "no existing refs must be assumed by the set of updates",
> etc.) into an optimization cue and implement this as a special-case
> helper to ref_transaction_commit(), i.e.
>
> 	ref_transaction_commit(...)
> 	{
> 		if (updates are all initial creation &&
> 		    no existing refs in repository)
> 			return initial_ref_transaction_commit(...);
> 		/* otherwise we do the usual thing */
> 		...
> 	}
>
> and have clone call ref_transaction_commit() as usual.

The safety logic in this function is (approximately) necessary, but not
sufficient, to guarantee safety. One of the shortcuts that it takes is
not locking the references while they are being created. Therefore, it
would be unsafe for one process to call ref_transaction_commit() while
another is calling initial_ref_transaction_commit(). So the caller has
to know somehow that no other processes are working in the repository
for this optimization to be safe. It conveys that knowledge by calling
initial_ref_transaction_commit() rather than ref_transaction_commit().

Of course the next question is, "How does `git clone` know that no
other process is working in the new repository?" Actually, it doesn't.
For example, I just verified that I can run

git clone $URL mygit &&
sleep 0.1 &&
cd mygit &&
git commit --allow-empty -m "New root commit"

and thereby overwrite the upstream `master` without the usual
non-fast-forward protection. I guess we are just relying on the user's
common sense not to run Git commands in a new repository before its
creation is complete.

I suppose we *could* special-case `git clone` to not finish the
initialization of the repository (for example, not write the `config`
file) until *after* the packed-refs file is written. This would prevent
other git processes from recognizing the directory as a Git repository
and so prevent them from running before the clone is finished.

But I think if anything it would make more sense to go the other direction:

* Teach ref_transaction_commit() an option that asks it to write
  reference updates to packed-refs instead of loose refs (but
  locking the references as usual).

* Change clone to use ref_transaction_commit() like everybody
  else, passing it the new REFS_WRITE_TO_PACKED_REFS option.

Then clone would participate in the normal locking protocol, and it
wouldn't *matter* if another process runs before the clone is finished.
There would also be some consistency benefits. For example, if
core.logallrefupdates is set globally or on the command line, the
initial reference creations would be reflogged. And other operations
that write references in bulk could use the new
REFS_WRITE_TO_PACKED_REFS option to prevent loose reference proliferation.

But I don't think any of this is a problem in practice, and I think we
can live with using the optimized-but-not-100%-safe
initial_ref_transaction_commit() for cloning.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs

2015-06-22 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

> Error out if the ref_transaction includes more than one update for any
> refname.
>
> Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
> ---
>  refs.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

This somehow feels like "ehh, I now know better and this function
should have been like this from the beginning" to me.

But that is OK.

Is the initial creation logic too fragile to deserve its own
function to force callers to think about it, by the way?

What I am wondering is if we could turn the safety logic that appears
here (i.e. "no existing refs must be assumed by the set of updates",
etc.) into an optimization cue and implement this as a special-case
helper to ref_transaction_commit(), i.e.

	ref_transaction_commit(...)
	{
		if (updates are all initial creation &&
		    no existing refs in repository)
			return initial_ref_transaction_commit(...);
		/* otherwise we do the usual thing */
		...
	}

and have clone call ref_transaction_commit() as usual.


[PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs

2015-06-22 Thread Michael Haggerty
Error out if the ref_transaction includes more than one update for any
refname.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 refs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/refs.c b/refs.c
index 31661c7..53d9e45 100644
--- a/refs.c
+++ b/refs.c
@@ -4087,12 +4087,22 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 	int ret = 0, i;
 	int n = transaction->nr;
 	struct ref_update **updates = transaction->updates;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
 
 	assert(err);
 
 	if (transaction->state != REF_TRANSACTION_OPEN)
 		die("BUG: commit called for transaction that is not open");
 
+	/* Fail if a refname appears more than once in the transaction: */
+	for (i = 0; i < n; i++)
+		string_list_append(&affected_refnames, updates[i]->refname);
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
 	for (i = 0; i < n; i++) {
 		struct ref_update *update = updates[i];
 
@@ -4125,6 +4135,7 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 
 cleanup:
 	transaction->state = REF_TRANSACTION_CLOSED;
+	string_list_clear(&affected_refnames, 0);
 	return ret;
 }
 
-- 
2.1.4
