Re: [PATCH/RFC 6/7] refs: add update_refs for multiple simultaneous updates

2013-08-29 Thread Brad King
On 08/29/2013 02:20 PM, Brad King wrote:
> I wasn't happy with the asymmetry either but forgot to raise it in
> the cover letter.  We need a way to represent "old value not given"
> as different from "old value is_null_sha1".
[snip]
> Another approach is to add a dedicated member to struct ref_update.

FYI, I chose the latter approach for simplicity.  The member
can be folded into the flags by future refactoring if needed.

-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 6/7] refs: add update_refs for multiple simultaneous updates

2013-08-29 Thread Brad King
On 08/29/2013 02:32 PM, Junio C Hamano wrote:
> But it may not be a bad idea to keep the callers dumb and have this
> function always sort, dedup, *and* fail inconsistent request.

I agree.  I was just starting to write the comment for update_refs
and it basically would have said "always use ref_update_sort and
check for duplicates first".  We might was well build it into the
function.  If some call site in the future wants to optimize a case
known to be sorted it can be refactored later.

Thanks,
-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 6/7] refs: add update_refs for multiple simultaneous updates

2013-08-29 Thread Junio C Hamano
Brad King  writes:

> On 08/29/2013 01:39 PM, Junio C Hamano wrote:
>> Brad King  writes:
>>> +   for (i=0; i < n; ++i) {
>> 
>> Style:
>> 
>>  for (i = 0; i < n; i++) {
>
> Fixed.
>
>> Is it asking for AB-BA deadlock?  If so, is the caller responsible
>> for avoiding it?
>
> Since we don't actually block waiting for locks we won't really
> deadlock.

Ahh, OK.

> For Git's internal API I think we can document this in a comment so
> that update_refs does not have to sort.  Then we can add a new
> ref_update_sort function to sort an array of struct ref_update.
> The user-facing "update-ref --stdin" can then use ref_update_sort.

My immediate reaction was "is there a case where the caller knows
that it already has a sorted collection?".  The single caller you
are envisioning could collect the proposed updates to a string list
and dedup, I think, and the resulting list would then be already
sorted.

But it may not be a bad idea to keep the callers dumb and have this
function always sort, dedup, *and* fail inconsistent request.  Then
your original caller that just collects --stdin input can pass
possibly unsorted, duplicated and/or inconsistent request to the
function and have it do the sanity check.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 6/7] refs: add update_refs for multiple simultaneous updates

2013-08-29 Thread Brad King
On 08/29/2013 01:39 PM, Junio C Hamano wrote:
> Brad King  writes:
>> +for (i=0; i < n; ++i) {
> 
> Style:
> 
>   for (i = 0; i < n; i++) {

Fixed.

> Is it asking for AB-BA deadlock?  If so, is the caller responsible
> for avoiding it?

Since we don't actually block waiting for locks we won't really
deadlock.  However, if two competing callers keep repeating they
might never finish.  In order to avoid this one must sort the refs
so that locks are always acquired in a consistent order.

For Git's internal API I think we can document this in a comment so
that update_refs does not have to sort.  Then we can add a new
ref_update_sort function to sort an array of struct ref_update.
The user-facing "update-ref --stdin" can then use ref_update_sort.

The sort logic may subsume duplicate ref update detection too.
After sorting a simple linear-time scan can detect duplicates.

>> +unsigned char new_sha1[20];
>> +unsigned char *old_sha1;
> 
> This asymmetry is rather curious and will later become problematic
> when we have more users of this API.  Maybe your callers happen have
> static storage to keep old_sha1[] outside this structure but do not
> have it for new_sha1[] for some unknown reason, which may not apply
> to later callers we may want to add.

I wasn't happy with the asymmetry either but forgot to raise it in
the cover letter.  We need a way to represent "old value not given"
as different from "old value is_null_sha1".

One approach is to use a bit in the "flags" member that already
stores REF_NODEREF.  However, that will be inconsistent with the
other ref update APIs that check whether old_sha1 is NULL.  We could
still do it and document the bit to mean "ignore old_sha1 member of
struct ref_update".

Another approach is to add a dedicated member to struct ref_update.

Comments?
-Brad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 6/7] refs: add update_refs for multiple simultaneous updates

2013-08-29 Thread Junio C Hamano
Brad King  writes:

> Add 'struct ref_update' to encode the information needed to update or
> delete a ref (name, new sha1, optional old sha1, no-deref flag).  Add
> function 'update_refs' accepting an array of updates to perform.  First
> acquire locks on all refs with verified old values.  Then update or
> delete all refs accordingly.  Fail if any one lock cannot be obtained or
> any one old value does not match.
>
> Though the refs themeselves cannot be modified together in a single
> atomic transaction, this function does enable some useful semantics.
> For example, a caller may create a new branch starting from the head of
> another branch and rewind the original branch at the same time.  This
> transfers ownership of commits between branches without risk of losing
> commits added to the original branch by a concurrent process, or risk of
> a concurrent process creating the new branch first.
>
> Signed-off-by: Brad King 
> ---
>  refs.c |   66 
> 
>  refs.h |   11 +++
>  2 files changed, 77 insertions(+)
>
> diff --git a/refs.c b/refs.c
> index 5a6c14e..0a0c19e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -3238,6 +3238,72 @@ int update_ref(const char *action, const char *refname,
>   return update_ref_write(action, refname, sha1, lock, onerr);
>  }
>  
> +int update_refs(const char *action, struct ref_update *updates,
> + int n, enum action_on_err onerr)
> +{
> + int ret = 0, delnum = 0, i;
> + int *types;
> + struct ref_lock **locks;
> + const char **delnames;
> +
> + if (!updates || !n)
> + return 0;
> +
> + /* Allocate work space: */
> + types = xmalloc(sizeof(int)*n);
> + locks = xmalloc(sizeof(struct ref_lock*)*n);
> + delnames = xmalloc(sizeof(const char*)*n);
> +
> + /* Acquire all locks while verifying old values: */
> + for (i=0; i < n; ++i) {

Style:

for (i = 0; i < n; i++) {

> + locks[i] = update_ref_lock(updates[i].ref_name,
> +updates[i].old_sha1,
> +updates[i].flags,
> +&types[i], onerr);
> + if (!locks[i])
> + break;
> + }

Is it asking for AB-BA deadlock?  If so, is the caller responsible
for avoiding it?

> + /* Abort if we did not get all locks: */
> + if (i < n) {
> + while (--i >= 0)
> + unlock_ref(locks[i]);
> + free(types);
> + free(locks);
> + free(delnames);
> + return 1;
> + }
> +
> + /* Perform updates first so live commits remain referenced: */
> + for (i=0; i < n; ++i)
> + if (!is_null_sha1(updates[i].new_sha1)) {
> + ret |= update_ref_write(action,
> + updates[i].ref_name,
> + updates[i].new_sha1,
> + locks[i], onerr);
> + locks[i] = 0; /* freed by update_ref_write */
> + }
> +
> + /* Perform deletes now that updates are safely completed: */
> + for (i=0; i < n; ++i)
> + if (locks[i]) {
> + delnames[delnum++] = locks[i]->ref_name;
> + ret |= delete_ref_loose(locks[i], types[i]);
> + }
> + ret |= repack_without_refs(delnames, delnum);
> + for (i=0; i < delnum; ++i)
> + unlink_or_warn(git_path("logs/%s", delnames[i]));
> + clear_loose_ref_cache(&ref_cache);
> + for (i=0; i < n; ++i)
> + if (locks[i])
> + unlock_ref(locks[i]);
> +
> + free(types);
> + free(locks);
> + free(delnames);
> + return ret;
> +}
> +
>  struct ref *find_ref_by_name(const struct ref *list, const char *name)
>  {
>   for ( ; list; list = list->next)
> diff --git a/refs.h b/refs.h
> index 2cd307a..5763f3a 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -214,6 +214,17 @@ int update_ref(const char *action, const char *refname,
>   const unsigned char *sha1, const unsigned char *oldval,
>   int flags, enum action_on_err onerr);
>  
> +struct ref_update {
> + const char *ref_name;
> + unsigned char new_sha1[20];
> + unsigned char *old_sha1;

This asymmetry is rather curious and will later become problematic
when we have more users of this API.  Maybe your callers happen have
static storage to keep old_sha1[] outside this structure but do not
have it for new_sha1[] for some unknown reason, which may not apply
to later callers we may want to add.

> + int flags;
> +};
> +
> +/** lock all refs and then write all of them */
> +int update_refs(const char *action, struct ref_update *updates,
> + int n, enum action_on_err onerr);
> +
>  extern int parse_hide_refs_config(const char *var, const char *value, const 
> cha