Re: [PATCHv2] refs.c: enable large transactions

2015-04-22 Thread Michael Haggerty
On 04/21/2015 09:06 PM, Stefan Beller wrote:
 This is another attempt at enabling large transactions
 (large in terms of open file descriptors). We keep track of how many
 lock files are opened by the ref_transaction_commit function.
 When more than a reasonable number of files are open, we close
 the file descriptors to make sure the transaction can continue.
 
 Another idea I had while implementing this was to move this file
 closing into the lock file API, such that only a certain number of
 lock files can be open at any given point in time and we'd be 'garbage
 collecting' open fds when necessary in any relevant call to the lock
 file API. This would have brought the advantage of having such
 functionality available in other users of the lock file API as well.
 The downside, however, is the overcomplication: you would need to
 check for (lock->fd != -1) everywhere, which may slow down other parts
 of the code that did not ask for such a feature.

Aside from a missing error check (see below), this looks good to me.

 Signed-off-by: Stefan Beller sbel...@google.com
 ---
 
 * Removed unneeded braces in the condition to check if we want to close
   the lock file.
 * made the counter for the remaining fds an unsigned int. That is what
   get_max_fd_limit() returns, so there are no concerns for an overflow.
   Also it cannot go below 0 any more.
 * moved the initialisation of the remaining_fds a bit down and added a
   comment
   
  refs.c                | 21 +++++++++++++++++++++
  t/t1400-update-ref.sh |  4 ++--
  2 files changed, 23 insertions(+), 2 deletions(-)
  
  
 
 diff --git a/refs.c b/refs.c
 index 4f495bd..34cfcdf 100644
 --- a/refs.c
 +++ b/refs.c
 @@ -3041,6 +3041,8 @@ static int write_ref_sha1(struct ref_lock *lock,
   errno = EINVAL;
   return -1;
   }
 + if (lock->lk->fd == -1)
 + reopen_lock_file(lock->lk);

You should check that reopen_lock_file() was successful.
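
To make the point concrete, here is a minimal standalone sketch of "reopen, check, then write". It is not the refs.c code: `struct lazy_lock`, `lazy_lock_reopen()` and `lazy_lock_write()` are hypothetical names made up for illustration, and the sketch uses plain POSIX calls rather than the lock file API.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a lock file whose fd may have been closed. */
struct lazy_lock {
	const char *path;	/* path of the already existing lock file */
	int fd;			/* -1 while the descriptor is closed */
};

/* Reopen for appending if needed; returns the fd, or -1 on failure. */
static int lazy_lock_reopen(struct lazy_lock *lk)
{
	assert(lk != NULL);
	if (lk->fd == -1)
		lk->fd = open(lk->path, O_WRONLY | O_APPEND);
	return lk->fd;
}

/* Write only after the reopen is known to have succeeded. */
static int lazy_lock_write(struct lazy_lock *lk, const char *data)
{
	size_t len = strlen(data);

	if (lazy_lock_reopen(lk) < 0)
		return -1;	/* e.g. EMFILE, or the lock file vanished */
	return write(lk->fd, data, len) == (ssize_t)len ? 0 : -1;
}
```

Without the check, a failed reopen would pass -1 on to the write call, and the caller would only see a less specific write error.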

   if (write_in_full(lock->lk->fd, sha1_to_hex(sha1), 40) != 40 ||
   write_in_full(lock->lk->fd, term, 1) != 1 ||
   close_ref(lock) < 0) {
 @@ -3718,6 +3720,7 @@ int ref_transaction_commit(struct ref_transaction *transaction,
  struct strbuf *err)
  {
   int ret = 0, i;
 + unsigned int remaining_fds;
   int n = transaction->nr;
   struct ref_update **updates = transaction->updates;
   struct string_list refs_to_delete = STRING_LIST_INIT_NODUP;
 @@ -3733,6 +3736,20 @@ int ref_transaction_commit(struct ref_transaction *transaction,
   return 0;
   }
  
 + /*
 +  * We need to open many files in a large transaction, so come up with
 +  * a reasonable maximum. We still keep some spares for stdin/out and
 +  * other open files. Experiments determined we need more fds when
 +  * running inside our test suite than directly in the shell. It's
 +  * unclear where these fds come from. 32 should be a reasonably large
 +  * number though.
 +  */
 + remaining_fds = get_max_fd_limit();
 + if (remaining_fds > 32)
 + remaining_fds -= 32;
 + else
 + remaining_fds = 0;
 +
   /* Copy, sort, and reject duplicate refs */
   qsort(updates, n, sizeof(*updates), ref_update_compare);
   if (ref_update_reject_duplicates(updates, n, err)) {
 @@ -3762,6 +3779,10 @@ int ref_transaction_commit(struct ref_transaction *transaction,
   update->refname);
   goto cleanup;
   }
 + if (remaining_fds > 0)
 + remaining_fds--;
 + else
 + close_lock_file(update->lock->lk);

I consider this code a stopgap, and simplicity is more important than
optimization. But just for the sake of discussion, if we planned to keep
this code around, it could be improved by not wasting open file
descriptors for references that are only being verified or deleted, like so:

if (!(flags & REF_HAVE_NEW) ||
    is_null_sha1(update->new_sha1) ||
    remaining_fds == 0)
        close_lock_file(update->lock->lk);
else
        remaining_fds--;

   }
  
   /* Perform updates first so live commits remain referenced */
 [...]

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] refs.c: enable large transactions

2015-04-22 Thread Stefan Beller
On Wed, Apr 22, 2015 at 7:11 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 + if (lock->lk->fd == -1)
 + reopen_lock_file(lock->lk);

 You should check that reopen_lock_file() was successful.

ok


 @@ -3762,6 +3779,10 @@ int ref_transaction_commit(struct ref_transaction *transaction,
   update->refname);
   goto cleanup;
   }
 + if (remaining_fds > 0)
 + remaining_fds--;
 + else
 + close_lock_file(update->lock->lk);

 I consider this code a stopgap, and simplicity is more important than
 optimization.

Can you explain a bit why you think this is a stopgap?

Looking at the patch this looks simple to me, as there are no
huge pain points involved. (Compared to [1] as we'd change a lot in
that series)

Also this is pretty good on performance: the small cases do not incur
an additional close and reopen; only the larger cases do.

[1] http://git.661346.n2.nabble.com/PATCHv1-0-6-Fix-bug-in-large-transactions-tt7624363.html#a7624368




 But just for the sake of discussion, if we planned to keep
 this code around, it could be improved by not wasting open file
 descriptors for references that are only being verified or deleted, like so:

I'll pick that up for the resend.


Re: [PATCHv2] refs.c: enable large transactions

2015-04-22 Thread Michael Haggerty
On 04/22/2015 09:09 PM, Stefan Beller wrote:
 On Wed, Apr 22, 2015 at 7:11 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 + if (lock->lk->fd == -1)
 + reopen_lock_file(lock->lk);

 You should check that reopen_lock_file() was successful.
 
 ok
 
 
 @@ -3762,6 +3779,10 @@ int ref_transaction_commit(struct ref_transaction *transaction,
   update->refname);
   goto cleanup;
   }
 + if (remaining_fds > 0)
 + remaining_fds--;
 + else
 + close_lock_file(update->lock->lk);

 I consider this code a stopgap, and simplicity is more important than
 optimization.
 
 Can you explain a bit why you think this is a stopgap?

At the point the lockfile is created, we have all the information we
need to write the new SHA-1 to it and close it immediately. It seems
more straightforward to do it that way than the way it is done in the
current code, where the locking and writing are separated in time and
space and now there is the small extra complication of
maybe-closing-maybe-not. But getting to the final destination requires
more refactoring than would be prudent for the upcoming release.

In other words, I think your fix is OK but that the whole area of code
has still not reached its final form. I am working on a patch series
that does what I have in mind, but it's not ready yet. As I remember I
got stuck when I realized that the reflog for HEAD is updated somewhere
out of the blue without proper locking and I haven't gotten around to
sorting it out yet.
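
To illustrate what "write the new SHA-1 and close immediately" could look like, here is a rough standalone sketch using plain POSIX calls. It is not the patch series mentioned above: `lock_and_write()` and `commit_lock()` are hypothetical names, and real code would use the lock file API and a full-write loop such as write_in_full() instead of bare write().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Create "<path>.lock" exclusively, write the new value plus a newline,
 * and close the descriptor before returning. No fd outlives this call.
 * Returns 0 on success, -1 on error (a lock file created by this call
 * is removed again on failure).
 */
static int lock_and_write(const char *path, const char *value)
{
	char lock_path[4096];
	size_t len;
	int fd, n;

	assert(path != NULL && value != NULL);
	len = strlen(value);
	n = snprintf(lock_path, sizeof(lock_path), "%s.lock", path);
	if (n < 0 || (size_t)n >= sizeof(lock_path))
		return -1;

	fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;	/* lock already held, or another open error */

	if (write(fd, value, len) != (ssize_t)len ||
	    write(fd, "\n", 1) != 1) {
		close(fd);
		unlink(lock_path);
		return -1;
	}
	if (close(fd) < 0) {
		unlink(lock_path);
		return -1;
	}
	return 0;
}

/* Commit step: atomically move the finished lock file into place. */
static int commit_lock(const char *path)
{
	char lock_path[4096];
	int n = snprintf(lock_path, sizeof(lock_path), "%s.lock", path);

	if (n < 0 || (size_t)n >= sizeof(lock_path))
		return -1;
	return rename(lock_path, path);
}
```

With this shape at most one lock file descriptor is open at any time, regardless of how many references the transaction touches, so no fd budget is needed.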

 [...]
 But just for the sake of discussion, if we planned to keep
 this code around, it could be improved by not wasting open file
 descriptors for references that are only being verified or deleted, like so:
 
 I'll pick that up for the resend.

OK.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: [PATCHv2] refs.c: enable large transactions

2015-04-21 Thread Stefan Beller

 * Removed unneeded braces in the condition to check if we want to close
   the lock file.
 * made the counter for the remaining fds an unsigned int. That is what
   get_max_fd_limit() returns, so there are no concerns for an overflow.
   Also it cannot go below 0 any more.
 * moved the initialisation of the remaining_fds a bit down and added a comment

* Once again this replaces the last patch on top of
origin/sb/remove-fd-from-ref-lock