Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-20 Thread Stefan Zager
On Thu, Mar 20, 2014 at 10:21 PM, Duy Nguyen  wrote:
> On Fri, Mar 21, 2014 at 08:51:18AM +0700, Duy Nguyen wrote:
>> On Thu, Mar 20, 2014 at 11:08 PM, Stefan Zager  wrote:
>> > Duy, would you like to re-post your patch without the new pread 
>> > implementation?
>>
>> I will, but let me try out the sliding window idea first. My quick
>> tests on git.git show we may only need 21k mmap calls instead of 177k
>> preads. That hints at some potential performance improvement.
>
> The patch at the bottom reuses (un)use_pack() instead of pread(). The
> results on linux-2.6 do not look any different. I guess we can drop
> the idea.
>
> It makes me wonder, though, what's wrong with a simple patch like this
> to make pread in index-pack thread-safe? It does not look any different
> from the performance point of view either, perhaps because
> unpack_data() reads small deltas most of the time.

When you serialize disk access in this way, the effect on performance
is really dependent on the behavior of the OS, as well as the locality
of the read offsets.  Assuming -- fairly, I think -- that the reads
will be pretty randomly distributed (i.e., no locality to speak of),
then your best bet is to get as many read operations in flight as
possible, and let the disk scheduler optimize the seek time.
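
(For illustration only, here is a minimal POSIX sketch of the "many reads
in flight" pattern: each worker owns its own descriptor, so no locking is
needed and the kernel is free to reorder the I/O.  The file name, sizes,
and offsets are made up.)

/* Hypothetical example, not git code: N workers issuing pread()s
 * concurrently on private descriptors. */
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define NR_WORKERS 4

static void *worker(void *arg)
{
	const char *path = arg;
	int fd = open(path, O_RDONLY);	/* private descriptor per thread */
	char buf[64 * 1024];
	off_t off;

	if (fd < 0)
		return NULL;
	/* made-up scattered access pattern */
	for (off = 0; off < (off_t)64 * 1024 * 1024; off += 7 * 64 * 1024)
		if (pread(fd, buf, sizeof(buf), off) <= 0)
			break;
	close(fd);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_WORKERS];
	int i;

	for (i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, "some.pack");
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}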

If I have a chance, I will try out this patch in msysgit on the
chromium repositories.

>
> -- 8< --
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a6b1c17..b91f4f8 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -40,11 +40,6 @@ struct base_data {
> int ofs_first, ofs_last;
>  };
>
> -#if !defined(NO_PTHREADS) && defined(NO_THREAD_SAFE_PREAD)
> -/* pread() emulation is not thread-safe. Disable threading. */
> -#define NO_PTHREADS
> -#endif
> -
>  struct thread_local {
>  #ifndef NO_PTHREADS
> pthread_t thread;
> @@ -175,6 +170,22 @@ static void cleanup_thread(void)
>  #endif
>
>
> +#if defined(NO_THREAD_SAFE_PREAD)
> +static inline ssize_t pread_safe(int fd, void *buf, size_t count, off_t off)
> +{
> +   ssize_t ret;
> +   read_lock();
> +   ret = pread(fd, buf, count, off);
> +   read_unlock();
> +   return ret;
> +}
> +#else
> +static inline ssize_t pread_safe(int fd, void *buf, size_t count, off_t off)
> +{
> +   return pread(fd, buf, count, off);
> +}
> +#endif
> +
>  static int mark_link(struct object *obj, int type, void *data)
>  {
> if (!obj)
> @@ -542,7 +553,7 @@ static void *unpack_data(struct object_entry *obj,
>
> do {
> ssize_t n = (len < 64*1024) ? len : 64*1024;
> -   n = pread(pack_fd, inbuf, n, from);
> +   n = pread_safe(pack_fd, inbuf, n, from);
> if (n < 0)
> die_errno(_("cannot pread pack file"));
> if (!n)
> -- 8< --
>
> And the sliding window patch for the list archive
>
> -- 8< --
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a6b1c17..6f5c6d9 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -91,7 +91,8 @@ static off_t consumed_bytes;
>  static unsigned deepest_delta;
>  static git_SHA_CTX input_ctx;
>  static uint32_t input_crc32;
> -static int input_fd, output_fd, pack_fd;
> +static int input_fd, output_fd;
> +static struct packed_git *pack;
>
>  #ifndef NO_PTHREADS
>
> @@ -224,8 +225,10 @@ static unsigned check_objects(void)
>  static void flush(void)
>  {
> if (input_offset) {
> -   if (output_fd >= 0)
> +   if (output_fd >= 0) {
> write_or_die(output_fd, input_buffer, input_offset);
> +   pack->pack_size += input_offset;
> +   }
> git_SHA1_Update(&input_ctx, input_buffer, input_offset);
> memmove(input_buffer, input_buffer + input_offset, input_len);
> input_offset = 0;
> @@ -277,6 +280,10 @@ static void use(int bytes)
>
>  static const char *open_pack_file(const char *pack_name)
>  {
> +   pack = xmalloc(sizeof(*pack) + 1);
> +   memset(pack, 0, sizeof(*pack));
> +   pack->pack_name[0] = '\0';
> +
> if (from_stdin) {
> input_fd = 0;
> if (!pack_name) {
> @@ -288,13 +295,17 @@ static const char *open_pack_file(const char *pack_name)
> output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
> if (output_fd < 0)
> die_errno(_("unable to create '%s'"), pack_name);
> -

Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-20 Thread Stefan Zager
On Thu, Mar 20, 2014 at 2:35 PM, Karsten Blees  wrote:
> On 20.03.2014 17:08, Stefan Zager wrote:
>
>> Going forward, there is still a lot of performance that gets left on
>> the table when you rule out threaded file access.  There are not so
>> many calls to read, mmap, and pread in the code; it should be possible
>> to rationalize them and make them thread-safe -- at least, thread-safe
>> for posix-compliant systems and msysgit, which covers the great
>> majority of git users, I would hope.
>>
>
> IMO a "mostly" XSI compliant pread (or even the git_pread() emulation) is 
> still better than forbidding the use of read() entirely. Switching from read 
> to pread everywhere requires that all callers have to keep track of the file 
> position, which means a _lot_ of code changes (read/xread/strbuf_read is used 
> in ~70 places throughout git). And how do you plan to deal with platforms 
> that don't have a thread-safe pread (HP, Cygwin)?
>
> Considering all that, Duy's solution of opening separate file descriptors per 
> thread seems to be the best pattern for future multi-threaded work.

Does that mean you would endorse the (N threads) * (M pack files)
approach to threading checkout and status?  That seems kind of
crazy-town to me.  Not to mention that pack windows are not shared, so
this approach to multi-threading can have the side-effect of blowing
out memory consumption.  We have already had to dial back settings for
pack.threads and core.deltaBaseCacheLimit, because threaded index-pack
was causing OOM errors on 32-bit platforms.

Cygwin (and MSVC) should be able to share a "mostly" compliant pread
implementation.  I don't have any insight into NonstopKernel; does it
really not have a thread-safe pread implementation?  If so, then I
suppose we have to #ifdef NO_PREAD, just as we do now.

I realize that these are deep changes.  However, the performance of
msysgit on the chromium repositories is pretty awful, enough so to
motivate this work.

Stefan


Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-20 Thread Stefan Zager
On Thu, Mar 20, 2014 at 6:54 AM, Karsten Blees  wrote:
> On 19.03.2014 01:46, sza...@chromium.org wrote:
>> This adds a Windows implementation of pread.  Note that it is NOT
>> safe to intersperse calls to read() and pread() on a file
>> descriptor.
>
> This is a bad idea. You're basically fixing the multi-threaded issue twice, 
> while at the same time breaking single-threaded read/pread interop on the 
> mingw and msvc platform. Users of pread already have to take care that it's 
> not thread-safe on some platforms; now you're adding another breakage that 
> has to be considered in future development.
>
> The mingw_pread implementation in [1] is both thread-safe and allows mixing 
> read/pread in single-threaded scenarios, why not use this instead?
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/242120


That's not thread-safe.  There is, presumably, a point between the
first and second calls to lseek64 when the implicit position pointer
is incorrect.  If another thread executes its first call to lseek64 at
that time, then the file descriptor may end up with an incorrect
position pointer after all threads have finished.
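
(For reference, the general shape of such a seek-based emulation -- this is
a sketch modeled on the compat/pread.c pattern, not the exact mingw code
from [1] -- with the race window marked:)

#include <sys/types.h>
#include <unistd.h>

/* Sketch of an lseek-based pread emulation.  Between the first lseek()
 * and the final restoring lseek(), the descriptor's shared position is
 * "wrong"; a second thread entering at that moment saves and later
 * restores a bogus offset. */
static ssize_t emulated_pread(int fd, void *buf, size_t count, off_t offset)
{
	off_t saved = lseek(fd, 0, SEEK_CUR);	/* remember shared position */
	ssize_t n;

	if (saved < 0 || lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, count);		/* race window is open here */
	if (lseek(fd, saved, SEEK_SET) < 0)	/* restore shared position */
		return -1;
	return n;
}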

> Duy's patch alone enables multi-threaded index-pack on all platforms 
> (including cygwin), so IMO this should be a separate patch.

Fair enough, and it's a good first step.  I would love to see it
landed soon.  On the Chrome project, we're currently distributing a
patched version of msysgit; I would very much like for us to stop
doing that, but the performance penalty is too significant right now.

Duy, would you like to re-post your patch without the new pread implementation?

Going forward, there is still a lot of performance that gets left on
the table when you rule out threaded file access.  There are not so
many calls to read, mmap, and pread in the code; it should be possible
to rationalize them and make them thread-safe -- at least, thread-safe
for posix-compliant systems and msysgit, which covers the great
majority of git users, I would hope.


[PATCH] Enable index-pack threading in msysgit.

2014-03-19 Thread Stefan Zager
This adds a Windows implementation of pread.  Note that it is NOT
safe to intersperse calls to read() and pread() on a file
descriptor.  According to the ReadFile spec, using the 'overlapped'
argument should not affect the implicit position pointer of the
descriptor.  Experiments have shown that this is, in fact, a lie.

To accommodate that fact, this change also incorporates:

http://article.gmane.org/gmane.comp.version-control.git/196042

... which gives each index-pack thread its own file descriptor.

Signed-off-by: Stefan Zager 
---
 builtin/index-pack.c | 30 --
 compat/mingw.c   | 37 -
 compat/mingw.h   |  3 +++
 config.mak.uname |  1 -
 4 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2f37a38..63b8b0e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -40,17 +40,17 @@ struct base_data {
int ofs_first, ofs_last;
 };
 
-#if !defined(NO_PTHREADS) && defined(NO_THREAD_SAFE_PREAD)
-/* pread() emulation is not thread-safe. Disable threading. */
-#define NO_PTHREADS
-#endif
-
 struct thread_local {
 #ifndef NO_PTHREADS
pthread_t thread;
 #endif
struct base_data *base_cache;
size_t base_cache_used;
+/*
+ * To accommodate platforms that have pthreads, but don't have a
+ * thread-safe pread, give each thread its own file descriptor.
+ */
+   int pack_fd;
 };
 
 /*
@@ -91,7 +91,8 @@ static off_t consumed_bytes;
 static unsigned deepest_delta;
 static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
-static int input_fd, output_fd, pack_fd;
+static const char *curr_pack;
+static int input_fd, output_fd;
 
 #ifndef NO_PTHREADS
 
@@ -134,6 +135,7 @@ static inline void unlock_mutex(pthread_mutex_t *mutex)
  */
 static void init_thread(void)
 {
+   int i;
init_recursive_mutex(&read_mutex);
pthread_mutex_init(&counter_mutex, NULL);
pthread_mutex_init(&work_mutex, NULL);
@@ -141,11 +143,17 @@ static void init_thread(void)
pthread_mutex_init(&deepest_delta_mutex, NULL);
pthread_key_create(&key, NULL);
thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+   for (i = 0; i < nr_threads; i++) {
+   thread_data[i].pack_fd = open(curr_pack, O_RDONLY);
+   if (thread_data[i].pack_fd == -1)
+   die_errno("unable to open %s", curr_pack);
+   }
threads_active = 1;
 }
 
 static void cleanup_thread(void)
 {
+   int i;
if (!threads_active)
return;
threads_active = 0;
@@ -155,6 +163,8 @@ static void cleanup_thread(void)
if (show_stat)
pthread_mutex_destroy(&deepest_delta_mutex);
pthread_key_delete(key);
+   for (i = 0; i < nr_threads; i++)
+   close(thread_data[i].pack_fd);
free(thread_data);
 }
 
@@ -288,13 +298,13 @@ static const char *open_pack_file(const char *pack_name)
output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
if (output_fd < 0)
die_errno(_("unable to create '%s'"), pack_name);
-   pack_fd = output_fd;
+   nothread_data.pack_fd = output_fd;
} else {
input_fd = open(pack_name, O_RDONLY);
if (input_fd < 0)
die_errno(_("cannot open packfile '%s'"), pack_name);
output_fd = -1;
-   pack_fd = input_fd;
+   nothread_data.pack_fd = input_fd;
}
git_SHA1_Init(&input_ctx);
return pack_name;
@@ -542,7 +552,7 @@ static void *unpack_data(struct object_entry *obj,
 
do {
ssize_t n = (len < 64*1024) ? len : 64*1024;
-   n = pread(pack_fd, inbuf, n, from);
+   n = pread(get_thread_data()->pack_fd, inbuf, n, from);
if (n < 0)
die_errno(_("cannot pread pack file"));
if (!n)
@@ -1490,7 +1500,7 @@ static void show_pack_info(int stat_only)
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
-   const char *curr_pack, *curr_index;
+   const char *curr_index;
const char *index_name = NULL, *pack_name = NULL;
const char *keep_name = NULL, *keep_msg = NULL;
char *index_name_buf = NULL, *keep_name_buf = NULL;
diff --git a/compat/mingw.c b/compat/mingw.c
index 383cafe..0efc570 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -329,7 +329,42 @@ int mingw_mkdir(const char *path, int mode)
return ret;
 }
 
-int mingw_open (const char *filename, int oflags, ...)
+
+/*
+ * Warning: contrary to the specification, when ReadFile() is called
+ * with an

Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-19 Thread Stefan Zager
On Wed, Mar 19, 2014 at 9:57 AM, Stefan Zager  wrote:
>>
>> I still don't understand how compat/pread.c does not work with pack_fd
>> per thread. I don't have Windows to test, but I forced compat/pread.c
>> on on Linux with similar pack_fd changes and it worked fine, helgrind
>> only complained about progress.c.
>>
>> A pread() implementation that is only conditionally thread-safe sounds
>> like an invitation for trouble later. And I don't think converting read()
>> to pread() is a good idea. Platforms that rely on pread() will hit
>> first because of more use of compat/pread.c. read() seeks while
>> pread() does not, so we have to audit more code..
>
> Using one fd per thread is all well and good for something like
> index-pack, which only accesses a single pack file.  But using that
> heuristic to add threading elsewhere is probably not going to work.
> For example, I have a patch in progress to add threading to checkout,
> and another one planned to add threading to status.  In both cases, we
> would need one fd per thread per pack file, which is pretty
> ridiculous.
>
> There really aren't very many calls to read() in the code.  I don't
> think it would be very difficult to eliminate the remaining ones.  The
> more interesting question, I think, is: what platforms still don't have
> a thread-safe pread implementation?

I don't want to go too deep down the rabbit hole here.  We don't have
to solve the read() vs. pread() issue once and for all right now; that can
wait for another day.  The pread() implementation in this patch is
certainly no worse than the one in compat/pread.c.

Stefan


Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-19 Thread Stefan Zager
On Wed, Mar 19, 2014 at 3:28 AM, Duy Nguyen  wrote:
> On Wed, Mar 19, 2014 at 2:50 PM, Stefan Zager  wrote:
>>
>> I suppose it would be possible to fix the immediate problem just by
>> using one fd per thread, without a new pread implementation.  But it
>> seems better overall to have a pread() implementation that is
>> thread-safe as long as read() and pread() aren't interspersed; and
>> then convert all existing read() calls to pread().  That would be a
>> good follow-up patch...
>
> I still don't understand how compat/pread.c does not work with pack_fd
> per thread. I don't have Windows to test, but I forced compat/pread.c
> on on Linux with similar pack_fd changes and it worked fine, helgrind
> only complained about progress.c.
>
> A pread() implementation that is only conditionally thread-safe sounds
> like an invitation for trouble later. And I don't think converting read()
> to pread() is a good idea. Platforms that rely on pread() will hit
> first because of more use of compat/pread.c. read() seeks while
> pread() does not, so we have to audit more code..

Using one fd per thread is all well and good for something like
index-pack, which only accesses a single pack file.  But using that
heuristic to add threading elsewhere is probably not going to work.
For example, I have a patch in progress to add threading to checkout,
and another one planned to add threading to status.  In both cases, we
would need one fd per thread per pack file, which is pretty
ridiculous.

There really aren't very many calls to read() in the code.  I don't
think it would be very difficult to eliminate the remaining ones.  The
more interesting question, I think, is: what platforms still don't have
a thread-safe pread implementation?


Re: [PATCH] Enable index-pack threading in msysgit.

2014-03-19 Thread Stefan Zager
On Wed, Mar 19, 2014 at 12:30 AM, Duy Nguyen  wrote:
> On Wed, Mar 19, 2014 at 7:46 AM,   wrote:
>> This adds a Windows implementation of pread.  Note that it is NOT
>> safe to intersperse calls to read() and pread() on a file
>> descriptor.  According to the ReadFile spec, using the 'overlapped'
>> argument should not affect the implicit position pointer of the
>> descriptor.  Experiments have shown that this is, in fact, a lie.
>
> If I understand it correctly, new pread() is added because
> compat/pread.c does not work because of some other read() in between?
> Where are those read() (I can only see one in index-pack.c, but there
> could be some hidden read()..)

I *think* it's the call to fixup_pack_header_footer(), but I'm not 100% sure.

I suppose it would be possible to fix the immediate problem just by
using one fd per thread, without a new pread implementation.  But it
seems better overall to have a pread() implementation that is
thread-safe as long as read() and pread() aren't interspersed; and
then convert all existing read() calls to pread().  That would be a
good follow-up patch...


>
>>
>> To accommodate that fact, this change also incorporates:
>>
>> http://article.gmane.org/gmane.comp.version-control.git/196042
>>
>> ... which gives each index-pack thread its own file descriptor.
>> ---
>>  builtin/index-pack.c | 21 -
>>  compat/mingw.c   | 31 ++-
>>  compat/mingw.h   |  3 +++
>>  config.mak.uname |  1 -
>>  4 files changed, 49 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> index 2f37a38..c02dd4c 100644
>> --- a/builtin/index-pack.c
>> +++ b/builtin/index-pack.c
>> @@ -51,6 +51,7 @@ struct thread_local {
>>  #endif
>> struct base_data *base_cache;
>> size_t base_cache_used;
>> +   int pack_fd;
>>  };
>>
>>  /*
>> @@ -91,7 +92,8 @@ static off_t consumed_bytes;
>>  static unsigned deepest_delta;
>>  static git_SHA_CTX input_ctx;
>>  static uint32_t input_crc32;
>> -static int input_fd, output_fd, pack_fd;
>> +static const char *curr_pack;
>> +static int input_fd, output_fd;
>>
>>  #ifndef NO_PTHREADS
>>
>> @@ -134,6 +136,7 @@ static inline void unlock_mutex(pthread_mutex_t *mutex)
>>   */
>>  static void init_thread(void)
>>  {
>> +   int i;
>> init_recursive_mutex(&read_mutex);
>> pthread_mutex_init(&counter_mutex, NULL);
>> pthread_mutex_init(&work_mutex, NULL);
>> @@ -141,11 +144,17 @@ static void init_thread(void)
>> pthread_mutex_init(&deepest_delta_mutex, NULL);
>> pthread_key_create(&key, NULL);
>> thread_data = xcalloc(nr_threads, sizeof(*thread_data));
>> +   for (i = 0; i < nr_threads; i++) {
>> +   thread_data[i].pack_fd = open(curr_pack, O_RDONLY);
>> +   if (thread_data[i].pack_fd == -1)
>> +   die_errno("unable to open %s", curr_pack);
>> +   }
>> threads_active = 1;
>>  }
>>
>>  static void cleanup_thread(void)
>>  {
>> +   int i;
>> if (!threads_active)
>> return;
>> threads_active = 0;
>> @@ -155,6 +164,8 @@ static void cleanup_thread(void)
>> if (show_stat)
>> pthread_mutex_destroy(&deepest_delta_mutex);
>> pthread_key_delete(key);
>> +   for (i = 0; i < nr_threads; i++)
>> +   close(thread_data[i].pack_fd);
>> free(thread_data);
>>  }
>>
>> @@ -288,13 +299,13 @@ static const char *open_pack_file(const char *pack_name)
>> output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
>> if (output_fd < 0)
>> die_errno(_("unable to create '%s'"), pack_name);
>> -   pack_fd = output_fd;
>> +   nothread_data.pack_fd = output_fd;
>> } else {
>> input_fd = open(pack_name, O_RDONLY);
>> if (input_fd < 0)
>> die_errno(_("cannot open packfile '%s'"), pack_name);
>> output_fd = -1;
>> -   pack_fd = input_fd;
>> +   nothread_data.pack_fd = input_fd;
>> }
>> git_SHA1_Init(&input_ctx);
>> return pack_name;
>> @@ -542,7 +553,7 @@ static void *unpack_data(struct object_entry *obj,
>>
>> do {
>> ssize_t n = (len < 64*1024) ? len : 64*1024;
>> -   n = pread(pack_fd, inbuf, n, from);
>> +   n = pread(get_thread_data()->pack_fd, inbuf, n, from);
>> if (n < 0)
>> die_errno(_("cannot pread pack file"));
>> if (!n)
>> @@ -1490,7 +1501,7 @@ static void show_pack_info(int stat_only)
>>  int cmd_index_pack(int argc, const char **argv, const char *prefix)
>>  {
>> int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
>> -   const char *curr_pack, *curr_index;
>> +   const char *curr_index;
>> const char *index_name = NULL, *p

Re: Make the git codebase thread-safe

2014-02-14 Thread Stefan Zager
On Fri, Feb 14, 2014 at 4:45 PM, Duy Nguyen  wrote:
> On Sat, Feb 15, 2014 at 2:16 AM, Zachary Turner  wrote:
>> (Gah, sorry if you're receiving multiple emails to your personal
>> addresses, I need to get used to manually setting Plain-text mode
>> every time I send a message).
>>
>> For the mixed read, we wouldn't be looking for another caller of
>> pread() (since it doesn't care what the file pointer is), but instead
>> a caller of read() or lseek() (since those do depend on the current
>> file pointer).  In index-pack.c, I see two possible culprits:
>>
>> 1) A call to xread() from inside fill()
>> 2) A call to lseek in parse_pack_objects()
>>
>> Do you think these could be related?  If so, maybe that opens up some
>> other solutions?
>
> For index-pack alone, what's wrong with opening one file handle per thread?

Nothing wrong with that, except that it would mean either using
thread-local storage (which the code doesn't currently use) or
plumbing pack_fd through the call stack, which doesn't sound very fun.
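
(For the record, the per-thread-descriptor route does not have to mean
plumbing pack_fd through every call: a pthread key works, which is roughly
what the index-pack patches elsewhere in this thread do with their
thread_data/get_thread_data().  A minimal sketch, with made-up names and
no error handling:)

#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

static pthread_key_t pack_fd_key;

static void close_pack_fd(void *p)
{
	close((int)(intptr_t)p);	/* destructor runs at thread exit */
}

static void init_pack_fd_key(void)
{
	pthread_key_create(&pack_fd_key, close_pack_fd);
}

static void thread_open_pack(const char *pack_path)
{
	int fd = open(pack_path, O_RDONLY);	/* one descriptor per thread */
	pthread_setspecific(pack_fd_key, (void *)(intptr_t)fd);
}

static int thread_pack_fd(void)
{
	return (int)(intptr_t)pthread_getspecific(pack_fd_key);
}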

Stefan

> --
> Duy


Re: Make the git codebase thread-safe

2014-02-14 Thread Stefan Zager
On Fri, Feb 14, 2014 at 11:04 AM, Karsten Blees  wrote:
>
> Damn...you're right, multi-threaded git-index-pack works fine, but some tests 
> fail badly. Mixed reads would have to be from git_mmap, which is the only 
> other caller of pread().

msysgit used git_mmap() as defined in compat/win32mmap.c, which does
not use pread.

Stefan


Re: Make the git codebase thread-safe

2014-02-14 Thread Stefan Zager
On Fri, Feb 14, 2014 at 11:15 AM, Zachary Turner  wrote:
> For the mixed read, we wouldn't be looking for another caller of pread()
> (since it doesn't care what the file pointer is), but instead a caller of
> read() or lseek().  In index-pack.c, I see two possible culprits:
>
> 1) A call to xread() from inside fill()
> 2) A call to lseek in parse_pack_objects()
>
> Do you think these could be related?  If so, maybe that opens up some other
> solutions?

From my observations, it's not that simple.  As you pointed out to me
before, fill() is only called before the threading part of the code,
and lseek is only called after the threading part; and the lseek() is
lseek(fd, 0, SEEK_CUR), so it's purely advisory.

Also, here is the error output we got before you added ReOpenFile:

remote: Total 2514467 (delta 1997300), reused 2513040 (delta 1997113)
Checking connectivity... error: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
does not match index
warning: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
cannot be accessed
error: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
does not match index
warning: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
cannot be accessed
error: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
does not match index
warning: packfile
d:/src/chromium2/_gclient_src_a6y1bf/.git/objects/pack/pack-3b8d06040ac37f14d0b43859926f1153fea61a7a.pack
cannot be accessed
fatal: bad object e0f9f23f765a45e6d80863a8f881ee735c9347fe


The 'Checking connectivity...' message comes from builtin/clone.c,
which runs in a separate process from builtin/index-pack.c.  What this
suggests to me is that file descriptors for the loose object files are
not being flushed or closed properly before index-pack finishes.


> BTW, the version you posted isn't thread-safe.  Suppose thread A and thread
> B execute this function at the same time.  A executes through the
> ReadFile(), but has not yet done the second lseek64() that resets the
> position.  B then executes the first lseek64(), storing off the modified
> file pointer.  Then A finishes, then B finishes.  At the end, the file
> pointer is still modified.

Yes, that.  I would also point out that in our experiments, ReOpenFile
is not nearly as expensive as its name might suggest.  Since the
solution using ReOpenFile is pretty solidly thread-safe (at least as
far as we can tell), I'm in favor of using it unless or until we
properly root-cause the failure.
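
(For readers without the Chromium review handy, the rough shape of a
ReOpenFile-based pread is sketched below.  This is an illustrative
reconstruction, not the patch from the review; error-code translation is
omitted, and ReOpenFile() requires Vista or later, which is the XP concern
discussed elsewhere in this thread.)

#include <windows.h>
#include <io.h>
#include <sys/types.h>

static ssize_t win32_pread_sketch(int fd, void *buf, size_t count, off_t offset)
{
	HANDLE orig = (HANDLE)_get_osfhandle(fd);
	/* Private handle: its file position cannot be seen or moved by
	 * other users of fd, so concurrent callers do not race. */
	HANDLE h = ReOpenFile(orig, GENERIC_READ,
			      FILE_SHARE_READ | FILE_SHARE_WRITE, 0);
	OVERLAPPED ov = { 0 };
	DWORD got = 0;
	BOOL ok;

	if (h == INVALID_HANDLE_VALUE)
		return -1;
	ov.Offset = (DWORD)offset;
	ov.OffsetHigh = (DWORD)((unsigned long long)offset >> 32);
	ok = ReadFile(h, buf, (DWORD)count, &got, &ov);
	CloseHandle(h);
	if (!ok && GetLastError() != ERROR_HANDLE_EOF)
		return -1;
	return (ssize_t)got;	/* 0 at end of file */
}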


Stefan


Re: [PATCH v2] Make the global packed_git variable static to sha1_file.c.

2014-02-13 Thread Stefan Zager
I uploaded a new patch; a few comments inline below...

On Wed, Feb 12, 2014 at 1:19 PM, Junio C Hamano  wrote:
> sza...@chromium.org writes:
>
> Also I'd suggest s/pack_data_fn/collect_pack_data/ or something.
> "_fn" may be a good suffix for typedef'ed typename used in a
> callback function, but for a concrete function, it only adds noise
> without giving us anything new.

Fixed this everywhere.

>>  static struct cached_object *find_cached_object(const unsigned char *sha1)
>> @@ -468,7 +469,6 @@ static unsigned int pack_open_fds;
>>  static unsigned int pack_max_fds;
>>  static size_t peak_pack_mapped;
>>  static size_t pack_mapped;
>> -struct packed_git *packed_git;
>
> Hmm, any particular reason why only this variable and not others are
> moved up?

No, just need packed_git declared before use.  I moved all the static
variables up, for clarity.

>> + foreach_packed_git(find_pack_fn, NULL, &fpd);
>> + if (fpd.found_pack && !exclude &&
>> + (incremental ||
>> +  (local && fpd.found_non_local_pack) ||
>> +  (ignore_packed_keep && fpd.found_pack_keep)))
>> + return 0;
>
> When told to do --incremental, we used to return 0 from this
> function immediately once we find the object in one pack, without
> going thru the list of packs.  Now we let foreach to loop thru all
> of them and then return 0.  Does this difference matter?  A similar
> difference may exist for local/keep but I did not think it through.

Fixed.



Thanks,

Stefan


Re: Make the git codebase thread-safe

2014-02-13 Thread Stefan Zager
On Thu, Feb 13, 2014 at 2:51 PM, Karsten Blees  wrote:
> On 13.02.2014 19:38, Zachary Turner wrote:
>
>> The only reason ReOpenFile is necessary at
>> all is because some code somewhere is mixing read-styles against the same
>> fd.
>>
>
> I don't understand...ReadFile with OVERLAPPED parameter doesn't modify the 
> HANDLE's file position, so you should be able to mix read()/pread() however 
> you like (as long as read() is only called from one thread).

That is, apparently, a bald-faced lie in the ReadFile API doc.  Our first
implementation didn't use ReOpenFile, and it crashed all over the
place.  ReOpenFile fixed it.

Stefan


Re: Make the git codebase thread-safe

2014-02-13 Thread Stefan Zager
On Thu, Feb 13, 2014 at 12:27 AM, Johannes Sixt  wrote:
> On 2/12/2014 20:30, Stefan Zager wrote:
>> On Wed, Feb 12, 2014 at 11:22 AM, Karsten Blees wrote:
>>> On 12.02.2014 19:37, Erik Faye-Lund wrote:
>>>> ReOpenFile, that's fantastic. Thanks a lot!
>>>
>>> ...but should be loaded dynamically via GetProcAddress, or are we ready to 
>>> drop XP support?
>>
>> Right, that is an issue.  From our perspective, it's well past time to
>> drop XP support.
>
> Not from mine.

All this really means is that the build config will test WIN_VER, and
there will need to be an additional binary distribution of msysgit for
newer Windows.


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 12:06 PM, Junio C Hamano  wrote:
> Stefan Zager  writes:
>
>> ...  I used the Very Sleepy profiler
>> to see where all the time was spent on Windows: 55% of the time was
>> spent in OpenFile, and 25% in CloseFile (both in win32).
>
> This is somewhat interesting.
>
> When we check things out, checkout_paths() has a list of paths to be
> checked out, and iterates over them and call checkout_entry().
>
> I wonder if you can:
>
>  - introduce a version of checkout_entry() that takes file
>descriptors to write to;
>
>  - have an asynchronous helper thread that pre-opens the paths to be
>    written out and feeds <path, descriptor> to a queue;
>
>  - restructure that loop so that it reads the <descriptor, data to be
>    written> from the queue, performs the actual writing out, and then
>    feeds <descriptor to be closed> to another queue; and
>
>  - have another asynchronous helper thread that reads <descriptor to be
>    closed> from the queue and closes them.
>
> Calls to write (and preparation of data to be written) will then
> remain single-threaded, but it sounds like that codepath is not the
> bottleneck in your measurement, so

Yes, I considered that as well.  At a minimum, that would still
require attr.c to implement thread locking, since attribute files must
be parsed to look for stream filters.  I have already done that work.

But I'm not sure it's the best long-term approach to add convoluted
custom threading solutions to each git operation as it appears on the
performance radar.  I'm hoping to make the entire code base more
thread-friendly, so that threading can be added in a more natural and
idiomatic (and less painful) way.

For example, the most natural way to add threading to checkout would
be in the loops over the index in check_updates() in unpack-trees.c.
If attr.c and sha1_file.c were thread-safe, then it would be possible
to thread checkout entirely in check_updates(), with a pretty compact
code change.  I have already done the work in attr.c; sha1_file.c is
hairier, but do-able.

Stefan


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 11:22 AM, Karsten Blees  wrote:
> On 12.02.2014 19:37, Erik Faye-Lund wrote:
>> On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager  wrote:
>>> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  
>>> wrote:
>>>> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
>>>>>
>>>>> I don't want to steal the thunder of my coworker, who wrote the
>>>>> implementation.  He plans to submit it upstream soon-ish.  It relies
>>>>> on using the lpOverlapped argument to ReadFile(), with some additional
>>>>> tomfoolery to make sure that the implicit position pointer for the
>>>>> file descriptor doesn't get modified.
>>>>
>>>> Is the code available somewhere? I'm especially interested in the
>>>> "additional tomfoolery to make sure that the implicit position pointer
>>>> for the file descriptor doesn't get modified"-part, as this was what I
>>>> ended up butting my head into when trying to do this myself.
>>>
>>> https://chromium-review.googlesource.com/#/c/186104/
>>
>> ReOpenFile, that's fantastic. Thanks a lot!
>
> ...but should be loaded dynamically via GetProcAddress, or are we ready to 
> drop XP support?

Right, that is an issue.  From our perspective, it's well past time to
drop XP support.


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup  wrote:
> Stefan Zager  writes:
>
>> Anything on Windows that touches a lot of files is miserable due to
>> the usual file system slowness on Windows, and luafv.sys (the UAC file
>> virtualization driver) seems to make it much worse.
>
> There is an obvious solution here...  Dedicated hardware is not that
> expensive.  Virtualization will always have a price.

Not sure I follow you.  We need to support people developing,
building, and testing on native Windows machines.  And we need to
support users with reasonable hardware, including spinning disks.  If
we were only interested in optimizing for Google employees, each of
whom has one or more small nuclear reactors under their desk, this
would be easy.

>> Blame is something that chromium and blink developers use heavily, and
>> it is not unusual for a blame invocation on the blink repository to
>> run for 30 seconds.  It seems like it should be possible to
>> parallelize blame, but it requires pack file operations to be
>> thread-safe.
>
> Really, give the above patch a try.  I am taking longer to finish it
> than anticipated (with a lot due to procrastination but that is,
> unfortunately, a large part of my workflow), and it's cutting into my
> "paychecks" (voluntary donations which to a good degree depend on timely
> and nontrivial progress reports for my freely available work on GNU
> LilyPond).

I will give that a try.  How much of a performance improvement have you clocked?

> Note that it looks like the majority of the remaining time on GNU/Linux
> tends to be spent in system time: I/O time, memory management.  And I
> have an SSD drive.  When using packed repositories of considerable size,
> decompression comes into play as well.  I don't think that you can hope
> to get noticeably higher I/O throughput by multithreading, so really,
> really, really consider dedicated hardware running on a native Linux
> file system.

I have a background in hardware, and I have much more faith in modern
disk schedulers :)

Stefan


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 10:33 AM, Matthieu Moy wrote:
> Stefan Zager  writes:
>
>> I'm optimistic that parallelizing the stat calls will yield a further
>> improvement.
>
> It has already been mentioned in the thread, but in case you overlooked
> it: did you look at core.preloadindex? It seems at least very close to
> what you want.

Ah yes, sorry, I overlooked that.  We have indeed turned on
core.preloadindex, and it does indeed speed up status.  That speedup
is reflected in my previous comments about our observations working
with chromium and blink.

Stefan


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  wrote:
> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
>>
>> I don't want to steal the thunder of my coworker, who wrote the
>> implementation.  He plans to submit it upstream soon-ish.  It relies
>> on using the lpOverlapped argument to ReadFile(), with some additional
>> tomfoolery to make sure that the implicit position pointer for the
>> file descriptor doesn't get modified.
>
> Is the code available somewhere? I'm especially interested in the
> "additional tomfoolery to make sure that the implicit position pointer
> for the file descriptor doesn't get modified"-part, as this was what I
> ended up butting my head into when trying to do this myself.

https://chromium-review.googlesource.com/#/c/186104/


Re: [PATCH] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread Stefan Zager
On Tue, Feb 11, 2014 at 11:29 PM, Chris Packham  wrote:
> Hi,
>
> On 12/02/14 14:57, Stefan Zager wrote:
>> From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001
>> From: Stefan Zager 
>> Date: Mon, 10 Feb 2014 16:55:12 -0800
>> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.
>
> I'm not really qualified to comment on substance but there are some
> basic style issues w.r.t. whitespace, namely using 4 spaces for indent
> and mixing tabs/spaces. This might seem pedantic for the first round of
> a patch but it does put off reviewers.
>
> From Documentation/CodingGuidelines:
>
>  - We use tabs to indent, and interpret tabs as taking up to
>8 spaces.

My bad, I will upload a fixed patch.  In my defense: I edited the code
in emacs and then ran "M-x tabify" over the entire file.  But that had
the unfortunate side effect of adding a bunch of whitespace-only
changes to the diff, illuminating the fact that there is already mixed
whitespace in the existing code.  So I had to go back and selectively
tabify my changes, and I clearly missed a bunch.

If anyone has a recommendation for a less labor-intensive way to do
this in emacs, I'd be very grateful.

Thanks,

Stefan


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Wed, Feb 12, 2014 at 3:59 AM, Erik Faye-Lund  wrote:
> On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager  wrote:
>>
>> We are particularly concerned with the performance of msysgit, and we
>> have already chalked up a significant performance gain by turning on
>> the threading code in pack-objects (which was already enabled for
>> posix platforms, but not on msysgit, owing to the lack of a correct
>> pread implementation).
>
> How did you manage to do this? I'm not aware of any way to implement
> pread on Windows (without going down the insanity-path of wrapping and
> potentially locking inside every IO operation)...

I don't want to steal the thunder of my coworker, who wrote the
implementation.  He plans to submit it upstream soon-ish.  It relies
on using the lpOverlapped argument to ReadFile(), with some additional
tomfoolery to make sure that the implicit position pointer for the
file descriptor doesn't get modified.

Stefan


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Tue, Feb 11, 2014 at 7:43 PM, Duy Nguyen  wrote:
>
> From v1.9 shallow clone should work for all push/pull/clone... so
> history depth does not matter (on the client side). As for
> gentoo-x86's large worktree, using index v4 and avoid full-tree
> operations (e.g. "status .", not "status"..) should make all
> operations reasonably fast. I plan to make "status" fast even without
> path limiting with the help of inotify, but that's not going to be
> finished soon. Did I miss anything else?

Chromium developers frequently want to run status over their entire
checkout, and a lot of them run 'git commit -a'.  We want to do
everything possible to speed this up.


Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager
On Tue, Feb 11, 2014 at 6:11 PM, Duy Nguyen  wrote:
>
> I have no comments about thread safety improvements (well, not yet).
> If you have investigated about git performance on chromium
> repositories, could you please sum it up? Threading may be an option
> to improve performance, but it's probably not the only option.

Well, the painful operations that we use frequently are pack-objects,
checkout, status, and blame.  Anything on Windows that touches a lot
of files is miserable due to the usual file system slowness on
Windows, and luafv.sys (the UAC file virtualization driver) seems to
make it much worse.

With threading turned on, pack-objects on Windows now takes about
twice as long as on Linux, which is still more than a 2x improvement
over the non-threaded operation.

Checkout is really bad on Windows.  The blink repository is ~200K
files, and a full clean checkout from the index takes about 10 seconds
on Linux, and about 3:30 on Windows.  I used the Very Sleepy profiler
to see where all the time was spent on Windows: 55% of the time was
spent in OpenFile, and 25% in CloseFile (both in win32).  My immediate
goal is to add threading to checkout, so those file system calls can
be done in parallel.

Enabling the fscache speeds up status quite a bit.  I'm optimistic
that parallelizing the stat calls will yield a further improvement.
Beyond that, it may not be possible to do much more without using a
file system watcher daemon, like facebook does with mercurial.
(https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/)

Blame is something that chromium and blink developers use heavily, and
it is not unusual for a blame invocation on the blink repository to
run for 30 seconds.  It seems like it should be possible to
parallelize blame, but it requires pack file operations to be
thread-safe.

Stefan


[PATCH] Make the global packed_git variable static to sha1_file.c.

2014-02-11 Thread Stefan Zager
From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001
From: Stefan Zager 
Date: Mon, 10 Feb 2014 16:55:12 -0800
Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.

This is a first step in making the codebase thread-safe.  By and
large, the operations which might benefit from threading are those
that work with pack files (e.g., checkout, blame), so the focus of
this patch is to stop leaking the global list of pack files outside of
sha1_file.c.

The next step will be to control access to the list of pack files
with a mutex.  However, that alone is not enough to make pack file
access thread safe.  Even in a read-only operation, the window list
associated with each pack file will need to be controlled.
Additionally, the global counters in sha1_file.c will need to be
controlled.

This patch is a pure refactor with no functional changes, so it
shouldn't require any additional tests.  Adding the actual locks
will be a functional change, and will require additional tests.

Signed-off-by: Stefan Zager 
---
 builtin/count-objects.c  |  44 ++-
 builtin/fsck.c   |  46 +++-
 builtin/gc.c |  26 +++
 builtin/pack-objects.c   | 188 ---
 builtin/pack-redundant.c |  37 +++---
 cache.h  |  16 +++-
 fast-import.c|   4 +-
 http-backend.c   |  28 ---
 http-push.c  |   4 +-
 http-walker.c|   2 +-
 pack-revindex.c  |  20 ++---
 server-info.c|  35 +
 sha1_file.c  |  35 -
 sha1_name.c  |  18 -
 14 files changed, 315 insertions(+), 188 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a7f70cb..5a8e487 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -83,14 +83,32 @@ static char const * const count_objects_usage[] = {
NULL
 };
 
+struct pack_data {
+unsigned long packed;
+off_t size_pack;
+unsigned long num_pack;
+};
+
+int pack_data_fn(struct packed_git *p, void *data)
+{
+struct pack_data *pd = (struct pack_data *) data;
+if (p->pack_local && !open_pack_index(p)) {
+   pd->packed += p->num_objects;
+   pd->size_pack += p->pack_size + p->index_size;
+   pd->num_pack++;
+}
+return 1;
+}
+
 int cmd_count_objects(int argc, const char **argv, const char *prefix)
 {
int i, verbose = 0, human_readable = 0;
const char *objdir = get_object_directory();
int len = strlen(objdir);
char *path = xmalloc(len + 50);
-   unsigned long loose = 0, packed = 0, packed_loose = 0;
+   unsigned long loose = 0, packed_loose = 0;
off_t loose_size = 0;
+struct pack_data pd = {0,0,0};
struct option opts[] = {
OPT__VERBOSE(&verbose, N_("be verbose")),
OPT_BOOL('H', "human-readable", &human_readable,
@@ -118,41 +136,29 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
closedir(d);
}
if (verbose) {
-   struct packed_git *p;
-   unsigned long num_pack = 0;
-   off_t size_pack = 0;
struct strbuf loose_buf = STRBUF_INIT;
struct strbuf pack_buf = STRBUF_INIT;
struct strbuf garbage_buf = STRBUF_INIT;
-   if (!packed_git)
-   prepare_packed_git();
-   for (p = packed_git; p; p = p->next) {
-   if (!p->pack_local)
-   continue;
-   if (open_pack_index(p))
-   continue;
-   packed += p->num_objects;
-   size_pack += p->pack_size + p->index_size;
-   num_pack++;
-   }
+   prepare_packed_git();
+   foreach_packed_git(pack_data_fn, NULL, &pd);
 
if (human_readable) {
strbuf_humanise_bytes(&loose_buf, loose_size);
-   strbuf_humanise_bytes(&pack_buf, size_pack);
+   strbuf_humanise_bytes(&pack_buf, pd.size_pack);
strbuf_humanise_bytes(&garbage_buf, size_garbage);
} else {
strbuf_addf(&loose_buf, "%lu",
(unsigned long)(loose_size / 1024));
strbuf_addf(&pack_buf, "%lu",
-   (unsigned long)(size_pack / 1024));
+   (unsigned long)(pd.size_pack / 1024));
strbuf_addf(&garbage_buf, "%lu",
(unsigned long)(size_garbage / 1024));
}
 
printf("

Make the git codebase thread-safe

2014-02-11 Thread Stefan Zager
We in the chromium project have a keen interest in adding threading to
git in the pursuit of performance for lengthy operations (checkout,
status, blame, ...).  Our motivation comes from hitting some
performance walls when working with repositories the size of chromium
and blink:

https://chromium.googlesource.com/chromium/src
https://chromium.googlesource.com/chromium/blink

We are particularly concerned with the performance of msysgit, and we
have already chalked up a significant performance gain by turning on
the threading code in pack-objects (which was already enabled for
posix platforms, but not on msysgit, owing to the lack of a correct
pread implementation).

To this end, I'd like to start submitting patches that make the code
base generally more thread-safe and thread-friendly.  Right after this
email, I'm going to send the first such patch, which makes the global
list of pack files (packed_git) internal to sha1_file.c.

I realize this may be a contentious topic, and I'd like to get
feedback on the general effort to add more threading to git.  I'd
appreciate any feedback you'd like to give up front.

Thanks!

Stefan Zager


static initializers

2014-01-06 Thread Stefan Zager
Howdy,

Is there any policy on making static initializers thread-safe?  I'm
working on an experimental patch to introduce threading, but I'm
running into a few non-thread-safe bits like this, in convert.c:

static const char *conv_attr_name[] = {
"crlf", "ident", "filter", "eol", "text",
};
#define NUM_CONV_ATTRS ARRAY_SIZE(conv_attr_name)

static void convert_attrs(struct conv_attrs *ca, const char *path)
{
int i;
static struct git_attr_check ccheck[NUM_CONV_ATTRS];

if (!ccheck[0].attr) {
for (i = 0; i < NUM_CONV_ATTRS; i++)
ccheck[i].attr = git_attr(conv_attr_name[i]);
user_convert_tail = &user_convert;
git_config(read_convert_config, NULL);
}
}



The easy fix would be to stick the initialization bit into an 'extern
int init_convert_config();' function, and invoke it prior to the
threaded part of my code.  That would be no worse than the current
state of affairs, which is to say that adding threading is rife with
hidden perils.

A more comprehensive fix might be:

#include <pthread.h>

static pthread_mutex_t convert_config_mutex = PTHREAD_MUTEX_INITIALIZER;

static void convert_attrs(struct conv_attrs *ca, const char *path)
{
int i;
static struct git_attr_check ccheck[NUM_CONV_ATTRS];

pthread_mutex_lock(&convert_config_mutex);
if (!ccheck[0].attr) {
for (i = 0; i < NUM_CONV_ATTRS; i++)
ccheck[i].attr = git_attr(conv_attr_name[i]);
user_convert_tail = &user_convert;
git_config(read_convert_config, NULL);
}
pthread_mutex_unlock(&convert_config_mutex);
}


Unfortunately, I don't think mingw/msys supports
PTHREAD_MUTEX_INITIALIZER.  A possible workaround would be:

static pthread_mutex_t convert_config_mutex;

static void init_convert_config_mutex() __attribute__((constructor));
static void init_convert_config_mutex()
{
pthread_mutex_init(&convert_config_mutex, NULL);
}


But then, I'm not sure whether mingw/msys supports __attribute__((constructor)).
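
One more option, for what it's worth: if the pthread implementation in use
provides pthread_once() and PTHREAD_ONCE_INIT (plain pthreads does; whether
the mingw/msys shim does is the same open question as above), the static
mutex initializer can be avoided entirely.  A minimal self-contained sketch,
with placeholder init work:

#include <pthread.h>
#include <stdio.h>

static pthread_once_t convert_config_once = PTHREAD_ONCE_INIT;

static void init_convert_config(void)
{
	/* one-time setup would go here, e.g. filling ccheck[] and
	 * calling git_config(read_convert_config, NULL) */
	puts("initialized once");
}

static void convert_attrs_sketch(void)
{
	/* safe to call from any number of threads; the init
	 * routine runs exactly once */
	pthread_once(&convert_config_once, init_convert_config);
}

int main(void)
{
	convert_attrs_sketch();
	convert_attrs_sketch();
	return 0;
}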


Can anyone give me some guidance before I go to much further into the
weeds (and I'm neck-deep as it is)?

Thanks,

Stefan


static variables

2013-12-10 Thread Stefan Zager
This is probably a naive question, but: there are quite a lot of static
variables in the git code where they're really unnecessary.  Is that just a
historical artifact, or is there some reason to prefer them?  I'm working
on a patch that will introduce threading, so naturally I'm on the lookout
for static variables.  In general, can I get rid of static variables where
it seems straightforward to do so?

As an example, here's an excerpt from symlinks.c.  In addition to being
static, if I'm reading this right, it appears that the 'removal' variable
is used before it's initialized:

static struct removal_def {
  char path[PATH_MAX];
  int len;
} removal;

static void do_remove_scheduled_dirs(int new_len)
{
  while (removal.len > new_len) {
removal.path[removal.len] = '\0';
if (rmdir(removal.path))
  break;
do {
  removal.len--;
} while (removal.len > new_len &&
   removal.path[removal.len] != '/');
  }
  removal.len = new_len;
}

void schedule_dir_for_removal(const char *name, int len)
{
  int match_len, last_slash, i, previous_slash;

  match_len = last_slash = i =
longest_path_match(name, len, removal.path, removal.len,
   &previous_slash);
  /* Find last slash inside 'name' */
  while (i < len) {
if (name[i] == '/')
  last_slash = i;
i++;
  }

  /*
   * If we are about to go down the directory tree, we check if
   * we must first go upwards the tree, such that we then can
   * remove possible empty directories as we go upwards.
   */
  if (match_len < last_slash && match_len < removal.len)
do_remove_scheduled_dirs(match_len);
  /*
   * If we go deeper down the directory tree, we only need to
   * save the new path components as we go down.
   */
  if (match_len < last_slash) {
memcpy(&removal.path[match_len], &name[match_len],
   last_slash - match_len);
removal.len = last_slash;
  }
}

void remove_scheduled_dirs(void)
{
  do_remove_scheduled_dirs(0);
}

EOF



Thanks,

Stefan


Re: Windows performance / threading file access

2013-10-10 Thread Stefan Zager
On Thu, Oct 10, 2013 at 5:51 PM, Karsten Blees wrote:

> >> I've noticed that when working with a very large repository using msys
> >> git, the initial checkout of a cloned repository is excruciatingly
> >> slow (80%+ of total clone time).  The root cause, I think, is that git
> >> does all the file access serially, and that's really slow on Windows.
> >>
>
> What exactly do you mean by "excruciatingly slow"?
>
> I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full
> checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows.
> Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD
> (WD Caviar Green) the Windows checkout took ~9 minutes.

I'm using blink for my test, which should be more or less indistinguishable
from WebKit.  I'm using a standard spinning disk, no SSD.  For my purposes,
I need to optimize this for "standard"-ish hardware, not best-in-class.

For my test, I first run 'git clone -n ', and then measure the
running time of 'git checkout --force HEAD'.  On linux, the checkout
command runs in 0:12; on Windows, it's about 3:30.

> If your numbers are much slower, check for overeager virus scanners and
> probably the infamous "User Account Control" (On Vista/7 (8?), the
> luafv.sys driver slows down things on the system drive even with UAC turned
> off in control panel. The driver can be disabled with "sc config luafv
> start= disabled" + reboot. Reenable with "sc config luafv start= auto").

I confess that I am pretty ignorant about Windows, so I'll have to research
these.

>> Has anyone considered threading file access to speed this up?  In
> >> particular, I've got my eye on this loop in unpack-trees.c:
> >>
>
> It's probably worth a try; however, in my experience, doing disk IO in
> parallel tends to slow things down due to more disk seeks.

> I'd rather try to minimize seeks, ...
>

In my experience, modern disk controllers are very very good at this; it
rarely, if ever, makes sense to try and outsmart them.

But, from talking to Windows-savvy people, I believe the issue is not disk
seek time, but rather the fact that Windows doesn't cache file stat
information.  Instead, it goes all the way to the source of truth (i.e.,
the physical disk) every time it stats a file or directory.  That's what
causes the checkout to be so slow: all those file stats run serially.

Does that sound right?  I'm prepared to be wrong about this; but if no one
has tried it, then it's probably at least worth an experiment.

Thanks,

Stefan


Windows performance / threading file access

2013-10-10 Thread Stefan Zager
Hi folks,

I don't follow the mailing list carefully, so forgive me if this has
been discussed before, but:

I've noticed that when working with a very large repository using msys
git, the initial checkout of a cloned repository is excruciatingly
slow (80%+ of total clone time).  The root cause, I think, is that git
does all the file access serially, and that's really slow on Windows.

Has anyone considered threading file access to speed this up?  In
particular, I've got my eye on this loop in unpack-trees.c:

static struct checkout state;
static int check_updates(struct unpack_trees_options *o)
{
        unsigned cnt = 0, total = 0;
        struct progress *progress = NULL;
        struct index_state *index = &o->result;
        int i;
        int errs = 0;

        ...

        for (i = 0; i < index->cache_nr; i++) {
                struct cache_entry *ce = index->cache[i];

                if (ce->ce_flags & CE_UPDATE) {
                        display_progress(progress, ++cnt);
                        ce->ce_flags &= ~CE_UPDATE;
                        if (o->update && !o->dry_run) {
                                errs |= checkout_entry(ce, &state, NULL);
                        }
                }
        }
        stop_progress(&progress);
        if (o->update)
                git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
        return errs != 0;
}


Any thoughts on adding threading around the call to checkout_entry?
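
To make the question a bit more concrete, here is a rough, hypothetical
sketch of what threading that loop might look like.  It is not a patch: it
assumes checkout_entry() and everything it touches (the attribute stack, the
lstat/symlink caches, object access) could be made thread-safe, which is not
true of git today, and it drops the progress display because
display_progress() isn't thread-safe either.  The struct, function names, and
thread count are invented for illustration:

#include "cache.h"
#include <pthread.h>

#define CHECKOUT_THREADS 4   /* arbitrary */

struct checkout_pool {
        struct index_state *index;
        const struct checkout *state;
        pthread_mutex_t lock;
        unsigned next;           /* next cache entry to hand out */
        int errs;
};

static void *checkout_worker(void *arg)
{
        struct checkout_pool *pool = arg;

        for (;;) {
                struct cache_entry *ce;

                pthread_mutex_lock(&pool->lock);
                if (pool->next >= pool->index->cache_nr) {
                        pthread_mutex_unlock(&pool->lock);
                        return NULL;
                }
                ce = pool->index->cache[pool->next++];
                pthread_mutex_unlock(&pool->lock);

                if (ce->ce_flags & CE_UPDATE) {
                        ce->ce_flags &= ~CE_UPDATE;
                        if (checkout_entry(ce, pool->state, NULL))
                                pool->errs = 1;  /* only ever set, so the race is benign */
                }
        }
}

static int threaded_check_updates(struct index_state *index,
                                  const struct checkout *state)
{
        pthread_t threads[CHECKOUT_THREADS];
        struct checkout_pool pool;
        int i;

        pool.index = index;
        pool.state = state;
        pool.next = 0;
        pool.errs = 0;
        pthread_mutex_init(&pool.lock, NULL);

        for (i = 0; i < CHECKOUT_THREADS; i++)
                pthread_create(&threads[i], NULL, checkout_worker, &pool);
        for (i = 0; i < CHECKOUT_THREADS; i++)
                pthread_join(threads[i], NULL);

        pthread_mutex_destroy(&pool.lock);
        return pool.errs;
}

Whether this would actually help presumably hinges on the stat question
above; on Linux the page cache already hides most of that cost, so any win
would likely be Windows-specific.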


Thanks in advance,

Stefan


regression in multi-threaded git-pack-index

2013-03-15 Thread Stefan Zager
We have uncovered a regression in this commit:

b8a2486f1524947f232f657e9f2ebf44e3e7a243

The symptom is that 'git fetch' dies with:

error: index-pack died of signal 10
fatal: index-pack failed

I have only been able to reproduce it on a Mac thus far; I will try Ubuntu next.
We can make it go away by running:

git config pack.threads 1

To reproduce it, download this working copy:

http://commondatastorage.googleapis.com/chromium-browser-snapshots/tmp/src.git.tar.gz

Then:

tar xvfz src.git.tar.gz
cd src.git
git fetch origin refs/heads/lkgr

(That is the shortest reproduction I could come up with; sorry).

Thanks,

Stefan



Re: [PATCH] Enable parallelism in git submodule update.

2012-11-02 Thread Stefan Zager
ping?

On Tue, Oct 30, 2012 at 11:11 AM, Stefan Zager  wrote:
> This is a refresh of a conversation from a couple of months ago.
>
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.
>
> Stefan
>
> On Tue, Oct 30, 2012 at 11:03 AM,   wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>>
>> Signed-off-by: Stefan Zager 
>> ---
>>  Documentation/git-submodule.txt |8 ++-
>>  git-submodule.sh|   40 
>> ++-
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/git-submodule.txt 
>> b/Documentation/git-submodule.txt
>> index b4683bb..cb23ba7 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -14,7 +14,8 @@ SYNOPSIS
>>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
>>  'git submodule' [--quiet] init [--] [...]
>>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
>> - [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> + [--reference ] [--merge] [--recursive]
>> + [-j|--jobs [jobs]] [--] [...]
>>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) 
>> ]
>>   [commit] [--] [...]
>>  'git submodule' [--quiet] foreach [--recursive] 
>> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just 
>> want to use the
>>  setting as stored in .gitmodules, you can automatically initialize the
>>  submodule with the `--init` option.
>>  +
>> +By default, each submodule is treated serially.  You may specify a degree of
>> +parallel execution with the --jobs flag.  If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all 
>> jobs are
>> +run in parallel.
>> ++
>>  If `--recursive` is specified, this command will recurse into the
>>  registered submodules, and update any nested submodules within.
>>  +
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index ab6b110..60a5f96 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] 
>> [--]  []
>> or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
>> or: $dashless [--quiet] init [--] [...]
>> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
>> [jobs]] [--] [...]
>> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
>> [commit] [--] [...]
>> or: $dashless [--quiet] foreach [--recursive] 
>> or: $dashless [--quiet] sync [--] [...]"
>> @@ -500,6 +500,7 @@ cmd_update()
>>  {
>> # parse $args after "submodule ... update".
>> orig_flags=
>> +   jobs="1"
>> while test $# -ne 0
>> do
>> case "$1" in
>> @@ -518,6 +519,20 @@ cmd_update()
>> -r|--rebase)
>> update="rebase"
>> ;;
>> +   -j|--jobs)
>> +   case "$2" in
>> +   ''|-*)
>> +   jobs="0"
>> +   ;;
>> +   *)
>> +   jobs="$2"
>> +   shift
>> +   ;;
>> +   esac
>> +   # Don't preserve this arg.
>> +   shift
>> +   continue
>> +   ;;
>> --reference)
>> case "$2" in '') usage ;; esac
>>  

Re: [PATCH] Enable parallelism in git submodule update.

2012-10-30 Thread Stefan Zager
This is a refresh of a conversation from a couple of months ago.

I didn't try to implement all the desired features (e.g., smart logic
for passing a -j parameter to recursive submodule invocations), but I
did address the one issue that Junio insisted on: the code makes a
best effort to detect whether xargs supports parallel execution on the
host platform, and if it doesn't, then it prints a warning and falls
back to serial execution.
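
For reference, the capability probe is roughly along these lines.  This is a
hypothetical sketch, not the exact code in the patch, and the probe command
itself is an arbitrary choice:

# Check once, up front, whether the system xargs understands -P.
if printf '%s\n' . | xargs -P 2 -L 1 true >/dev/null 2>&1
then
        have_parallel_xargs=yes
else
        have_parallel_xargs=
        echo >&2 "warning: xargs does not support -P; updating submodules serially"
fi

# cmd_update would then take the parallel path only when the probe succeeded,
# e.g. (sketch):
#
#       if test "$jobs" != 1 && test -n "$have_parallel_xargs"
#       then
#               module_list "$@" | awk '{print $4}' |
#                       xargs -L 1 -P "$jobs" git submodule update $orig_args
#       fi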

Stefan

On Tue, Oct 30, 2012 at 11:03 AM,   wrote:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.
>
> Signed-off-by: Stefan Zager 
> ---
>  Documentation/git-submodule.txt |8 ++-
>  git-submodule.sh|   40 ++-
>  2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index b4683bb..cb23ba7 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -14,7 +14,8 @@ SYNOPSIS
>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
>  'git submodule' [--quiet] init [--] [<path>...]
>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
> -              [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
> +              [--reference <repository>] [--merge] [--recursive]
> +              [-j|--jobs [jobs]] [--] [<path>...]
>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
>                [commit] [--] [<path>...]
>  'git submodule' [--quiet] foreach [--recursive] <command>
> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want to use the
>  setting as stored in .gitmodules, you can automatically initialize the
>  submodule with the `--init` option.
>  +
> +By default, each submodule is treated serially.  You may specify a degree of
> +parallel execution with the --jobs flag.  If a parameter is provided, it is
> +the maximum number of jobs to run in parallel; without a parameter, all jobs are
> +run in parallel.
> ++
>  If `--recursive` is specified, this command will recurse into the
>  registered submodules, and update any nested submodules within.
>  +
> diff --git a/git-submodule.sh b/git-submodule.sh
> index ab6b110..60a5f96 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] 
> [--]  []
> or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
> or: $dashless [--quiet] init [--] [...]
> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
> [--rebase] [--reference ] [--merge] [--recursive] [--] [...]
> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
> [--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
> [jobs]] [--] [...]
> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
> [commit] [--] [...]
> or: $dashless [--quiet] foreach [--recursive] 
> or: $dashless [--quiet] sync [--] [...]"
> @@ -500,6 +500,7 @@ cmd_update()
>  {
> # parse $args after "submodule ... update".
> orig_flags=
> +   jobs="1"
> while test $# -ne 0
> do
> case "$1" in
> @@ -518,6 +519,20 @@ cmd_update()
> -r|--rebase)
> update="rebase"
> ;;
> +   -j|--jobs)
> +   case "$2" in
> +   ''|-*)
> +   jobs="0"
> +   ;;
> +   *)
> +   jobs="$2"
> +   shift
> +   ;;
> +   esac
> +   # Don't preserve this arg.
> +   shift
> +   continue
> +   ;;
> --reference)
> case "$2" in '') usage ;; esac
> reference="--reference=$2"
> @@ -551,11 +566,34 @@ cmd_update()
> shift
> done
>
> +   # Correctly handle the case where '-q' came before 'update' on the command line.
> +   if test -n "$GIT_QUIET"
> +   then
> +   orig_flags="$orig_flags -q"
> +   fi
> +
> if test -n "$init"
> then
> cmd_init 

Re: Fix potential hang in https handshake.

2012-10-19 Thread Stefan Zager
On Fri, Oct 19, 2012 at 1:27 PM, Jeff King  wrote:
>
> On Fri, Oct 19, 2012 at 07:10:46AM -0700, Shawn O. Pearce wrote:
>
> > > IOW, it seems like we are _already_ following the advice referenced in
> > > curl's manpage. Is there some case I am missing? Confused...
> >
> > The issue with the current code is sometimes when libcurl is opening a
> > CONNECT style connection through an HTTP proxy it returns a crazy high
> > timeout (>240 seconds) and no fds. In this case Git waits forever.
> > Stefan observed that using a timeout of 50 ms in this situation to
> > poll libcurl is better, as it figures out a lot more quickly that it
> > is connected to the proxy and can issue the request.
>
> Ah. That sounds like a bug in curl to me. But either way, if we want to
> work around it, wouldn't the right thing be to override curl's timeout
> in that instance? Like:
>
> diff --git a/http.c b/http.c
> index df9bb71..cd07cdf 100644
> --- a/http.c
> +++ b/http.c
> @@ -631,6 +631,19 @@ void run_active_slot(struct active_request_slot *slot)
>                         FD_ZERO(&excfds);
>                         curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
>
> +                       /*
> +                        * Sometimes curl will give a really long timeout for a
> +                        * CONNECT when there are no fds to read, but we can
> +                        * get better results by running curl_multi_perform
> +                        * more frequently.
> +                        */
> +                       if (max_fd < 0 &&
> +                           (select_timeout.tv_sec > 0 ||
> +                            select_timeout.tv_usec > 50000)) {
> +                               select_timeout.tv_sec = 0;
> +                               select_timeout.tv_usec = 50000;
> +                       }
> +
>                         select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
>                 }
>         }
>
> -Peff

I have no objection to this; anyone else?

Stefan


[PATCH] Make 'git submodule update --force' always check out submodules.

2012-08-22 Thread Stefan Zager
Currently, 'git submodule update --force' will only do a checkout if the sha1
registered in the containing repository doesn't match the HEAD of the submodule,
regardless of whether the submodule is dirty.  As discussed on the mailing list,
the '--force' flag is a strong indicator that the state of the submodule is
suspect, and should be reset to HEAD.

Signed-off-by: Stefan Zager 
---
 Documentation/git-submodule.txt |  9 ++++++++-
 git-submodule.sh                |  2 +-
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..2de7bf0 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -149,6 +149,11 @@ submodule with the `--init` option.
 +
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
++
+If `--force` is specified, the submodule will be checked out (using
+`git checkout --force` if appropriate), even if the commit specified in the
+index of the containing repository already matches the commit checked out in
+the submodule.
 
 summary::
Show commit summary between the given commit (defaults to HEAD) and
@@ -210,7 +215,9 @@ OPTIONS
This option is only valid for add and update commands.
When running add, allow adding an otherwise ignored submodule path.
When running update, throw away local changes in submodules when
-   switching to a different commit.
+   switching to a different commit; and always run a checkout operation
+   in the submodule, even if the commit listed in the index of the
+   containing repository matches the commit checked out in the submodule.
 
 --cached::
This option is only valid for status and summary commands.  These
diff --git a/git-submodule.sh b/git-submodule.sh
index aac575e..3aa7644 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -578,7 +578,7 @@ Maybe you want to use 'update --init'?")"
die "$(eval_gettext "Unable to find current revision in 
submodule path '\$sm_path'")"
fi
 
-   if test "$subsha1" != "$sha1"
+   if test "$subsha1" != "$sha1" -o -n "$force"
then
subforce=$force
# If we don't already have a -f flag and the submodule has never been checked out
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index ce61d4c..9706436 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
test_expect_success 'submodule update should throw away changes with --force ' '
)
 '
 
+test_expect_success 'submodule update --force forcibly checks out submodules' '
+   (cd super &&
+(cd submodule &&
+ rm -f file
+) &&
+git submodule update --force submodule &&
+(cd submodule &&
+ test "$(git status -s file)" = ""
+)
+   )
+'
+
 test_expect_success 'submodule update --rebase staying on master' '
(cd super/submodule &&
  git checkout master
-- 
1.7.12.2.gf3df7bf.dirty



[PATCH] Enable parallelism in git submodule update.

2012-07-27 Thread Stefan Zager
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager 
---
 Documentation/git-submodule.txt |  8 +++-
 git-submodule.sh| 23 ++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..34f81fb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
 'git submodule' [--quiet] init [--] [<path>...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
-              [--reference <repository>] [--merge] [--recursive] [--] [<path>...]
+              [--reference <repository>] [--merge] [--recursive]
+              [-j|--jobs [jobs]] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
               [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -147,6 +148,11 @@ If the submodule is not yet initialized, and you just want to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..761420a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] [--] 
 []
or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
or: $dashless [--quiet] init [--] [...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [--] [...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
[jobs]] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
or: $dashless [--quiet] sync [--] [...]"
@@ -473,6 +473,7 @@ cmd_update()
 {
# parse $args after "submodule ... update".
orig_flags=
+   jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -491,6 +492,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+   -j|--jobs)
+   case "$2" in
+   ''|-*)
+   jobs="0"
+   ;;
+   *)
+   jobs="$2"
+   shift
+   ;;
+   esac
+   # Don't preserve this arg.
+   shift
+   continue
+   ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -529,6 +544,12 @@ cmd_update()
cmd_init "--" "$@" || return
fi
 
+   if test "$jobs" != "1"
+   then
+   module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git 
submodule update $orig_args
+   return
+   fi
+
cloned_modules=
module_list "$@" | {
err=
-- 
1.7.11.rc2
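
For what it's worth, usage with this patch applied would look something like
the following; these are hypothetical invocations, and the job count is an
arbitrary choice:

git submodule update --init --jobs 8   # at most 8 submodules updated at a time
git submodule update --init -j         # no parameter: no limit, all at once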



[PATCH] Make 'git submodule update --force' always check out submodules.

2012-07-25 Thread Stefan Zager
Currently, 'git submodule update --force' will only do a checkout if the sha1
registered in the containing repository doesn't match the HEAD of the submodule,
regardless of whether the submodule is dirty.  As discussed on the mailing list,
the '--force' flag is a strong indicator that the state of the submodule is
suspect, and should be reset to HEAD.

Signed-off-by: Stefan Zager 
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..621eff7 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -575,7 +575,7 @@ Maybe you want to use 'update --init'?")"
die "$(eval_gettext "Unable to find current revision in 
submodule path '\$sm_path'")"
fi
 
-   if test "$subsha1" != "$sha1"
+   if test "$subsha1" != "$sha1" -o -n "$force"
then
subforce=$force
# If we don't already have a -f flag and the submodule has never been checked out
-- 
1.7.11.rc2
