Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote: > > On the linux.git repo, with about 3M objects to sort, this > > yields a 400% speedup. Here are the best-of-five numbers for > > running "echo HEAD | git cat-file --batch-disk-size", which > > is dominated by time spent building the

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote: > > For a 64-bit off_t, using 16-bit "digits" gives us k=4. > > Wait, isn't off_t a signed data type? Did you account for that in > your algorithm? It is signed, but the values we are storing in the revindex are all positive

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Brandon Casey
On Wed, Jul 10, 2013 at 4:55 AM, Jeff King wrote: > The pack revindex stores the offsets of the objects in the > pack in sorted order, allowing us to easily find the on-disk > size of each object. To compute it, we populate an array > with the offsets from the sha1-sorted idx file, and then use >

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Ramkumar Ramachandra
Jeff King wrote: > That does O(n log n) offset comparisons, and profiling shows > that we spend most of our time in cmp_offset. However, since > we are sorting on a simple off_t, we can use numeric sorts > that perform better. A radix sort can run in O(k*n), where k > is the number of "digits" in o

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Jeff King
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote: > 5. We use memcpy instead of an open-coded loop to copy the whole array > at the end. The individual bucket-assignment is still done by > struct assignment. I haven't timed if memcpy would make a > difference there. I ju