Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote: For a 64-bit off_t, using 16-bit digits gives us k=4. Wait, isn't off_t a signed data type? Did you account for that in your algorithm? It is signed, but the values we are storing in the revindex are all positive file

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote: On the linux.git repo, with about 3M objects to sort, this yields a 400% speedup. Here are the best-of-five numbers for running echo HEAD | git cat-file --batch-disk-size, which is dominated by time spent building the pack

[PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Jeff King
The pack revindex stores the offsets of the objects in the pack in sorted order, allowing us to easily find the on-disk size of each object. To compute it, we populate an array with the offsets from the sha1-sorted idx file, and then use qsort to order it by offsets. That does O(n log n) offset

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Jeff King
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote: 5. We use memcpy instead of an open-coded loop to copy the whole array at the end. The individual bucket-assignment is still done by struct assignment. I haven't timed if memcpy would make a difference there. I just

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Ramkumar Ramachandra
Jeff King wrote: That does O(n log n) offset comparisons, and profiling shows that we spend most of our time in cmp_offset. However, since we are sorting on a simple off_t, we can use numeric sorts that perform better. A radix sort can run in O(k*n), where k is the number of digits in our

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Brandon Casey
On Wed, Jul 10, 2013 at 4:55 AM, Jeff King p...@peff.net wrote: The pack revindex stores the offsets of the objects in the pack in sorted order, allowing us to easily find the on-disk size of each object. To compute it, we populate an array with the offsets from the sha1-sorted idx file, and