On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote:
For a 64-bit off_t, using 16-bit digits gives us k=4.
Wait, isn't off_t a signed data type? Did you account for that in
your algorithm?
It is signed, but the values we are storing in the revindex are all
positive file
On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote:
On the linux.git repo, with about 3M objects to sort, this
yields a 400% speedup. Here are the best-of-five numbers for
running echo HEAD | git cat-file --batch-disk-size, which
is dominated by time spent building the pack
The pack revindex stores the offsets of the objects in the
pack in sorted order, allowing us to easily find the on-disk
size of each object. To compute it, we populate an array
with the offsets from the sha1-sorted idx file, and then use
qsort to order it by offsets.
That does O(n log n) offset
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote:
5. We use memcpy instead of an open-coded loop to copy the whole array
at the end. The individual bucket-assignment is still done by
struct assignment. I haven't timed if memcpy would make a
difference there.
I just
Jeff King wrote:
That does O(n log n) offset comparisons, and profiling shows
that we spend most of our time in cmp_offset. However, since
we are sorting on a simple off_t, we can use numeric sorts
that perform better. A radix sort can run in O(k*n), where k
is the number of digits in our
On Wed, Jul 10, 2013 at 4:55 AM, Jeff King p...@peff.net wrote:
The pack revindex stores the offsets of the objects in the
pack in sorted order, allowing us to easily find the on-disk
size of each object. To compute it, we populate an array
with the offsets from the sha1-sorted idx file, and
6 matches
Mail list logo