On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote:
> > On the linux.git repo, with about 3M objects to sort, this
> > yields a 400% speedup. Here are the best-of-five numbers for
> > running "echo HEAD | git cat-file --batch-disk-size", which
> > is dominated by time spent building the
On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote:
> > For a 64-bit off_t, using 16-bit "digits" gives us k=4.
>
> Wait, isn't off_t a signed data type? Did you account for that in
> your algorithm?
It is signed, but the values we are storing in the revindex are all
positive
On Wed, Jul 10, 2013 at 4:55 AM, Jeff King wrote:
> The pack revindex stores the offsets of the objects in the
> pack in sorted order, allowing us to easily find the on-disk
> size of each object. To compute it, we populate an array
> with the offsets from the sha1-sorted idx file, and then use
>
Jeff King wrote:
> That does O(n log n) offset comparisons, and profiling shows
> that we spend most of our time in cmp_offset. However, since
> we are sorting on a simple off_t, we can use numeric sorts
> that perform better. A radix sort can run in O(k*n), where k
> is the number of "digits" in o
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote:
> 5. We use memcpy instead of an open-coded loop to copy the whole array
> at the end. The individual bucket-assignment is still done by
> struct assignment. I haven't timed if memcpy would make a
> difference there.
I ju
5 matches
Mail list logo