[PATCH 13/19] pack-objects: implement bitmap writing

2013-10-24 Thread Jeff King
From: Vicent Marti tan...@gmail.com

This commit further extends the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.
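
As a minimal sketch of the condition described above (not the patch's
actual C code): the function and parameter names are hypothetical, and
`config` is assumed to be a dict of already-parsed boolean values.

```python
def should_write_bitmap_index(cli_flag, config, pack_to_stdout):
    """Return True when a bitmap index would be attempted.

    cli_flag       -- True if `--write-bitmap-index` was passed
    config         -- dict of parsed config values (assumption of this sketch)
    pack_to_stdout -- True if the pack is piped to stdout (never indexed)
    """
    enabled = cli_flag or config.get("pack.writebitmaps", False)
    return bool(enabled) and not pack_to_stdout
```

Only the conditions named in the paragraph above are modelled here.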

Bitmap index writing happens after the packfile and its index have been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

1. `bitmap_writer_set_checksum`: this call stores the partial
   checksum for the packfile being written; the checksum will be
   written in the resulting bitmap index to verify its integrity.

2. `bitmap_writer_build_type_index`: this call uses the array of
   `struct object_entry` that has just been sorted when writing out
   the actual packfile index to disk to generate 4 type-index bitmaps
   (one for each object type).

   These bitmaps have their nth bit set if the given object is of
   the bitmap's type. E.g. the nth bit of the Commits bitmap will be
   1 if the nth object in the packfile index is a commit.

   This is a very cheap operation because the bitmap writing code has
   access to the metadata stored in the `struct object_entry` array,
   and hence the real type for each object in the packfile.

3. `bitmap_writer_reuse_bitmaps`: if a bitmap index already exists
   for one of the packfiles we're trying to repack, this call
   will efficiently rebuild the existing bitmaps so they can be
   reused on the new index. All the existing bitmaps will be stored
   in a `reuse` hash table, and the commit selection phase will
   prioritize these when selecting, as they can be written directly
   to the new index without having to perform a revision walk to
   fill the bitmap. This can greatly speed up the repack of a
   repository that already has bitmaps.

4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
   a given `pack-objects` run, the sequence of commits generated
   during the Counting Objects phase will be stored in an array.

   We then use that array to build up the list of selected commits.
   Writing a bitmap in the index for each object in the repository
   would be cost-prohibitive, so we use a simple heuristic to pick
   the commits that will be indexed with bitmaps.

   The current heuristics are a simplified version of JGit's
   original implementation. We select commits more densely the more
   recent they are: the 100 most recent commits are always selected;
   after that we pick 1 commit out of every 100, and the gap
   increases as the commits grow older. On top of that, we make sure
   that every single branch that has not been merged (all the tips
   that would be required from a clone) gets its own bitmap, and
   when selecting commits within a gap, we tend to prioritize the
   commit with the most parents.

   Do note that there is no right/wrong way to perform commit
   selection; different selection algorithms will result in
   different commits being selected, but there's no such thing as
   missing a commit. The bitmap walker algorithm implemented in
   `prepare_bitmap_walk` is able to adapt to missing bitmaps by
   performing manual walks that complete the bitmap. The ideal
   selection algorithm, however, would select the commits that are
   most likely to be used as roots for a walk in the future (e.g.
   the tips of each branch, and so on) to ensure a bitmap for them
   is always available.

5. `bitmap_writer_build`: this is the computationally expensive part
   of bitmap generation. Based on the list of commits that were
   selected in the previous step, we perform several incremental
   walks to generate the bitmap for each commit.

   The walks begin from the oldest commit, and are built up
   incrementally for each branch. E.g. consider this DAG where A, B,
   C, D, E, F are the selected commits, and a, b, c, e are chunks
   of simplified history that will not receive bitmaps.

        A---a---B--b--C--c--D
         \
          E--e--F

   We start by building the bitmap for A, using A as the root for a
   revision walk and marking all the objects that are reachable
   until the walk is over. Once this bitmap is stored, we reuse the
   bitmap walker to perform the walk for B, assuming that once we
   reach A again, the walk will be terminated because A has already
   been SEEN on the previous walk.

   This process 

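Steps 2 and 5 above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the C code from the patch:
`Entry`, `build_type_index`, `build_commit_bitmaps`, `parents` and
`tree_objects` are hypothetical names, and bitmaps are modelled as plain
Python integers. Instead of relying on the walker's SEEN flags, the
sketch stops at any ancestor that already has a stored bitmap and ORs
that bitmap in, which is a different way of avoiding repeated walks.

```python
from collections import namedtuple

# Hypothetical stand-in for git's `struct object_entry`.
Entry = namedtuple("Entry", ["name", "type"])
TYPES = ("commit", "tree", "blob", "tag")


def build_type_index(entries):
    """Step 2: one bitmap per type; bit n is set if entry n has that type."""
    bitmaps = {t: 0 for t in TYPES}
    for n, entry in enumerate(entries):
        bitmaps[entry.type] |= 1 << n
    return bitmaps


def build_commit_bitmaps(selected, parents, tree_objects, position):
    """Step 5: build one reachability bitmap per selected commit.

    selected     -- selected commits, oldest first
    parents      -- commit name -> list of parent commit names
    tree_objects -- commit name -> trees/blobs in that commit's snapshot
    position     -- object name -> bit position (pack-index order)
    """
    stored = {}
    for root in selected:
        bits = 0
        stack, visited = [root], set()
        while stack:
            commit = stack.pop()
            if commit in visited:
                continue
            visited.add(commit)
            if commit in stored:
                # Everything below this commit is already known.
                bits |= stored[commit]
                continue
            bits |= 1 << position[commit]
            for obj in tree_objects.get(commit, ()):
                bits |= 1 << position[obj]
            stack.extend(parents.get(commit, ()))
        stored[root] = bits
    return stored


# The DAG from the text; to keep the example small, every commit is
# assumed to point at the same tree `t1` containing one blob `f1`.
entries = [Entry(c, "commit") for c in "AaBbCcDEeF"]
entries += [Entry("t1", "tree"), Entry("f1", "blob")]
position = {e.name: n for n, e in enumerate(entries)}
parents = {"a": ["A"], "B": ["a"], "b": ["B"], "C": ["b"], "c": ["C"],
           "D": ["c"], "E": ["A"], "e": ["E"], "F": ["e"]}
tree_objects = {c: ["t1", "f1"] for c in "AaBbCcDEeF"}

types = build_type_index(entries)
bitmaps = build_commit_bitmaps(list("ABCDEF"), parents, tree_objects, position)
```

In this model F's bitmap covers F, e, E and everything reachable from A,
but not B, since B is not an ancestor of F.
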
Re: [PATCH 13/19] pack-objects: implement bitmap writing

2013-10-24 Thread Duy Nguyen
On Fri, Oct 25, 2013 at 1:06 AM, Jeff King p...@peff.net wrote:
> From: Vicent Marti tan...@gmail.com
>
> This commit extends more the functionality of `pack-objects` by allowing
> it to write out a `.bitmap` index next to any written packs, together
> with the `.idx` index that currently gets written.
>
> If bitmap writing is enabled for a given repository (either by calling
> `pack-objects` with the `--write-bitmap-index` flag or by having
> `pack.writebitmaps` set to `true` in the config) and pack-objects is
> writing a packfile that would normally be indexed (i.e. not piping to
> stdout), we will attempt to write the corresponding bitmap index for the
> packfile.

I haven't read the actual patch yet, but the diffstat says user
documentation is missing..
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/19] pack-objects: implement bitmap writing

2013-10-24 Thread Jeff King
On Fri, Oct 25, 2013 at 08:21:12AM +0700, Nguyen Thai Ngoc Duy wrote:

> > If bitmap writing is enabled for a given repository (either by calling
> > `pack-objects` with the `--write-bitmap-index` flag or by having
> > `pack.writebitmaps` set to `true` in the config) and pack-objects is
> > writing a packfile that would normally be indexed (i.e. not piping to
> > stdout), we will attempt to write the corresponding bitmap index for the
> > packfile.
>
> I haven't read the actual patch yet, but the diffstat says user
> documentation is missing..

I'll work on that for the re-roll.

-Peff