Re: [PATCH v2] Document pack v4 format
On Fri, 6 Sep 2013, Duy Nguyen wrote:

> On Thu, Sep 5, 2013 at 11:52 PM, Duy Nguyen wrote:
> > On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
> >> Now the pack index v3 probably needs to be improved a little, again to
> >> accommodate completion of thin packs. Given that the main SHA1 table is
> >> now in the main pack file, it should be possible to still carry a small
> >> SHA1 table in the index file that corresponds to the appended objects
> >> only. This means that a SHA1 search will have to first use the main SHA1
> >> table in the pack file as it is done now, and if not found then use the
> >> SHA1 table in the index file if it exists. And of course
> >> nth_packed_object_sha1() will have to be adjusted accordingly.
> >
> > What if the sender prepares the sha-1 table to contain missing objects
> > in advance? The sender should know what base objects are missing. Then
> > we only need to append objects at the receiving end and verify that
> > all new objects are also present in the sha-1 table.
>
> One minor detail to sort out: the size of sha-1 table. Previously it's
> the number of objects in the pack. Now it's not true because the table
> may have more entries. So how should we record the table size? We
> could use null sha-1 as the end of table marker. Or we could make
> pack-objects to write nr_objects as the number of entries _after_ pack
> completion, not the true number of objects in thin pack. I like the
> latter (no more rehashing the entire pack after completion) but then
> we need a cue to know that we have reached the end of the pack..

See the amendment I made to your documentation patch. I opted for the
latter. To mark the end of the transmitted objects, a zero byte (object
type 0) is used.

Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Document pack v4 format
On Thu, Sep 5, 2013 at 11:52 PM, Duy Nguyen wrote:
> On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
>> Now the pack index v3 probably needs to be improved a little, again to
>> accommodate completion of thin packs. Given that the main SHA1 table is
>> now in the main pack file, it should be possible to still carry a small
>> SHA1 table in the index file that corresponds to the appended objects
>> only. This means that a SHA1 search will have to first use the main SHA1
>> table in the pack file as it is done now, and if not found then use the
>> SHA1 table in the index file if it exists. And of course
>> nth_packed_object_sha1() will have to be adjusted accordingly.
>
> What if the sender prepares the sha-1 table to contain missing objects
> in advance? The sender should know what base objects are missing. Then
> we only need to append objects at the receiving end and verify that
> all new objects are also present in the sha-1 table.

One minor detail to sort out: the size of sha-1 table. Previously it's
the number of objects in the pack. Now it's not true because the table
may have more entries. So how should we record the table size? We
could use null sha-1 as the end of table marker. Or we could make
pack-objects to write nr_objects as the number of entries _after_ pack
completion, not the true number of objects in thin pack. I like the
latter (no more rehashing the entire pack after completion) but then
we need a cue to know that we have reached the end of the pack..
--
Duy
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Junio C Hamano wrote:

> Nicolas Pitre writes:
>
> > On Thu, 5 Sep 2013, Duy Nguyen wrote:
> >
> >> On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
> >> > Now the pack index v3 probably needs to be improved a little, again to
> >> > accommodate completion of thin packs. Given that the main SHA1 table is
> >> > now in the main pack file, it should be possible to still carry a small
> >> > SHA1 table in the index file that corresponds to the appended objects
> >> > only. This means that a SHA1 search will have to first use the main SHA1
> >> > table in the pack file as it is done now, and if not found then use the
> >> > SHA1 table in the index file if it exists. And of course
> >> > nth_packed_object_sha1() will have to be adjusted accordingly.
> >>
> >> What if the sender prepares the sha-1 table to contain missing objects
> >> in advance? The sender should know what base objects are missing. Then
> >> we only need to append objects at the receiving end and verify that
> >> all new objects are also present in the sha-1 table.
> >
> > I do like this idea very much. And that doesn't increase the thin pack
> > size as the larger SHA1 table will be compensated by a smaller sha1ref
> > encoding in those objects referring to the missing ones.
>
> Let me see if I understand the proposal correctly. Compared to a
> normal pack-v4 stream, a thin pack-v4 stream:
>
>  - has all the SHA-1 object names involved in the stream in its main
>    object name table---most importantly, names of objects that
>    "thin" optimization omits from the pack data body are included;
>
>  - uses the SHA-1 object name table offset to refer to other
>    objects, even to ones that thin stream will not transfer in the
>    pack data body;
>
>  - is completed at the receiving end by appending the data for the
>    objects that were not transferred due to the "thin" optimization.
>
> So the invariant "all objects contained in the pack" in:
>
>  - A table of sorted SHA-1 object names for all objects contained in
>    the pack.
>
> that appears in Documentation/technical/pack-format.txt is still
> kept at the end, and more importantly, any object that is mentioned
> in this table can be reconstructed by using pack data in the same
> packfile without referencing anything else. Most importantly, if we
> were to build a v2 .idx file for the resulting .pack, the list of
> object names in the .idx file would be identical to the object names
> in this table in the .pack file.

That is right.

> If that is the case, I too like this.
>
> I briefly wondered if it makes sense to mention objects that are
> often referred to that do not exist in the pack in this table
> (e.g. new commits included in this pack refer to a tree object that
> has not changed for ages---their trees mention this subtree using a
> "SHA-1 reference encoding" and being able to name the old,
> unchanging tree with an index to the object table may save space),
> but that would break the above invariant in a big way---some objects
> mentioned in the table may not exist in the packfile itself---and it
> probably is not a good idea.

Yet, if an old tree that doesn't change is often referred to, it should
be possible to have only one such reference in the whole pack and all
the other trees can use a delta copy sequence to refer to it. At this
point whether or not the tree being referred to is listed inline or in
the SHA1 table doesn't make a big difference.

> Unlike that broken idea, "include
> names of the objects that will be appended anyway" approach to help
> fattening a thin-pack makes very good sense to me.
Re: [PATCH v2] Document pack v4 format
Nicolas Pitre writes:

> On Thu, 5 Sep 2013, Duy Nguyen wrote:
>
>> On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
>> > Now the pack index v3 probably needs to be improved a little, again to
>> > accommodate completion of thin packs. Given that the main SHA1 table is
>> > now in the main pack file, it should be possible to still carry a small
>> > SHA1 table in the index file that corresponds to the appended objects
>> > only. This means that a SHA1 search will have to first use the main SHA1
>> > table in the pack file as it is done now, and if not found then use the
>> > SHA1 table in the index file if it exists. And of course
>> > nth_packed_object_sha1() will have to be adjusted accordingly.
>>
>> What if the sender prepares the sha-1 table to contain missing objects
>> in advance? The sender should know what base objects are missing. Then
>> we only need to append objects at the receiving end and verify that
>> all new objects are also present in the sha-1 table.
>
> I do like this idea very much. And that doesn't increase the thin pack
> size as the larger SHA1 table will be compensated by a smaller sha1ref
> encoding in those objects referring to the missing ones.

Let me see if I understand the proposal correctly. Compared to a
normal pack-v4 stream, a thin pack-v4 stream:

 - has all the SHA-1 object names involved in the stream in its main
   object name table---most importantly, names of objects that
   "thin" optimization omits from the pack data body are included;

 - uses the SHA-1 object name table offset to refer to other
   objects, even to ones that thin stream will not transfer in the
   pack data body;

 - is completed at the receiving end by appending the data for the
   objects that were not transferred due to the "thin" optimization.

So the invariant "all objects contained in the pack" in:

 - A table of sorted SHA-1 object names for all objects contained in
   the pack.

that appears in Documentation/technical/pack-format.txt is still
kept at the end, and more importantly, any object that is mentioned
in this table can be reconstructed by using pack data in the same
packfile without referencing anything else. Most importantly, if we
were to build a v2 .idx file for the resulting .pack, the list of
object names in the .idx file would be identical to the object names
in this table in the .pack file.

If that is the case, I too like this.

I briefly wondered if it makes sense to mention objects that are
often referred to that do not exist in the pack in this table
(e.g. new commits included in this pack refer to a tree object that
has not changed for ages---their trees mention this subtree using a
"SHA-1 reference encoding" and being able to name the old,
unchanging tree with an index to the object table may save space),
but that would break the above invariant in a big way---some objects
mentioned in the table may not exist in the packfile itself---and it
probably is not a good idea. Unlike that broken idea, "include
names of the objects that will be appended anyway" approach to help
fattening a thin-pack makes very good sense to me.
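The completion step summarized in the message above can be sketched as follows. This is only an illustration under stated assumptions, not code from git or from Nicolas's tree: `struct completion_state`, `record_object()` and `count_missing()` are hypothetical names. The sketch assumes the receiver holds the sender's sorted SHA-1 table in memory, marks each object for which it has data, rejects any object whose name the table does not list, and treats the pack as complete once no table entry is left without data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical receiver-side state while completing a thin pack. */
struct completion_state {
    const unsigned char *table; /* nr * 20 bytes of sorted SHA-1 names */
    uint32_t nr;                /* number of names in the table */
    unsigned char *have;        /* have[i] != 0 once object i has data */
};

/* Binary search in the sorted table; returns the index or -1. */
static long table_lookup(const struct completion_state *s,
                         const unsigned char *sha1)
{
    uint32_t lo = 0, hi = s->nr;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(s->table + (size_t)mid * SHA1_LEN, sha1, SHA1_LEN);

        if (!cmp)
            return (long)mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/*
 * Record that data for `sha1` is now in the pack (transmitted or
 * appended). Returns 0 on success, or -1 if the sender's table does
 * not list the object, which would break the invariant.
 */
int record_object(struct completion_state *s, const unsigned char *sha1)
{
    long pos;

    assert(s->have != NULL);
    pos = table_lookup(s, sha1);
    if (pos < 0)
        return -1;
    s->have[pos] = 1;
    return 0;
}

/* Number of table entries that still have no data in the pack. */
uint32_t count_missing(const struct completion_state *s)
{
    uint32_t i, missing = 0;

    for (i = 0; i < s->nr; i++)
        if (!s->have[i])
            missing++;
    return missing;
}
```

With this shape, the invariant quoted from pack-format.txt holds exactly when `count_missing()` returns 0 after the last object has been appended.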
Re: [PATCH v2] Document pack v4 format
On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
> Now the pack index v3 probably needs to be improved a little, again to
> accommodate completion of thin packs. Given that the main SHA1 table is
> now in the main pack file, it should be possible to still carry a small
> SHA1 table in the index file that corresponds to the appended objects
> only. This means that a SHA1 search will have to first use the main SHA1
> table in the pack file as it is done now, and if not found then use the
> SHA1 table in the index file if it exists. And of course
> nth_packed_object_sha1() will have to be adjusted accordingly.

What if the sender prepares the sha-1 table to contain missing objects
in advance? The sender should know what base objects are missing. Then
we only need to append objects at the receiving end and verify that
all new objects are also present in the sha-1 table.
--
Duy
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Duy Nguyen wrote:

> On Thu, Sep 5, 2013 at 12:39 PM, Nicolas Pitre wrote:
> > Now the pack index v3 probably needs to be improved a little, again to
> > accommodate completion of thin packs. Given that the main SHA1 table is
> > now in the main pack file, it should be possible to still carry a small
> > SHA1 table in the index file that corresponds to the appended objects
> > only. This means that a SHA1 search will have to first use the main SHA1
> > table in the pack file as it is done now, and if not found then use the
> > SHA1 table in the index file if it exists. And of course
> > nth_packed_object_sha1() will have to be adjusted accordingly.
>
> What if the sender prepares the sha-1 table to contain missing objects
> in advance? The sender should know what base objects are missing. Then
> we only need to append objects at the receiving end and verify that
> all new objects are also present in the sha-1 table.

I do like this idea very much. And that doesn't increase the thin pack
size as the larger SHA1 table will be compensated by a smaller sha1ref
encoding in those objects referring to the missing ones.

Nicolas
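The size argument above can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration: it assumes a plain 7-bits-per-byte variable length integer, which may not match the exact encoding pack v4 uses, and all function names are hypothetical. Per the documentation patch, a SHA-1 reference is either a non-zero index (table index plus one) or a zero followed by the 20 raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA1_LEN 20

/* Bytes needed for `v` in an assumed 7-bits-per-byte varint. */
size_t varint_size(uint64_t v)
{
    size_t n = 1;

    while (v >>= 7)
        n++;
    return n;
}

/* Size of a SHA-1 reference that points at table entry `index`. */
size_t sha1ref_size_indexed(uint32_t index)
{
    /* The encoded value is index + 1; zero is reserved for inline names. */
    return varint_size((uint64_t)index + 1);
}

/* Size of a SHA-1 reference carrying the name inline: a zero, then 20 bytes. */
size_t sha1ref_size_inline(void)
{
    return 1 + SHA1_LEN;
}

/* Total bytes when a missing base is listed in the table and referenced `refs` times. */
size_t cost_with_table_entry(uint32_t index, size_t refs)
{
    assert(refs > 0); /* a table entry is only added for a referenced object */
    return SHA1_LEN + refs * sha1ref_size_indexed(index);
}

/* Total bytes when the same base is named inline at each of `refs` references. */
size_t cost_inline(size_t refs)
{
    return refs * sha1ref_size_inline();
}
```

Under these assumptions a base referenced once with a small index costs 21 bytes either way, and every further reference saves roughly 20 bytes, which is consistent with the claim that the larger table does not grow the thin pack.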
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Duy Nguyen wrote:

> On Thu, Sep 5, 2013 at 11:40 AM, Nicolas Pitre wrote:
> > On Thu, 5 Sep 2013, Duy Nguyen wrote:
> >
> >> On Thu, Sep 5, 2013 at 11:12 AM, Nicolas Pitre wrote:
> >> > Many other bugs have now been fixed. A git.git repository with packs
> >> > version 4 appears to be functional and passes git-fsck --full --strict.
> >>
> >> Yeah I was looking at the diff some minutes ago, saw changes in
> >> pack-check.c and wondering if fsck was working. I'll add v4 support to
> >> index-pack.
> >
> > Beware that the tree delta encoding has changed a little. This saved up
> > to 2% on some repos.
>
> Thanks for the heads up.
>
> > I'll probably change the encoding to incorporate the escape hatch
> > for path and name references as discussed previously.

This is now committed. I don't think there should be any more pack
format changes at this point.

Now the pack index v3 probably needs to be improved a little, again to
accommodate completion of thin packs. Given that the main SHA1 table is
now in the main pack file, it should be possible to still carry a small
SHA1 table in the index file that corresponds to the appended objects
only. This means that a SHA1 search will have to first use the main SHA1
table in the pack file as it is done now, and if not found then use the
SHA1 table in the index file if it exists. And of course
nth_packed_object_sha1() will have to be adjusted accordingly.

Nicolas
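The two-table search proposed in the message above could look roughly like this. It is a sketch only: `struct sha1_tables`, `find_object_index()` and `nth_object_sha1()` are hypothetical stand-ins (the last one for an adjusted nth_packed_object_sha1()), and the sketch assumes both tables are sorted arrays of raw 20-byte names, with objects from the index-file table numbered after those of the main table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/*
 * Hypothetical view of one pack: the main sorted SHA-1 table stored in
 * the .pack file, plus an optional small sorted table in the index file
 * naming only the objects appended while completing a thin pack.
 */
struct sha1_tables {
    const unsigned char *main_tbl;  /* main_nr * 20 bytes, sorted */
    uint32_t main_nr;
    const unsigned char *extra_tbl; /* extra_nr * 20 bytes, sorted; may be NULL */
    uint32_t extra_nr;
};

/* Binary search in one sorted table; returns the position or -1. */
static long search_table(const unsigned char *tbl, uint32_t nr,
                         const unsigned char *sha1)
{
    uint32_t lo = 0, hi = nr;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(tbl + (size_t)mid * SHA1_LEN, sha1, SHA1_LEN);

        if (!cmp)
            return (long)mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Search the main table first, then fall back to the index-file table. */
long find_object_index(const struct sha1_tables *t, const unsigned char *sha1)
{
    long pos;

    assert(t != NULL);
    pos = search_table(t->main_tbl, t->main_nr, sha1);
    if (pos >= 0)
        return pos;
    if (!t->extra_tbl)
        return -1;
    pos = search_table(t->extra_tbl, t->extra_nr, sha1);
    return pos < 0 ? -1 : (long)t->main_nr + pos;
}

/* Map an object number back to its name across both tables. */
const unsigned char *nth_object_sha1(const struct sha1_tables *t, uint32_t n)
{
    assert(t != NULL);
    if (n < t->main_nr)
        return t->main_tbl + (size_t)n * SHA1_LEN;
    n -= t->main_nr;
    if (t->extra_tbl && n < t->extra_nr)
        return t->extra_tbl + (size_t)n * SHA1_LEN;
    return NULL;
}
```

One consequence visible in the sketch is that object numbers are no longer globally sorted by name, which is part of what Duy's later suggestion (list the missing objects in the main table up front) avoids.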
Re: [PATCH v2] Document pack v4 format
On Thu, Sep 5, 2013 at 11:40 AM, Nicolas Pitre wrote:
> On Thu, 5 Sep 2013, Duy Nguyen wrote:
>
>> On Thu, Sep 5, 2013 at 11:12 AM, Nicolas Pitre wrote:
>> > Many other bugs have now been fixed. A git.git repository with packs
>> > version 4 appears to be functional and passes git-fsck --full --strict.
>>
>> Yeah I was looking at the diff some minutes ago, saw changes in
>> pack-check.c and wondering if fsck was working. I'll add v4 support to
>> index-pack.
>
> Beware that the tree delta encoding has changed a little. This saved up
> to 2% on some repos.

Thanks for the heads up.

> I'll probably change the encoding to incorporate the escape hatch
> for path and name references as discussed previously.
>
>> Waiting to see the new, v4-aware tree walker interface
>> with good "rev-list --all --objects" numbers from you.
>
> Well, unfortunately I've put more time than I really had available into
> this project lately. I'm about to call for other people to take over it
> and pursue this work further.
>
> I really wanted to set the pack format direction since I've been toying
> with this for so many years. Now the tool to convert a pack is there,
> and the read side is also there, proving that the format does work and
> the encoding and decoding code is functional and may serve as reference.
> So that's about the extent of what I can contribute at this point.
>
> I'll be happy to provide design assistance and code review comments of
> course. But I won't be able to put the time to do the actual coding
> myself much longer.

You've done a great job in designing v4 and getting basic support in
place. I think you'll need to post your series again so Junio can pick
it up. Then we (at least I) will try to continue from there. I have
high hopes that this will not drop out like the split-blob series.
--
Duy
Re: [PATCH v2] Document pack v4 format
On Thu, 5 Sep 2013, Duy Nguyen wrote:

> On Thu, Sep 5, 2013 at 11:12 AM, Nicolas Pitre wrote:
> > Many other bugs have now been fixed. A git.git repository with packs
> > version 4 appears to be functional and passes git-fsck --full --strict.
>
> Yeah I was looking at the diff some minutes ago, saw changes in
> pack-check.c and wondering if fsck was working. I'll add v4 support to
> index-pack.

Beware that the tree delta encoding has changed a little. This saved up
to 2% on some repos.

I'll probably change the encoding to incorporate the escape hatch
for path and name references as discussed previously.

> Waiting to see the new, v4-aware tree walker interface
> with good "rev-list --all --objects" numbers from you.

Well, unfortunately I've put more time than I really had available into
this project lately. I'm about to call for other people to take over it
and pursue this work further.

I really wanted to set the pack format direction since I've been toying
with this for so many years. Now the tool to convert a pack is there,
and the read side is also there, proving that the format does work and
the encoding and decoding code is functional and may serve as reference.
So that's about the extent of what I can contribute at this point.

I'll be happy to provide design assistance and code review comments of
course. But I won't be able to put the time to do the actual coding
myself much longer.

Nicolas
Re: [PATCH v2] Document pack v4 format
On Thu, Sep 5, 2013 at 11:12 AM, Nicolas Pitre wrote:
> Many other bugs have now been fixed. A git.git repository with packs
> version 4 appears to be functional and passes git-fsck --full --strict.

Yeah I was looking at the diff some minutes ago, saw changes in
pack-check.c and wondering if fsck was working. I'll add v4 support to
index-pack. Waiting to see the new, v4-aware tree walker interface
with good "rev-list --all --objects" numbers from you.
--
Duy
Re: [PATCH v2] Document pack v4 format
On Tue, 3 Sep 2013, Duy Nguyen wrote:

> On Tue, Sep 3, 2013 at 6:49 PM, Duy Nguyen wrote:
> > On Tue, Sep 3, 2013 at 1:46 PM, Nicolas Pitre wrote:
> >> So... looks like pack v4 is now "functional".
> >>
> >> However something is still wrong as it operates about 6 times slower
> >> than pack v3.
> >>
> >> Anyone wishes to investigate?
> >
> > You recurse in decode_entries too deep. I check the first 1
> > decode_entries() calls in pv4_get_tree(). The deepest level is 3491.
>
> And I was wrong, the call depth is not that deep, but the number of
> decode_entries calls triggered by one pv4_get_tree() is that many.
> This is on git.git and the tree being processed is "t", which has 672
> entries.. There are funny access patterns. This is the output of
>
>     fprintf(stderr, "[%d] %d - %d %u\n", call_depth, copy_start,
>             copy_count, copy_objoffset);
>
> [1] 0 - 1 48838573
> [2] 0 - 1 48826699
> [3] 0 - 1 48820760
> [4] 0 - 1 48814812
> [5] 0 - 1 48805904
> [6] 0 - 1 48797000
> [7] 0 - 1 48794034
> [8] 0 - 1 48791067
> [9] 0 - 1 48788100
> [10] 0 - 1 48785134
> [11] 0 - 1 48776221
> [12] 0 - 1 48764321
> [13] 0 - 1 48503227
> [14] 0 - 1 48485415
> [15] 0 - 1 48473512
> [16] 0 - 1 48443621
> [17] 0 - 1 48401788
> [18] 0 - 1 48377834
> [19] 0 - 1 48371841
> [20] 0 - 1 48341809
> [21] 0 - 1 48260734
> [22] 0 - 1 48236635
> [23] 0 - 1 46845105
> [24] 0 - 1 14603061
> [25] 2 - 1 48838573
> [2] 0 - 1 48826699
>
> It goes through 20+ base trees just to get one tree entry, I think..

Yeah... that's true. The encoding should refer to the deepest tree
directly in that case. Better delta heuristics will have to be worked
out here. The code as it is now can't do that.

There was also a bug that prevented larger copy sequences to be created
which is now fixed. I added to packv4-create the ability to specify the
minimum range of consecutive entries that can be represented by a copy
sequence to allow experiments.

However, even when the tree deltas are completely disabled (using
--min-tree-copy=0 achieves that) the CPU usage is still much higher
which is rather unexpected. In theory this shouldn't be the case.

Many other bugs have now been fixed. A git.git repository with packs
version 4 appears to be functional and passes git-fsck --full --strict.

Nicolas
Re: [PATCH v2] Document pack v4 format
On Tue, Sep 3, 2013 at 6:49 PM, Duy Nguyen wrote:
> On Tue, Sep 3, 2013 at 1:46 PM, Nicolas Pitre wrote:
>> So... looks like pack v4 is now "functional".
>>
>> However something is still wrong as it operates about 6 times slower
>> than pack v3.
>>
>> Anyone wishes to investigate?
>
> You recurse in decode_entries too deep. I check the first 1
> decode_entries() calls in pv4_get_tree(). The deepest level is 3491.

And I was wrong, the call depth is not that deep, but the number of
decode_entries calls triggered by one pv4_get_tree() is that many.
This is on git.git and the tree being processed is "t", which has 672
entries.. There are funny access patterns. This is the output of

    fprintf(stderr, "[%d] %d - %d %u\n", call_depth, copy_start,
            copy_count, copy_objoffset);

[1] 0 - 1 48838573
[2] 0 - 1 48826699
[3] 0 - 1 48820760
[4] 0 - 1 48814812
[5] 0 - 1 48805904
[6] 0 - 1 48797000
[7] 0 - 1 48794034
[8] 0 - 1 48791067
[9] 0 - 1 48788100
[10] 0 - 1 48785134
[11] 0 - 1 48776221
[12] 0 - 1 48764321
[13] 0 - 1 48503227
[14] 0 - 1 48485415
[15] 0 - 1 48473512
[16] 0 - 1 48443621
[17] 0 - 1 48401788
[18] 0 - 1 48377834
[19] 0 - 1 48371841
[20] 0 - 1 48341809
[21] 0 - 1 48260734
[22] 0 - 1 48236635
[23] 0 - 1 46845105
[24] 0 - 1 14603061
[25] 2 - 1 48838573
[2] 0 - 1 48826699

It goes through 20+ base trees just to get one tree entry, I think..
--
Duy
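The access pattern in the trace above can be reproduced with a toy model. The sketch below is not the pv4_get_tree()/decode_entries() code; it is a deliberately simplified, hypothetical model in which each tree entry is either a literal value or a copy of exactly one entry from a base tree (real copy sequences cover ranges of entries). It shows why one lookup costs one hop per base tree in the chain, and what pointing a copy at the deepest tree directly would change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tree entry: a literal or a one-entry copy. */
struct tree_entry {
    int is_copy; /* non-zero: take the entry from a base tree */
    int base;    /* index of the base tree (valid when is_copy) */
    int start;   /* entry index inside the base tree (valid when is_copy) */
    int value;   /* literal payload (valid when !is_copy) */
};

struct tree {
    const struct tree_entry *entries;
    int nr;
};

/*
 * Resolve entry `i` of tree `t`, following copies until a literal is
 * found. `*hops` receives the number of base trees visited. Assumes
 * the copy chains contain no cycles.
 */
int resolve_entry(const struct tree *trees, int t, int i, int *hops)
{
    *hops = 0;
    for (;;) {
        const struct tree_entry *e;

        assert(i >= 0 && i < trees[t].nr);
        e = &trees[t].entries[i];
        if (!e->is_copy)
            return e->value;
        t = e->base;
        i = e->start;
        (*hops)++;
    }
}

/*
 * Rewrite a copy entry so that it refers to the deepest tree directly,
 * i.e. the one that holds the literal, skipping intermediate bases.
 */
void point_at_deepest(const struct tree *trees, struct tree_entry *e)
{
    while (e->is_copy) {
        const struct tree_entry *b = &trees[e->base].entries[e->start];

        if (!b->is_copy)
            break;
        e->base = b->base;
        e->start = b->start;
    }
}
```

In this model a chain of N intermediate trees costs N hops per entry before the rewrite and a single hop after it; the real fix would have to happen at encoding time, where the copy sequences are chosen.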
Re: [PATCH v2] Document pack v4 format
On Tue, Sep 3, 2013 at 1:46 PM, Nicolas Pitre wrote:
> So... looks like pack v4 is now "functional".
>
> However something is still wrong as it operates about 6 times slower
> than pack v3.
>
> Anyone wishes to investigate?

You recurse in decode_entries too deep. I check the first 1
decode_entries() calls in pv4_get_tree(). The deepest level is 3491.
--
Duy
Re: [PATCH v2] Document pack v4 format
On Tue, 3 Sep 2013, Nicolas Pitre wrote:

> On Sat, 31 Aug 2013, Nguyễn Thái Ngọc Duy wrote:
>
> >
> > Signed-off-by: Nguyễn Thái Ngọc Duy
> > ---
> >  Incorporated suggestions by Nico and Junio. I went ahead and added
> >  escape hatches for converting thin packs to full ones so the document
> >  does not really match the code (I've been watching Nico's repository,
> >  commit reading is added, good stuff!)
>
> Now tree reading is added. multiple encoding bug fixes trickled down to
> their originating commits as well.
>
> Something is still wrong with deltas though.

Deltas fixed now.

So... looks like pack v4 is now "functional".

However something is still wrong as it operates about 6 times slower
than pack v3.

Anyone wishes to investigate?

Nicolas
Re: [PATCH v2] Document pack v4 format
On Sat, 31 Aug 2013, Nguyễn Thái Ngọc Duy wrote:

>
> Signed-off-by: Nguyễn Thái Ngọc Duy
> ---
>  Incorporated suggestions by Nico and Junio. I went ahead and added
>  escape hatches for converting thin packs to full ones so the document
>  does not really match the code (I've been watching Nico's repository,
>  commit reading is added, good stuff!)

Now tree reading is added. Multiple encoding bug fixes trickled down to
their originating commits as well.

Something is still wrong with deltas though.

Nicolas
[PATCH v2] Document pack v4 format
Signed-off-by: Nguyễn Thái Ngọc Duy
---
 Incorporated suggestions by Nico and Junio. I went ahead and added
 escape hatches for converting thin packs to full ones so the document
 does not really match the code (I've been watching Nico's repository,
 commit reading is added, good stuff!)

 The proposal is, value 0 in the index to ident table is reserved,
 followed by the ident string. The real index to ident table is idx-1.

 Similarly, the value 1 in the index to path name table is reserved
 (value 0 is already used for referring back to base tree) so the
 actual index is idx-2.

 Documentation/technical/pack-format.txt | 128 +++-
 1 file changed, 127 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8e5bf60..c866287 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,7 +1,7 @@
 Git pack format
 ===============
 
-== pack-*.pack files have the following format:
+== pack-*.pack files version 2 and 3 have the following format:
 
    - A header appears at the beginning and consists of the following:
 
@@ -36,6 +36,127 @@ Git pack format
 
   - The trailer records 20-byte SHA-1 checksum of all of the above.
 
+== pack-*.pack files version 4 have the following format:
+
+   - A header appears at the beginning and consists of the following:
+
+     4-byte signature:
+         The signature is: {'P', 'A', 'C', 'K'}
+
+     4-byte version number (network byte order): must be 4
+
+     4-byte number of objects contained in the pack (network byte order)
+
+   - A series of tables, described separately.
+
+   - The tables are followed by number of object entries, each of
+     which looks like below:
+
+     (undeltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     data
+
+     (deltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     base object name in SHA-1 reference encoding
+     compressed delta data
+
+     In undeltified format, blobs and tags are compressed. Trees are
+     not compressed at all. Some headers in commits are stored
+     uncompressed, the rest is compressed. Tree and commit
+     representations are described in detail separately.
+
+     Blobs and tags are deltified and compressed the same way in
+     v3. Commits are not deltified. Trees are deltified using
+     undeltified representation.
+
+  - The trailer records 20-byte SHA-1 checksum of all of the above.
+
+=== Pack v4 tables
+
+ - A table of sorted SHA-1 object names for all objects contained in
+   the pack.
+
+   This table can be referred to using "SHA-1 reference encoding":
+   It's an index number in variable length encoding. If it's
+   non-zero, its value minus one is the index in this table. If it's
+   zero, 20 bytes of SHA-1 is followed.
+
+ - Ident table: the uncompressed length in variable encoding,
+   followed by zlib-compressed dictionary. Each entry consists of
+   two prefix bytes storing timezone followed by a NUL-terminated
+   string.
+
+   Entries should be sorted by frequency so that the most frequent
+   entry has the smallest index, thus most efficient variable
+   encoding.
+
+   The table can be referred to using "ident reference encoding":
+   It's an index number in variable length encoding. If it's
+   non-zero, its value minus one is the index in this table. If it's
+   zero, a new entry in the same format is followed: two prefix
+   bytes and a NUL-terminated string.
+
+ - Tree path table: the same format to ident table. Each entry
+   consists of two prefix bytes storing tree entry mode, then a
+   NUL-terminated path name. Same sort order recommendation applies.
+
+=== Commit representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Tree SHA-1 in SHA-1 reference encoding
+
+  - Parent count in variable length encoding
+
+  - Parent SHA-1s in SHA-1 reference encoding
+
+  - Author reference in ident reference encoding
+
+  - Author timestamp in variable length encoding
+
+  - Committer reference in ident reference encoding
+
+  - Committer timestamp in variable length encoding
+
+  - Compressed data of remaining header and the body
+
+=== Tree representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Number of tree entries in variable length encoding
+
+  - A number of entries, each starting with path component reference:
+    a number, in variable length encoding.
+
+    If the path component reference is greater than 1, its value minus
+    two is the index in tree path table. The path component reference
+    is followed by the tree entry SHA-1 in SHA-1 reference encoding.
+
+    If the path component reference is 1, it's followed by
+
+    - two prefix bytes representing tree entry mode
+
+    - NUL-terminated path name
+
+    - tree entry SHA-1 in SHA-1 reference encoding
+
+    If the path compone