Re: Understanding pack format
On Tue, Nov 6, 2018 at 3:23 AM Farhan Khan wrote:
> To follow up from the other day, I have been reading the code that
> retrieves the pack entry for the past 3 days now without much success.
> But there are quite a few abstractions and I get lost half-way down
> the line.

Jeff already gave you some pointers. This is just a side note.

I think it's easier to run the code under a debugger and see what it
does than just reading it. You can create a repo with just one blob to
have better control over it (small packs also make it possible to
examine with a hex editor in parallel), e.g.

    git init foo
    cd foo
    echo hello >file
    git add file
    git repack -ad
    gdb --args git show :./file

then put a breakpoint in some interesting functions (perhaps one of
those Jeff pointed out)
--
Duy
Re: Understanding pack format
On Mon, Nov 05, 2018 at 09:23:45PM -0500, Farhan Khan wrote:

> I am trying to identify where the content from a pack comes from. I
> traced it back to sha1-file.c:read_object(), which will return the
> 'content'. I want to know where the 'content' comes from, which seems
> to come from sha1-file.c:oid_object_info_extended. This goes into
> packfile.c:find_pack_entry(), but from here I get lost. I do not
> understand what is happening.
>
> How does it retrieve the pack content? I am lost here. Please assist.
> This is in the technical git documentation, but it was not clear.

After find_pack_entry() tells us the object is in a pack, we end up in
packed_object_info(). Depending on what the caller is asking for, there
are a couple of different strategies (because we try to avoid loading
the whole object if we don't need it).

Probably the one you're interested in is just grabbing the content,
which happens via cache_or_unpack_entry(). The cached case is less
interesting, so try unpack_entry(), which is what actually reads the
bytes out of the packfile.

-Peff
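To make "reads the bytes out of the packfile" a little more concrete, here is a minimal Python sketch of decoding one undeltified entry, following the entry layout described in pack-format.txt (a type-and-size header, then zlib-compressed data). It is not git's unpack_entry(); the function names are hypothetical, and delta entries, the .idx lookup, and the delta base cache are all left out.

```python
import zlib

# Object type numbers as listed in pack-format.txt.
TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
              6: "ofs_delta", 7: "ref_delta"}


def read_entry_header(buf, pos):
    """Decode the variable-length type/size header starting at buf[pos].

    First byte: bit 7 = "more bytes follow", bits 6-4 = type,
    bits 3-0 = lowest 4 bits of the inflated size.
    Each following byte adds 7 more size bits, least significant first.
    """
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def read_undeltified_entry(buf, pos):
    """Return (type name, content, offset of the next entry)."""
    obj_type, size, pos = read_entry_header(buf, pos)
    if obj_type not in (1, 2, 3, 4):
        raise ValueError("delta entries need their base object first")
    inflater = zlib.decompressobj()
    content = inflater.decompress(buf[pos:])
    if len(content) != size:
        raise ValueError("inflated size does not match the header")
    # Whatever zlib did not consume belongs to the entries that follow.
    next_pos = len(buf) - len(inflater.unused_data)
    return TYPE_NAMES[obj_type], content, next_pos


# Hand-built entry for a 6-byte blob: header byte is (3 << 4) | 6 = 0x36.
entry = bytes([0x36]) + zlib.compress(b"hello\n")
kind, content, _ = read_undeltified_entry(entry, 0)
print(kind, content)  # blob b'hello\n'
```

In a real pack the first entry starts right after the 12-byte pack header, so for the one-blob repository suggested elsewhere in this thread the call would presumably start at offset 12.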
Re: Understanding pack format
On Fri, Nov 2, 2018 at 12:00 PM Duy Nguyen wrote:
>
> On Fri, Nov 2, 2018 at 7:19 AM Junio C Hamano wrote:
> >
> > Farhan Khan writes:
> >
> > > ...Where is this in the git code? That might
> > > serve as a good guide.
> >
> > There are two major codepaths. One is used at runtime, giving us
> > random access into the packfile with the help of the .idx file. The
> > other is used when receiving a new packstream to create an .idx
> > file.
>
> The third path is copying/reusing objects in
> builtin/pack-objects.c::write_reuse_object(). Since it's mostly
> encoding the header of new objects in pack, it could also be a good
> starting point. Then you can move to write_no_reuse_object() and get
> how the data is encoded, deltified or not (yeah not parsed, but I
> think it's more or less the same thing conceptually).
> --
> Duy

Hi all,

To follow up from the other day, I have been reading the code that
retrieves the pack entry for the past 3 days now without much success.
But there are quite a few abstractions and I get lost half-way down
the line.

I am trying to identify where the content from a pack comes from. I
traced it back to sha1-file.c:read_object(), which will return the
'content'. I want to know where the 'content' comes from, which seems
to come from sha1-file.c:oid_object_info_extended. This goes into
packfile.c:find_pack_entry(), but from here I get lost. I do not
understand what is happening.

How does it retrieve the pack content? I am lost here. Please assist.
This is in the technical git documentation, but it was not clear.

Thank you,
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
Re: Understanding pack format
On Fri, Nov 2, 2018 at 7:19 AM Junio C Hamano wrote:
>
> Farhan Khan writes:
>
> > ...Where is this in the git code? That might
> > serve as a good guide.
>
> There are two major codepaths. One is used at runtime, giving us
> random access into the packfile with the help of the .idx file. The
> other is used when receiving a new packstream to create an .idx
> file.

The third path is copying/reusing objects in
builtin/pack-objects.c::write_reuse_object(). Since it's mostly
encoding the header of new objects in pack, it could also be a good
starting point. Then you can move to write_no_reuse_object() and get
how the data is encoded, deltified or not (yeah not parsed, but I
think it's more or less the same thing conceptually).
--
Duy
Re: Understanding pack format
On Fri, Nov 2, 2018 at 6:26 AM Farhan Khan wrote:
>
> Hi all,
>
> I am trying to understand the pack file format and have been reading
> the documentation, specifically https://git-scm.com/docs/pack-format
> (which is in git's own git repository as
> "Documentation/technical/pack-format.txt"). I see that the file starts
> with the "PACK" signature, followed by the 4 byte version and 4 byte
> number of objects. After this, the documentation speaks about
> Undeltified and Deltified representations. I understand conceptually
> what each is, but do not know specifically how git parses it out.

If by "it" you mean the deltified representations, I think it's
actually documented in pack-format.txt. If you prefer C over English,
look at patch-delta.c
--
Duy
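For readers who want something between the English and the C, here is a rough Python sketch of applying a delta to its base, following the delta layout described in pack-format.txt: two size varints (base size, result size), then a stream of copy and insert instructions. It is only an illustration under those assumptions; the function names are hypothetical and this is not a port of patch-delta.c.

```python
def read_size_varint(delta, pos):
    """Read a little-endian base-128 size: 7 bits per byte, bit 7 = continue."""
    value = 0
    shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base, delta):
    """Rebuild the target object from its base and a delta instruction stream."""
    base_size, pos = read_size_varint(delta, 0)
    result_size, pos = read_size_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta was made against a different base")

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from base: bits 0-3 say which offset bytes follow,
            # bits 4-6 say which size bytes follow (both little-endian).
            offset = 0
            size = 0
            for i in range(4):
                if cmd & (0x01 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of zero means 64 KiB
            if offset + size > len(base):
                raise ValueError("copy runs past the end of the base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes of the delta are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("instruction byte 0 is reserved")

    if len(out) != result_size:
        raise ValueError("result size does not match the delta header")
    return bytes(out)


# Hand-built delta: base is 12 bytes, result is 10 bytes,
# copy 6 bytes from offset 0 ("hello "), then insert 4 literal bytes.
base = b"hello world\n"
delta = bytes([0x0C, 0x0A, 0x90, 0x06, 0x04]) + b"git\n"
print(apply_delta(base, delta))  # b'hello git\n'
```

The base itself may be another delta, so a real reader has to resolve the chain down to an undeltified object before applying the deltas in order.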
Re: Understanding pack format
Farhan Khan writes:

> ...Where is this in the git code? That might
> serve as a good guide.

There are two major codepaths. One is used at runtime, giving us
random access into the packfile with the help of the .idx file. The
other is used when receiving a new packstream to create an .idx
file.

Personally I find the latter a bit too dense for those who are new
to the codebase, and the former would probably be easier to grok.

Start from sha1-file.c::read_object(), which will eventually lead
you to oid_object_info_extended() that essentially boils down to

 - a call to find_pack_entry() with the object name, and then

 - a call to packed_object_info() with the pack entry found earlier.

Following packfile.c::packed_object_info() will lead you to
cache_or_unpack_entry(); the unpack_entry() function is where all
the action to read from the packstream for one object's worth of
data and to reconstruct the object out of its deltified
representation takes place.
Understanding pack format
Hi all,

I am trying to understand the pack file format and have been reading
the documentation, specifically https://git-scm.com/docs/pack-format
(which is in git's own git repository as
"Documentation/technical/pack-format.txt"). I see that the file starts
with the "PACK" signature, followed by the 4-byte version and 4-byte
number of objects. After this, the documentation speaks about
Undeltified and Deltified representations. I understand conceptually
what each is, but do not know specifically how git parses it out.

Can someone please explain this to me? Is there any sample code of how
to interpret each entry? Where is this in the git code? That might
serve as a good guide. I see a few references to "PACK_SIGNATURE", but
I am not certain which actually reads the data.

Thanks!
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
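The 12-byte header described in this message can be read with a few lines of Python. This is a minimal sketch, assuming the layout given in pack-format.txt ("PACK", then a 4-byte version and a 4-byte object count, both in network byte order); parse_pack_header is a hypothetical name and not a function in git.

```python
import struct


def parse_pack_header(data):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # ">" selects big-endian (network) byte order; "4s" is the signature.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    return version, count


# Hand-built header: version 2, one object.
sample = b"PACK" + struct.pack(">II", 2, 1)
print(parse_pack_header(sample))  # (2, 1)
```

To try it on a real pack, read the first 12 bytes of a .pack file under .git/objects/pack/ in binary mode and pass them in.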