Re: Exact format of tree objects
Thanks!

By the way, where can I find this kind of specification? I couldn't find the spec of tree objects here: https://github.com/git/git/tree/master/Documentation

--
Chico Sokol

On Wed, Jun 12, 2013 at 11:06 AM, Jakub Narebski jna...@gmail.com wrote:
> Junio C Hamano gitster at pobox.com writes:
>> Chico Sokol chico.sokol at gmail.com writes:
>>> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>>>
>>> We suspect that there is some kind of special format or encoding, because the command git cat-file -p sha shows me ... while git cat-file tree sha generates ...
>>
>> cat-file -p is meant to be a human-readable form. The latter gives the exact byte contents read_sha1_file() sees, which is a binary format. Essentially, it is a sequence of:
>>
>> - mode of the entry encoded in octal, without any leading '0' pad;
>> - pathname component of the entry, terminated with NUL;
>> - 20-byte SHA-1 object name.
>
> I always wondered why this is the sole object format where the SHA-1 is in 20-byte binary format and not in 40-character hexadecimal string format...
>
> --
> Jakub Narębski

--
To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Exact format of tree objects
What is the encoding of the filename?

--
Chico Sokol

On Tue, Jun 11, 2013 at 3:26 PM, Ilari Liusvaara ilari.liusva...@elisanet.fi wrote:
> On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
>> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>
> A tree object consists of entries, each a concatenation of:
>
> - Octal mode (using ASCII digits 0-7).
> - Single SPACE (0x20)
> - Filename
> - Single NUL (0x00)
> - 20-byte binary SHA-1 of referenced object.
>
> At least the following octal modes are known:
>
> 40000: Directory (tree).
> 100644: Regular file (blob).
> 100755: Executable file (blob).
> 120000: Symbolic link (blob).
> 160000: Submodule (commit).
>
> The entries are always sorted in (bytewise) lexicographical order, except that directories sort as if there were an implied '/' at the end. So e.g.:
>
> !
> 0
> 9
> a
> a-
> a- (directory)
> a (directory)
> a0
> ab
> b
> z
>
> The idea of sorting directories specially is that if one recurses upon hitting a directory and uses '/' as path separator, then the full filenames are in bytewise lexicographical order.
>
> -Ilari
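On the filename question: Git stores the pathname component as the raw bytes it was given and records no encoding in the tree entry, so a parser has to pick a decoding policy itself. The Python sketch below assumes (and this is only an assumption, although a common one) that names are usually UTF-8, and uses the "surrogateescape" error handler so that names which are not valid UTF-8 still round-trip to the same bytes. The helper names decode_name and encode_name are hypothetical.

```python
def decode_name(raw: bytes) -> str:
    """Decode a tree entry name for display, assuming UTF-8 (an assumption).

    Bytes that are not valid UTF-8 are mapped to lone surrogates instead of
    raising, so the original bytes can be recovered with encode_name().
    """
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Inverse of decode_name(): recover the exact bytes stored in the tree."""
    return name.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    for raw in (b"README", b"caf\xc3\xa9.txt", b"bad\xffname"):
        text = decode_name(raw)
        # The round trip must give back the stored bytes unchanged.
        print(repr(raw), encode_name(text) == raw)
```

Keeping the name as bytes internally and decoding only for display avoids the question entirely, which may be the safer design for a tool that rewrites trees.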
Re: Exact format of tree objects
Junio C Hamano gitster at pobox.com writes:
> Chico Sokol chico.sokol at gmail.com writes:
>> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>>
>> We suspect that there is some kind of special format or encoding, because the command git cat-file -p sha shows me ... while git cat-file tree sha generates ...
>
> cat-file -p is meant to be a human-readable form. The latter gives the exact byte contents read_sha1_file() sees, which is a binary format. Essentially, it is a sequence of:
>
> - mode of the entry encoded in octal, without any leading '0' pad;
> - pathname component of the entry, terminated with NUL;
> - 20-byte SHA-1 object name.

I always wondered why this is the sole object format where the SHA-1 is in 20-byte binary format and not in 40-character hexadecimal string format...

--
Jakub Narębski
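To make the two representations concrete, here is a small Python sketch converting between the 40-character hexadecimal form and the 20-byte binary form stored in tree entries. The sample value is the .gitignore blob name from the original question; the helper names sha1_to_binary and sha1_to_hex are hypothetical. It illustrates only the conversion and makes no claim about why the binary form was chosen for trees.

```python
def sha1_to_binary(hex_name: str) -> bytes:
    """Convert a 40-character hex object name to its 20-byte binary form."""
    raw = bytes.fromhex(hex_name)
    if len(raw) != 20:
        raise ValueError("a SHA-1 object name must be exactly 20 bytes")
    return raw


def sha1_to_hex(raw: bytes) -> str:
    """Convert a 20-byte binary object name back to lowercase hex."""
    if len(raw) != 20:
        raise ValueError("a SHA-1 object name must be exactly 20 bytes")
    return raw.hex()


if __name__ == "__main__":
    hex_name = "2beae51a0e14b3167fd7e81119972caef95779f4"
    raw = sha1_to_binary(hex_name)
    # The binary form takes half the space of the hex form per entry.
    print(len(hex_name), len(raw), sha1_to_hex(raw) == hex_name)
```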
Exact format of tree objects
Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?

We suspect that there is some kind of special format or encoding, because the command git cat-file -p sha shows me the expected output, something like:

    100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4    .gitignore
    100644 blob 7c817960e954f0278a6eee8d58611f61445167e8    LICENSE.txt
    100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03    README
    ...

while git cat-file tree sha generates a strange output, which indicates some kind of encoding problem. Something like:

    100644 .gitignore+��▒,��Wy�100644 LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)

Thanks,

--
Chico Sokol
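For anyone inflating a loose object file directly instead of going through git cat-file: the inflated data begins with a short header of the form "<type> <size>" followed by a NUL byte, and only the bytes after that NUL are the tree body that git cat-file tree prints. The Python sketch below splits the two; the function name split_loose_object is hypothetical, and it assumes the input is the complete zlib-compressed content of one loose object file.

```python
import zlib
from typing import Tuple


def split_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Inflate a loose object and return (object_type, body).

    Assumes `compressed` is the full zlib stream of a single loose object.
    """
    raw = zlib.decompress(compressed)
    # The header ends at the first NUL; the body may contain further NULs.
    header, sep, body = raw.partition(b"\x00")
    if not sep:
        raise ValueError("missing NUL after object header")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


if __name__ == "__main__":
    # Build a fake one-entry tree object in memory rather than reading a repo.
    entry = b"100644 README\x00" + bytes(20)
    blob = zlib.compress(b"tree %d\x00" % len(entry) + entry)
    print(split_loose_object(blob))
```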
Re: Exact format of tree objects
On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?

A tree object consists of entries, each a concatenation of:

- Octal mode (using ASCII digits 0-7).
- Single SPACE (0x20)
- Filename
- Single NUL (0x00)
- 20-byte binary SHA-1 of referenced object.

At least the following octal modes are known:

40000: Directory (tree).
100644: Regular file (blob).
100755: Executable file (blob).
120000: Symbolic link (blob).
160000: Submodule (commit).

The entries are always sorted in (bytewise) lexicographical order, except that directories sort as if there were an implied '/' at the end. So e.g.:

!
0
9
a
a-
a- (directory)
a (directory)
a0
ab
b
z

The idea of sorting directories specially is that if one recurses upon hitting a directory and uses '/' as path separator, then the full filenames are in bytewise lexicographical order.

-Ilari
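The entry layout described above can be parsed in a few lines. The following Python sketch is only an illustration of that layout, not code taken from Git; the function name parse_tree is hypothetical, and it assumes the input is the tree body as printed by git cat-file tree (that is, without any object header in front).

```python
from typing import Iterator, Tuple


def parse_tree(data: bytes) -> Iterator[Tuple[str, bytes, str]]:
    """Yield (mode, name, hex_sha1) for each entry of an inflated tree body."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)          # end of the ASCII octal mode
        nul = data.index(b"\x00", space + 1)   # end of the filename
        mode = data[pos:space].decode("ascii")
        name = data[space + 1:nul]             # raw bytes, no encoding assumed
        sha1 = data[nul + 1:nul + 21]          # 20 raw bytes, not hex text
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        yield mode, name, sha1.hex()
        pos = nul + 21


if __name__ == "__main__":
    # Two entries built by hand; the second SHA-1 is a placeholder of zeros.
    sample = (
        b"100644 .gitignore\x00"
        + bytes.fromhex("2beae51a0e14b3167fd7e81119972caef95779f4")
        + b"40000 src\x00"
        + bytes(20)
    )
    for mode, name, sha1 in parse_tree(sample):
        print(mode, sha1, name)
```

Searching for the NUL only after the SPACE, and skipping a fixed 20 bytes afterwards, matters because the binary SHA-1 may itself contain 0x20 or 0x00 bytes.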
Re: Exact format of tree objects
Chico Sokol chico.so...@gmail.com writes:
> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>
> We suspect that there is some kind of special format or encoding, because the command git cat-file -p sha shows me ... while git cat-file tree sha generates ...

cat-file -p is meant to be a human-readable form. The latter gives the exact byte contents read_sha1_file() sees, which is a binary format. Essentially, it is a sequence of:

- mode of the entry encoded in octal, without any leading '0' pad;
- pathname component of the entry, terminated with NUL;
- 20-byte SHA-1 object name.

sorted in a particular order.
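As described elsewhere in this thread, the particular order is bytewise lexicographical, with directories compared as if their name ended in '/'. A minimal Python sketch of a sort key with that behaviour follows. The function name tree_sort_key is hypothetical, and the sketch assumes that only entries with mode 40000 count as directories for this purpose; it is not a transcription of Git's own comparison code.

```python
from typing import List, Tuple


def tree_sort_key(entry: Tuple[str, bytes]) -> bytes:
    """Sort key for a (mode, name) pair: directories get an implied '/'."""
    mode, name = entry
    return name + b"/" if mode == "40000" else name


def sort_entries(entries: List[Tuple[str, bytes]]) -> List[Tuple[str, bytes]]:
    """Return the entries in bytewise order of their keys."""
    return sorted(entries, key=tree_sort_key)


if __name__ == "__main__":
    entries = [
        ("40000", b"a"), ("100644", b"a0"), ("100644", b"a-"),
        ("100644", b"a"), ("40000", b"a-"), ("100644", b"ab"),
    ]
    for mode, name in sort_entries(entries):
        print(mode, name)
```

Python compares bytes objects by unsigned byte value, which is what a bytewise comparison needs; comparing decoded str names instead could give a different order for non-ASCII filenames.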