Re: Exact format of tree objects
Thanks!

By the way, where can I find this kind of specification? I couldn't find the spec of tree objects here: https://github.com/git/git/tree/master/Documentation

--
Chico Sokol

On Wed, Jun 12, 2013 at 11:06 AM, Jakub Narebski jna...@gmail.com wrote:
> Junio C Hamano gitster at pobox.com writes:
>> Chico Sokol chico.sokol at gmail.com writes:
>>> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>>>
>>> We're suspecting that there is some kind of special format or encoding, because the command git cat-file -p sha shows me ... While git cat-file tree sha generates ...
>>
>> cat-file -p is meant to be human-readable form. The latter gives the exact byte contents read_sha1_file() sees, which is a binary format. Essentially, it is a sequence of:
>>
>> - mode of the entry encoded in octal, without any leading '0' pad;
>> - pathname component of the entry, terminated with NUL;
>> - 20-byte SHA-1 object name.
>
> I always wondered why this is the sole object format where SHA-1 is in 20-byte binary format and not 40-char hexadecimal string format...
>
> --
> Jakub Narębski

--
To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
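The byte layout Junio describes can be walked with a few lines of Python. This is only a minimal sketch, not how git itself does it: `parse_tree` is a hypothetical name, the input is assumed to be the tree body with any `tree <size>\0` object header already removed, and the single space between mode and pathname is the separator Ilari spells out elsewhere in this thread.

```python
def parse_tree(data: bytes):
    """Split a raw tree body into (mode, name, sha_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)          # end of the octal mode
        nul = data.index(b"\x00", space + 1)   # end of the pathname
        mode = data[pos:space].decode("ascii")
        name = data[space + 1:nul]             # kept as bytes: no encoding is assumed
        sha = data[nul + 1:nul + 21]           # 20 raw bytes, not hex text
        if len(sha) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha.hex()))
        pos = nul + 21                         # skip the SHA-1 by length, never by searching
    return entries
```

The SHA-1 is skipped by its fixed length because the 20 raw bytes may themselves contain spaces or NULs, so searching for the next delimiter inside them would misparse the stream.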
Re: Exact format of tree objects
What is the encoding of the filename?

--
Chico Sokol

On Tue, Jun 11, 2013 at 3:26 PM, Ilari Liusvaara ilari.liusva...@elisanet.fi wrote:
> On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
>> Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?
>
> A tree object consists of entries, each a concatenation of:
>
> - Octal mode (using ASCII digits 0-7).
> - Single SPACE (0x20)
> - Filename
> - Single NUL (0x00)
> - 20-byte binary SHA-1 of referenced object.
>
> At least the following octal modes are known:
>
> 40000: Directory (tree).
> 100644: Regular file (blob).
> 100755: Executable file (blob).
> 120000: Symbolic link (blob).
> 160000: Submodule (commit).
>
> The entries are always sorted in (bytewise) lexicographical order, except that directories sort as if there were an implied '/' at the end. So e.g.:
>
> !
> 0
> 9
> a
> a-
> a- (directory)
> a (directory)
> a0
> ab
> b
> z
>
> The idea of sorting directories specially is that if one recurses upon hitting a directory and uses '/' as path separator, then the full filenames are in bytewise lexicographical order.
>
> -Ilari
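The sorting rule Ilari describes can be expressed as a sort key. The sketch below is an illustration of the rule as stated in the message, not git's own comparison routine; `tree_sort_key` and the `(name, is_dir)` tuples are hypothetical names chosen for the example.

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    """Directories compare as if their name ended with '/'."""
    return name + b"/" if is_dir else name


# The names from the example in the message, deliberately shuffled.
entries = [
    (b"ab", False), (b"a", True), (b"a-", False), (b"a0", False),
    (b"a", False), (b"!", False), (b"a-", True), (b"z", False),
    (b"0", False), (b"9", False), (b"b", False),
]

# Python compares bytes objects bytewise, which is what the rule needs.
ordered = sorted(entries, key=lambda e: tree_sort_key(*e))

for name, is_dir in ordered:
    print(name.decode("ascii") + (" (directory)" if is_dir else ""))
```

Because '-' (0x2D) sorts before '/' (0x2F) and '/' sorts before '0' (0x30), the directory `a-` lands before the directory `a`, which in turn lands before the file `a0`.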
Exact format of tree objects
Is there any official documentation of the tree object format? Are tree objects encoded specially in some way? How can I parse the inflated contents of a tree object?

We're suspecting that there is some kind of special format or encoding, because the command git cat-file -p sha shows me the expected output, something like:

    100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4    .gitignore
    100644 blob 7c817960e954f0278a6eee8d58611f61445167e8    LICENSE.txt
    100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03    README
    ...

While git cat-file tree sha generates a strange output, which indicates some kind of encoding problem. Something like:

    100644 .gitignore+��▒,��Wy�100644 LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)

Thanks,

--
Chico Sokol
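The two outputs above carry the same information: the readable form is the raw entry with the mode zero-padded, an object type added, and the 20 binary SHA-1 bytes written as 40 hex digits. A rough Python sketch of that conversion follows. `format_entry` and `object_type` are hypothetical helpers, the mode-to-type mapping follows the modes listed elsewhere in this thread, and decoding the filename as UTF-8 is an assumption made only for display (real git may also quote unusual filenames).

```python
def object_type(mode: str) -> str:
    """Guess the object type from the octal mode string."""
    if mode == "40000":
        return "tree"      # directory
    if mode == "160000":
        return "commit"    # submodule
    return "blob"          # regular file, executable, or symlink


def format_entry(mode: str, name: bytes, sha: bytes) -> str:
    """Render one raw tree entry in a `git cat-file -p`-like layout."""
    return "%06o %s %s\t%s" % (
        int(mode, 8),                      # zero-pad the mode to six octal digits
        object_type(mode),
        sha.hex(),                         # 20 raw bytes -> 40 hex characters
        name.decode("utf-8", errors="replace"),
    )


sha = bytes.fromhex("2beae51a0e14b3167fd7e81119972caef95779f4")
print(format_entry("100644", b".gitignore", sha))
```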
Re: Reading commit objects
I'm not criticizing JGit, guys. It simply doesn't fit our needs. We're not interested in mapping git commands in Java and don't have the same RAM limitations. I know the JGit team is doing a great job, and we do not intend to build a library with such completeness.

Are you guys contributors to JGit? Can you point me to the code that unpacks git objects? The closest I could get was this class: https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seem to be a standard and a non-standard format of the packed object, judging by the comments on this method: https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that the default Inflater class of the Java API expects the object to be in the standard format.

What does the following comment mean? What's the Experimental pack-based format? Are there any docs on the specs of that?

    We must determine if the buffer contains the standard zlib-deflated stream or the experimental format based on the in-pack object format. Compare the header byte for each format:

    RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
    Experimental pack-based : Sttt : ttt = 1,2,3,4

--
Chico Sokol

On Wed, May 22, 2013 at 2:59 AM, Shawn Pearce spea...@spearce.org wrote:
> On Tue, May 21, 2013 at 3:18 PM, Chico Sokol chico.so...@gmail.com wrote:
>> Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).
>>
>> Is there any non-standard issue in git's zlib compression? We're decompressing its contents with the default Java zlib API, so it should work normally. Here's our code, which prints that wrong output:
>>
>>     import java.io.File;
>>     import java.io.FileInputStream;
>>     import java.util.zip.InflaterInputStream;
>>     import org.apache.commons.io.IOUtils;
>>     ...
>>     File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
>>     InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(obj));
>>     System.out.println(IOUtils.readLines(inflaterInputStream));
>>     ...
>>
>>> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:
>>>
>>>     commit 191
>>>     author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>>>     committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>>>
>>>     first commit
>
> Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type (commit), space, its length in bytes as a base 10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.
>
> And you wonder why JGit code is confusing. We can't rely on standard Java APIs to do the right thing, because commonly used libraries have made assumptions that disagree with the way Git works.
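The "0www1000" pattern in the quoted JGit comment is the first byte of an RFC 1950 zlib stream: the low four bits are 8 (deflate) and the top bit is 0. RFC 1950 also requires the first two bytes, read as a 16-bit big-endian number, to be a multiple of 31. A small Python sketch of such a check is below; `looks_like_zlib_stream` is a hypothetical helper written from those two RFC rules, not a copy of what JGit or git actually execute.

```python
import zlib


def looks_like_zlib_stream(data: bytes) -> bool:
    """Check the two-byte RFC 1950 header: 0www1000 plus the FCHECK rule."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    is_deflate = (cmf & 0x8F) == 0x08            # top bit 0, low nibble 8
    check_ok = ((cmf << 8) | flg) % 31 == 0      # header must be a multiple of 31
    return is_deflate and check_ok


# A stream produced by zlib itself should pass the check.
sample = zlib.compress(b"commit 0\x00")
print(looks_like_zlib_stream(sample))
```

A loose object written by a current git passes this test, which is consistent with the plain Inflater working once the NUL in the object header is handled correctly.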
Re: Reading commit objects
> Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type (commit), space, its length in bytes as a base 10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.

Maybe you're right, Shawn. I've also tried the following code:

    File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(dotGit));
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    IOUtils.copyLarge(inflaterInputStream, os);
    System.out.println(new String(os.toByteArray()));

But we got the same result. I'll try to read the bytes myself (without Apache IOUtils).

Are the contents of an unpacked object UTF-8 encoded?
Re: Reading commit objects
Solved! It was exactly the problem pointed out by Shawn. Here is the working code:

    File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(dotGit));
    // Read the bytes of the 'commit length\0' header, stopping at the NUL (or at end of stream)
    int read;
    while ((read = inflaterInputStream.read()) > 0) {
        System.out.println((char) read);
    }
    // Everything after the NUL is the commit body
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    IOUtils.copyLarge(inflaterInputStream, os);
    System.out.println(new String(os.toByteArray()));

Thank you all!

--
Chico Sokol

On Wed, May 22, 2013 at 11:25 AM, Chico Sokol chico.so...@gmail.com wrote:
>> Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type (commit), space, its length in bytes as a base 10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.
>
> Maybe you're right, Shawn. I've also tried the following code:
>
>     File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
>     InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(dotGit));
>     ByteArrayOutputStream os = new ByteArrayOutputStream();
>     IOUtils.copyLarge(inflaterInputStream, os);
>     System.out.println(new String(os.toByteArray()));
>
> But we got the same result. I'll try to read the bytes myself (without Apache IOUtils).
>
> Are the contents of an unpacked object UTF-8 encoded?
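For comparison, the same header handling in Python, where the inflated data stays a byte string and the NUL is therefore harmless. This is a sketch under the layout Shawn describes (type, space, decimal length, NUL, then the body); `parse_loose_object` is a hypothetical name, and the input is assumed to be the full contents of a loose object file.

```python
import zlib


def parse_loose_object(compressed: bytes):
    """Inflate a loose object and split it into (type, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")     # split at the first NUL only
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):                   # the header length counts body bytes
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


# Build a tiny stand-in object in memory instead of reading .git/objects.
body = b"tree " + b"a" * 40 + b"\n"
fake = zlib.compress(b"commit %d\x00" % len(body) + body)
print(parse_loose_object(fake))
```

Keeping the data as bytes until the header has been split off avoids the line-oriented and charset-oriented helpers that caused the original problem.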
Reading commit objects
Hello,

I'm building a library to manipulate git repositories (interacting directly with the filesystem).

Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:

    commit 191
    author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
    committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

    first commit

We hoped to get the same output as git cat-file -p sha1, but that didn't happen. From a commit object, how can I find the tree object hash of this commit?

Thanks,

--
Chico Sokol
Re: Reading commit objects
Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).

Is there any non-standard issue in git's zlib compression? We're decompressing its contents with the default Java zlib API, so it should work normally. Here's our code, which prints that wrong output:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.zip.InflaterInputStream;
    import org.apache.commons.io.IOUtils;
    ...
    File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(obj));
    System.out.println(IOUtils.readLines(inflaterInputStream));

I know this is not the right place to ask about Java issues, but we would appreciate any help.

--
Chico Sokol

On Tue, May 21, 2013 at 6:37 PM, John Szakmeister j...@szakmeister.net wrote:
> On Tue, May 21, 2013 at 5:21 PM, Chico Sokol chico.so...@gmail.com wrote:
>> Hello,
>>
>> I'm building a library to manipulate git repositories (interacting directly with the filesystem).
>>
>> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:
>>
>>     commit 191
>>     author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>>     committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>>
>>     first commit
>
> Does `git cat-file -p sha1` show a tree object? FWIW, I expected to see a tree line there, so maybe this object was created without a tree? I also don't see a parent listed.
>
> I did this on one of my repos:
>
>     buf = open('.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802', 'rb').read()
>     import zlib
>     zlib.decompress(buf)
>     'commit 246\x00tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\nparent 0636e7ff6b79470b0cd53ceacea88e7796f202ce\nauthor John Szakmeister j...@szakmeister.net 1369168481 -0400\ncommitter John Szakmeister j...@szakmeister.net 1369168481 -0400\n\nGot a file listing.\n'
>
> So at least when creating the commits with Git, I see a tree. How was the commit you're referencing created? Perhaps something is wrong with that process?
>
>> We hoped to get the same output as git cat-file -p sha1, but that didn't happen. From a commit object, how can I find the tree object hash of this commit?
>
> I'd expect that too.
>
> -John
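Given a decompressed commit like the one John shows, the tree hash is simply the value on the line that starts with `tree`. A minimal Python sketch of pulling it out follows. `commit_headers` is a hypothetical helper, the input is assumed to be the commit body with the `commit <size>\0` prefix already removed, and only the `tree` and `parent` header lines are interpreted.

```python
def commit_headers(body: bytes):
    """Return (tree_hash, parent_hashes, message) from a raw commit body."""
    head, _, message = body.partition(b"\n\n")   # a blank line ends the headers
    tree = None
    parents = []
    for line in head.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":                    # merges carry several parent lines
            parents.append(value.decode("ascii"))
    return tree, parents, message


body = (
    b"tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\n"
    b"parent 0636e7ff6b79470b0cd53ceacea88e7796f202ce\n"
    b"author John Szakmeister j...@szakmeister.net 1369168481 -0400\n"
    b"committer John Szakmeister j...@szakmeister.net 1369168481 -0400\n"
    b"\n"
    b"Got a file listing.\n"
)
tree, parents, message = commit_headers(body)
print(tree)
```

A root commit such as a "first commit" has no `parent` line at all, so an empty parent list is expected there, while the `tree` line should always be present.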
Re: Reading commit objects
It was git that created that object. We're trying to build an improved Java library focused on our needs (JGit has a really confusing API focused on solving EGit's needs). But we're about to dig into their code to discover how to decompress git objects.

--
Chico Sokol

On Tue, May 21, 2013 at 7:22 PM, Junio C Hamano gits...@pobox.com wrote:
> Chico Sokol chico.so...@gmail.com writes:
>> Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).
>
> Why aren't you using jgit?