Re: Exact format of tree objects

2013-06-18 Thread Chico Sokol
Thanks!

By the way, where can I find this kind of specification? I couldn't
find the spec of tree objects here:
https://github.com/git/git/tree/master/Documentation


--
Chico Sokol


On Wed, Jun 12, 2013 at 11:06 AM, Jakub Narebski jna...@gmail.com wrote:
 Junio C Hamano gitster at pobox.com writes:
 Chico Sokol chico.sokol at gmail.com writes:

  Is there any official documentation of the tree object format? Are tree
  objects encoded specially in some way? How can I parse the inflated
  contents of a tree object?
 
  We suspect that there is some kind of special format or
  encoding, because the command git cat-file -p sha shows me ...
  While git cat-file tree sha generates ...

 cat-file -p is meant to be human-readable form.  The latter gives
 the exact byte contents read_sha1_file() sees, which is a binary
 format.  Essentially, it is a sequence of:

  - mode of the entry encoded in octal, without any leading '0' pad;
  - pathname component of the entry, terminated with NUL;
  - 20-byte SHA-1 object name.

 I always wondered why this is the sole object format where SHA-1 is in
 20-byte binary format and not 40-char hexadecimal string format...
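
 The relationship between the two representations mentioned above can be
 sketched in a few lines of Python. This is only an illustration; the helper
 names sha1_to_hex and hex_to_sha1 are made up for this note and are not part
 of git or any library.

```python
import binascii


def sha1_to_hex(raw: bytes) -> str:
    """Turn the 20-byte binary SHA-1 stored in a tree entry into 40 hex chars."""
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    return binascii.hexlify(raw).decode("ascii")


def hex_to_sha1(text: str) -> bytes:
    """Turn a 40-character hex object name back into its 20-byte binary form."""
    if len(text) != 40:
        raise ValueError("expected 40 hex characters, got %d" % len(text))
    return binascii.unhexlify(text)
```

 The hex form is what cat-file -p prints; the binary form is what sits inside
 the tree object itself.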

 --
 Jakub Narębski




 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Exact format of tree objects

2013-06-18 Thread Chico Sokol
What is the encoding of the filename?


--
Chico Sokol


On Tue, Jun 11, 2013 at 3:26 PM, Ilari Liusvaara
ilari.liusva...@elisanet.fi wrote:
 On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
 Is there any official documentation of the tree object format? Are tree
 objects encoded specially in some way? How can I parse the inflated
 contents of a tree object?

 A tree object consists of entries, each the concatenation of:
 - Octal mode (using ASCII digits 0-7).
 - Single SPACE (0x20)
 - Filename
 - Single NUL (0x00)
 - 20-byte binary SHA-1 of referenced object.

 At least the following octal modes are known:
 40000: Directory (tree).
 100644: Regular file (blob).
 100755: Executable file (blob).
 120000: Symbolic link (blob).
 160000: Submodule (commit).
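
 The entry layout and mode list above could be parsed roughly as follows.
 This is a sketch, not git's own code: parse_tree and MODE_TYPES are names
 invented for this note, the input is assumed to be the inflated object with
 the leading "tree <size>" header and its NUL already removed, and filenames
 are kept as raw bytes because the format does not record their encoding.
 Git writes the modes as 40000, 100644, 100755, 120000 and 160000.

```python
# Object type implied by each mode, keyed by the ASCII octal string.
MODE_TYPES = {
    b"40000": "tree",
    b"100644": "blob",
    b"100755": "blob",
    b"120000": "blob",
    b"160000": "commit",
}


def parse_tree(body: bytes):
    """Yield (mode, type, hex_sha1, name) for each entry of a tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)          # end of the octal mode
        nul = body.index(b"\x00", space + 1)   # end of the filename
        mode = body[pos:space]
        name = body[space + 1:nul]
        sha = body[nul + 1:nul + 21]           # 20-byte binary SHA-1
        if len(sha) != 20:
            raise ValueError("truncated tree entry")
        yield (mode.decode("ascii"), MODE_TYPES.get(mode, "unknown"),
               sha.hex(), name)
        pos = nul + 21
```

 Printing mode, type, hex SHA-1 and name for each tuple gives roughly what
 cat-file -p shows for a tree.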

 The entries are always sorted in (bytewise) lexicographical order,
 except that directories sort as if there were an implied '/' at the end.

 So e.g.:
 !  0  9  a  a-  a- (directory)  a (directory)  a0  ab  b  z.


 The idea of sorting directories specially is that if one recurses
 upon hitting a directory and uses '/' as path separator, then the
 full filenames are in bytewise lexicographical order.
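
 One way to express this ordering rule is a sort key that appends '/' to
 directory names before a plain bytewise comparison. The sketch below uses
 made-up names (tree_sort_key, sort_tree_entries) and represents each entry
 as a (name, is_directory) pair; it illustrates the rule and is not a copy of
 git's comparison function.

```python
def tree_sort_key(name: bytes, is_directory: bool) -> bytes:
    """Directories compare as if their name ended in '/'."""
    return name + b"/" if is_directory else name


def sort_tree_entries(entries):
    """Sort (name, is_directory) pairs into tree order."""
    return sorted(entries, key=lambda entry: tree_sort_key(entry[0], entry[1]))
```

 Python compares bytes objects by unsigned byte value, which is the bytewise
 lexicographical order the rule calls for.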

 -Ilari


Exact format of tree objects

2013-06-11 Thread Chico Sokol
Is there any official documentation of the tree object format? Are tree
objects encoded specially in some way? How can I parse the inflated
contents of a tree object?

We suspect that there is some kind of special format or
encoding, because the command git cat-file -p sha shows me the
expected output, something like:

100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4	.gitignore
100644 blob 7c817960e954f0278a6eee8d58611f61445167e8	LICENSE.txt
100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03	README
...


While git cat-file tree sha generates a strange output, which
indicates some kind of encoding problem. Something like:

100644 .gitignore+��▒,��Wy�100644
LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)


Thanks,







--
Chico Sokol


Re: Reading commit objects

2013-05-22 Thread Chico Sokol
I'm not criticizing JGit, guys. It simply doesn't fit our needs.
We're not interested in mapping git commands to Java, and we don't have
the same RAM limitations.

I know JGit team is doing a great job and we do not intend to build a
library with such completeness.

Are you guys contributors to JGit? Can you point me to the
code that unpacks git objects? The closest I could get was this class:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seem to be a standard and a non-standard format for the packed
object, judging by the comments in this method:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that the default Inflater class of the Java API expects the
object to be in the standard format.

What does the following comment mean? What is the experimental pack-based
format? Are there any docs on the specs for it?

We must determine if the buffer contains the standard
zlib-deflated stream or the experimental format based
on the in-pack object format. Compare the header byte
for each format:
RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
Experimental pack-based : Sttt : ttt = 1,2,3,4
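
The "standard zlib" half of that comparison can be sketched as below. This is
an illustration based on the quoted comment and on RFC 1950, not JGit's actual
method; the function name looks_like_zlib_header is made up for this note.
RFC 1950 says the first byte has the form 0www1000 for deflate and that the
first two bytes, read as a big-endian 16-bit number, are a multiple of 31.

```python
import zlib


def looks_like_zlib_header(data: bytes) -> bool:
    """Check whether data starts with an RFC 1950 zlib/deflate header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # 0www1000: bit 7 clear and low nibble equal to 8 (deflate).
    if (cmf & 0x8F) != 0x08:
        return False
    # The CMF/FLG pair carries a check value: the 16-bit number is divisible by 31.
    return ((cmf << 8) | flg) % 31 == 0


# Anything produced by zlib.compress starts with such a header.
sample = zlib.compress(b"blob 0\x00")
print(looks_like_zlib_header(sample))
```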


--
Chico Sokol


On Wed, May 22, 2013 at 2:59 AM, Shawn Pearce spea...@spearce.org wrote:
 On Tue, May 21, 2013 at 3:18 PM, Chico Sokol chico.so...@gmail.com wrote:
 Ok, we discovered that the commit object actually contains the tree
 object's sha1, by reading its contents with python zlib library.

 So the bug must be with our java code (we're building a java lib).

 Is there any non-standard issue in git's zlib compression? We're
 decompressing its contents with java default zlib api, so it should
 work normally, here's our code, that's printing that wrong output:

 import java.io.File;
 import java.io.FileInputStream;
 import java.util.zip.InflaterInputStream;
 import org.apache.commons.io.IOUtils;
 ...
 File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
 InflaterInputStream inflaterInputStream = new InflaterInputStream(new
 FileInputStream(obj));
 System.out.println(IOUtils.readLines(inflaterInputStream));
 ...
 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

 Your code is broken. IOUtils is probably corrupting what you get back.
 After inflating the stream you should see the object type (commit),
 space, its length in bytes as a base 10 string, and then a NUL ('\0').
 Following that is the tree line, and parent(s) if any. I wonder if
 IOUtils discarded the remainder of the line after the NUL and did not
 consider the tree line.
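
 The header layout described here (type, space, decimal length, NUL, then the
 body) can be split off as in the following sketch. It assumes a loose object
 in the standard zlib format; parse_loose_object is a name invented for this
 note. Everything is kept as bytes, so nothing after the NUL can be dropped by
 a text-oriented helper.

```python
import zlib


def parse_loose_object(compressed: bytes):
    """Inflate a loose object and return (type, body)."""
    raw = zlib.decompress(compressed)
    header, sep, body = raw.partition(b"\x00")
    if not sep:
        raise ValueError("no NUL byte after the object header")
    obj_type, _, size_text = header.partition(b" ")
    size = int(size_text.decode("ascii"))   # length as a base-10 string
    if size != len(body):
        raise ValueError("header says %d bytes, body has %d" % (size, len(body)))
    return obj_type.decode("ascii"), body
```

 For a commit, the returned body starts with the tree line, followed by any
 parent lines.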

 And you wonder why JGit code is confusing. We can't rely on standard
 Java APIs to do the right thing, because commonly used libraries have
 made assumptions that disagree with the way Git works.


Re: Reading commit objects

2013-05-22 Thread Chico Sokol
 Your code is broken. IOUtils is probably corrupting what you get back.
 After inflating the stream you should see the object type (commit),
 space, its length in bytes as a base 10 string, and then a NUL ('\0').
 Following that is the tree line, and parent(s) if any. I wonder if
 IOUtils discarded the remainder of the line after the NUL and did not
 consider the tree line.


Maybe you're right, Shawn. I've also tried the following code:

File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(dotGit));
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));

But we got the same result. I'll try to read the bytes myself
(without Apache IOUtils). Are the contents of an unpacked object UTF-8
encoded?


Re: Reading commit objects

2013-05-22 Thread Chico Sokol
Solved! It was exactly the problem pointed out by Shawn.

Here is the working code:

File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(dotGit));
// Read the "commit <length>\0" header byte by byte; stop at the NUL
// (or at end of stream, where read() returns -1).
StringBuilder header = new StringBuilder();
int read;
while ((read = inflaterInputStream.read()) > 0) {
    header.append((char) read);
}
System.out.println(header);
// Everything after the NUL is the commit body (tree, parents, author, ...).
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));

Thank you all!



--
Chico Sokol


On Wed, May 22, 2013 at 11:25 AM, Chico Sokol chico.so...@gmail.com wrote:
 Your code is broken. IOUtils is probably corrupting what you get back.
 After inflating the stream you should see the object type (commit),
 space, its length in bytes as a base 10 string, and then a NUL ('\0').
 Following that is the tree line, and parent(s) if any. I wonder if
 IOUtils discarded the remainder of the line after the NUL and did not
 consider the tree line.


 Maybe you're right, Shawn. I've also tried the following code:

 File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
 InflaterInputStream inflaterInputStream = new InflaterInputStream(new
 FileInputStream(dotGit));
 ByteArrayOutputStream os = new ByteArrayOutputStream();
 IOUtils.copyLarge(inflaterInputStream, os);
 System.out.println(new String(os.toByteArray()));

 But we got the same result. I'll try to read the bytes myself
 (without Apache IOUtils). Are the contents of an unpacked object UTF-8
 encoded?


Reading commit objects

2013-05-21 Thread Chico Sokol
Hello,

I'm building a library to manipulate git repositories (interacting
directly with the filesystem).

Currently, we're trying to parse commit objects. After decompressing
the contents of a commit object file we got the following output:

commit 191
author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

first commit

We hoped to get the same output as git cat-file -p sha1, but
that didn't happen. From a commit object, how can I find the tree object
hash of this commit?

Thanks,


--
Chico Sokol


Re: Reading commit objects

2013-05-21 Thread Chico Sokol
Ok, we discovered that the commit object actually contains the tree
object's sha1, by reading its contents with python zlib library.

So the bug must be with our java code (we're building a java lib).

Is there anything non-standard in git's zlib compression? We're
decompressing the contents with Java's default zlib API, so it should
work normally. Here's our code, which prints that wrong output:

import java.io.File;
import java.io.FileInputStream;
import java.util.zip.InflaterInputStream;
import org.apache.commons.io.IOUtils;
...
File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(obj));
System.out.println(IOUtils.readLines(inflaterInputStream));


I know this is not the right place to ask about Java issues,
but we would appreciate any help.



--
Chico Sokol


On Tue, May 21, 2013 at 6:37 PM, John Szakmeister j...@szakmeister.net wrote:
 On Tue, May 21, 2013 at 5:21 PM, Chico Sokol chico.so...@gmail.com wrote:
 Hello,

 I'm building a library to manipulate git repositories (interacting
 directly with the filesystem).

 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

 Does `git cat-file -p sha1` show a tree object?  FWIW, I expected to
 see a tree line there, so maybe this object was created without a
 tree?  I also don't see a parent listed.

 I did this on one of my repos:

 buf = open('.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802', 'rb').read()
 import zlib
 zlib.decompress(buf)
 'commit 246\x00tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\nparent
 0636e7ff6b79470b0cd53ceacea88e7796f202ce\nauthor John Szakmeister
 j...@szakmeister.net 1369168481 -0400\ncommitter John Szakmeister
 j...@szakmeister.net 1369168481 -0400\n\nGot a file listing.\n'

 So at least creating the commits with Git, I see a tree.  How was the
 commit you're referencing created?  Perhaps something is wrong with
 that process?
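
 Once the "commit <size>" header and its NUL are stripped, the tree and parent
 hashes can be picked out of the remaining text line by line. The sketch below
 assumes exactly that input; parse_commit_headers is a name made up for this
 note, and it looks only at the tree and parent lines, ignoring author,
 committer and the message.

```python
def parse_commit_headers(body: bytes):
    """Return (tree_hex, [parent_hex, ...]) from a commit body."""
    # Header lines end at the first blank line; the message follows it.
    head, _, _message = body.partition(b"\n\n")
    tree = None
    parents = []
    for line in head.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
    return tree, parents
```

 A root commit yields an empty parent list, and a merge commit yields more
 than one parent.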

 We hoped to get the same output as git cat-file -p sha1, but
 that didn't happen. From a commit object, how can I find the tree object
 hash of this commit?

 I'd expect that too.

 -John


Re: Reading commit objects

2013-05-21 Thread Chico Sokol
It was git who created that object.

We're trying to build an improved Java library focused on our needs
(JGit has a really confusing API focused on solving EGit's needs). But
we're about to dig into their code to discover how to decompress git
objects.


--
Chico Sokol


On Tue, May 21, 2013 at 7:22 PM, Junio C Hamano gits...@pobox.com wrote:
 Chico Sokol chico.so...@gmail.com writes:

 Ok, we discovered that the commit object actually contains the tree
 object's sha1, by reading its contents with python zlib library.

 So the bug must be with our java code (we're building a java lib).

 Why aren't you using jgit?