Re: Reading commit objects

2013-05-22 Thread Shawn Pearce
On Tue, May 21, 2013 at 3:18 PM, Chico Sokol chico.so...@gmail.com wrote:
 Ok, we discovered that the commit object actually contains the tree
 object's sha1, by reading its contents with python zlib library.

 So the bug must be with our java code (we're building a java lib).

 Is there any non-standard issue in git's zlib compression? We're
 decompressing its contents with java default zlib api, so it should
 work normally, here's our code, that's printing that wrong output:

 import java.io.File;
 import java.io.FileInputStream;
 import java.util.zip.InflaterInputStream;
 import org.apache.commons.io.IOUtils;
 ...
 File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
 InflaterInputStream inflaterInputStream = new InflaterInputStream(new
 FileInputStream(obj));
 System.out.println(IOUtils.readLines(inflaterInputStream));
...
 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

Your code is broken. IOUtils is probably corrupting what you get back.
After inflating the stream you should see the object type (commit),
space, its length in bytes as a base 10 string, and then a NUL ('\0').
Following that is the tree line, and parent(s) if any. I wonder if
IOUtils discarded the remainder of the line after the NUL and did not
consider the tree line.
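
The layout described above can be sketched as follows. This is a minimal illustration, assuming the whole object has already been inflated into a byte array; the class LooseObject and its members are hypothetical names, not JGit code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical holder for one inflated loose object. */
final class LooseObject {
    final String type;  // "commit", "tree", "blob" or "tag"
    final byte[] body;  // the bytes that follow the NUL

    private LooseObject(String type, byte[] body) {
        this.type = type;
        this.body = body;
    }

    /** Splits "type SP length NUL body" and checks the declared length. */
    static LooseObject parse(byte[] inflated) {
        // Locate the NUL that terminates the header.
        int nul = 0;
        while (nul < inflated.length && inflated[nul] != 0) {
            nul++;
        }
        if (nul == inflated.length) {
            throw new IllegalArgumentException("no NUL after the object header");
        }
        String header = new String(inflated, 0, nul, StandardCharsets.US_ASCII);
        int space = header.indexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("malformed header: " + header);
        }
        // The length is a base 10 string counting the bytes after the NUL.
        int length = Integer.parseInt(header.substring(space + 1));
        byte[] body = Arrays.copyOfRange(inflated, nul + 1, inflated.length);
        if (body.length != length) {
            throw new IllegalArgumentException(
                    "header says " + length + " bytes, found " + body.length);
        }
        return new LooseObject(header.substring(0, space), body);
    }
}
```

Working on bytes rather than on a String keeps the NUL from being swallowed or misprinted by whatever consumes the text; for a commit, the body then starts with the tree line.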

And you wonder why JGit code is confusing. We can't rely on standard
Java APIs to do the right thing, because commonly used libraries have
made assumptions that disagree with the way Git works.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reading commit objects

2013-05-22 Thread Chico Sokol
I'm not criticizing JGit, guys. It simply doesn't fit into our needs.
We're not interested in mapping git commands in java and don't have
the same RAM limitations.

I know JGit team is doing a great job and we do not intend to build a
library with such completeness.

Are you guys contributors to JGit? Can you point me to the
code that unpacks git objects? The closest I could get was this class:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seems to be a standard and a non-standard format for the packed
object, judging by the comments in this method:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that the default Inflater class of the Java API expects the
object to be in the standard format.

What does the following comment mean? What is the experimental
pack-based format? Are there any docs on its spec?

We must determine if the buffer contains the standard
zlib-deflated stream or the experimental format based
on the in-pack object format. Compare the header byte
for each format:
RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
Experimental pack-based : Sttt : ttt = 1,2,3,4
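
The header-byte comparison in the quoted comment can be sketched roughly as below. This is an illustration only, not JGit's implementation; the class and method names are hypothetical. The bit pattern 0www1000 comes from the comment, while the divisible-by-31 test on the first two bytes is taken from RFC 1950 and is an addition here.

```java
/** Hypothetical check for an RFC 1950 (zlib) stream header. */
final class ZlibHeaderCheck {
    /** True if the first two bytes look like a zlib/deflate header. */
    static boolean looksLikeZlib(byte[] hdr) {
        if (hdr.length < 2) {
            return false;
        }
        int b0 = hdr[0] & 0xff;
        int b1 = hdr[1] & 0xff;
        // 0www1000: top bit clear, low nibble 8 (compression method "deflate").
        boolean deflateMethod = (b0 & 0x8f) == 0x08;
        // RFC 1950: the first 16 bits, read big-endian, are a multiple of 31.
        boolean checkBits = ((b0 << 8) | b1) % 31 == 0;
        return deflateMethod && checkBits;
    }
}
```

A typical zlib stream starts with the bytes 0x78 0x9c, which pass both tests; a first byte with the top bit set fails the first one.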


--
Chico Sokol




Re: Reading commit objects

2013-05-22 Thread Chico Sokol
 Your code is broken. IOUtils is probably corrupting what you get back.
 After inflating the stream you should see the object type (commit),
 space, its length in bytes as a base 10 string, and then a NUL ('\0').
 Following that is the tree line, and parent(s) if any. I wonder if
 IOUtils discarded the remainder of the line after the NUL and did not
 consider the tree line.


Maybe you're right, Shawn. I've also tried the following code:

File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(dotGit));
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));

But we got the same result. I'll try to read the bytes myself
(without Apache IOUtils). Are the contents of an unpacked object UTF-8
encoded?


Re: Reading commit objects

2013-05-22 Thread Chico Sokol
Solved! It was exactly the problem pointed out by Shawn.

Here is the working code:

File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(dotGit));
// Consume the header "commit <length>\0": stop at the NUL (0) or at EOF (-1).
int read;
while ((read = inflaterInputStream.read()) > 0) {
    System.out.print((char) read);
}
System.out.println();
// The rest of the stream is the commit body, starting with the tree line.
ByteArrayOutputStream os = new ByteArrayOutputStream();
IOUtils.copyLarge(inflaterInputStream, os);
System.out.println(new String(os.toByteArray()));

Thank you all!



--
Chico Sokol




Re: Reading commit objects

2013-05-22 Thread Shawn Pearce
On Wed, May 22, 2013 at 7:25 AM, Chico Sokol chico.so...@gmail.com wrote:
 Your code is broken. IOUtils is probably corrupting what you get back.
 After inflating the stream you should see the object type (commit),
 space, its length in bytes as a base 10 string, and then a NUL ('\0').
 Following that is the tree line, and parent(s) if any. I wonder if
 IOUtils discarded the remainder of the line after the NUL and did not
 consider the tree line.

...
 Are the contents of an unpacked object UTF-8
 encoded?

It's more complicated than that. Commit objects are usually in utf-8,
unless a repository configuration setting told you otherwise, or an
encoding header appears in the commit. And sometimes that data lies
anyway. ISO-8859-1 is one of the safer forms of reading a commit, but
that also isn't always accurate.
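
One way to act on this is sketched below, assuming the commit body (the bytes after the NUL) is already in memory. It honours an encoding header if one is present and otherwise assumes UTF-8; the repository configuration setting mentioned above is not consulted, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that guesses how to decode a commit body. */
final class CommitEncoding {
    /** Returns the charset named by an "encoding" header, else UTF-8. */
    static Charset guess(byte[] body) {
        // ISO-8859-1 maps every byte to exactly one char, so the headers
        // can be scanned without losing or rejecting any input.
        String raw = new String(body, StandardCharsets.ISO_8859_1);
        int end = raw.indexOf("\n\n"); // a blank line ends the headers
        String headers = end < 0 ? raw : raw.substring(0, end);
        for (String line : headers.split("\n")) {
            if (line.startsWith("encoding ")) {
                String name = line.substring("encoding ".length()).trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported name: fall back to the lossless choice.
                    return StandardCharsets.ISO_8859_1;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    static String decode(byte[] body) {
        return new String(body, guess(body));
    }
}
```

Since the header can lie, as noted above, the decoded text is best treated as a guess; keeping the original bytes around avoids losing anything.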


Re: Reading commit objects

2013-05-22 Thread Shawn Pearce
On Wed, May 22, 2013 at 7:20 AM, Chico Sokol chico.so...@gmail.com wrote:
 I'm not criticizing JGit, guys. It simply doesn't fit into our needs.
 We're not interested in mapping git commands in java and don't have
 the same RAM limitations.

I guess you aren't trying to process the WebKit or Linux kernel
repositories. Or you can afford more RAM than I can[1]. :-)

[1] $DAY_JOB has lots of RAM.  Lots.

 Are you guys contributors to JGit?

Not really. I had nothing to do with JGit.  :-)

 Can you point me to the
 code that unpacks git objects? The closest I could get was this class:
 https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

This class handles the loose object format in $GIT_DIR/objects, but
does not handle objects contained in pack files. That is elsewhere,
and well, more complex. Look at PackFile.java.
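
For the loose case, the file for an object can be located as in the sketch below, which follows the paths used elsewhere in this thread (two hex digits for the directory, the remaining 38 for the file name). The class and method names are hypothetical, and only 40-digit SHA-1 names are assumed.

```java
import java.io.File;

/** Hypothetical helper mapping an object name to its loose-object file. */
final class LooseObjectPath {
    static File forId(File gitDir, String hexId) {
        if (!hexId.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("not a 40-digit hex object name: " + hexId);
        }
        // objects/<first two hex digits>/<remaining 38 hex digits>
        File fanout = new File(new File(gitDir, "objects"), hexId.substring(0, 2));
        return new File(fanout, hexId.substring(2));
    }
}
```

If that file does not exist, the object may still be present inside a pack file, which this sketch does not handle.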

 There seems to be a standard and a non-standard format for the packed
 object, judging by the comments in this method:
 https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

There are two formats, the official format that is used, and an
experimental format that was discarded but is still supported for
legacy reasons.

 I suspect that the default Inflater class of the Java API expects the
 object to be in the standard format.

 What does the following comment mean? What is the experimental
 pack-based format? Are there any docs on its spec?

Read the code. This is the dead format that is no longer written, but
is still supported.


Re: Reading commit objects

2013-05-21 Thread Felipe Contreras
On Tue, May 21, 2013 at 4:21 PM, Chico Sokol chico.so...@gmail.com wrote:
 Hello,

 I'm building a library to manipulate git repositories (interacting
 directly with the filesystem).

 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

 We hoped to get the same output as git cat-file -p <sha1>, but
 that didn't happen. From a commit object, how can I find the tree object
 hash of this commit?

git rev-parse <sha1>:

-- 
Felipe Contreras


Re: Reading commit objects

2013-05-21 Thread John Szakmeister
On Tue, May 21, 2013 at 5:21 PM, Chico Sokol chico.so...@gmail.com wrote:
 Hello,

 I'm building a library to manipulate git repositories (interacting
 directly with the filesystem).

 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

Does `git cat-file -p <sha1>` show a tree object?  FWIW, I expected to
see a tree line there, so maybe this object was created without a
tree?  I also don't see a parent listed.

I did this on one of my repos:

 buf = open('.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802', 
 'rb').read()
 import zlib
 zlib.decompress(buf)
'commit 246\x00tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\nparent
0636e7ff6b79470b0cd53ceacea88e7796f202ce\nauthor John Szakmeister
j...@szakmeister.net 1369168481 -0400\ncommitter John Szakmeister
j...@szakmeister.net 1369168481 -0400\n\nGot a file listing.\n'

So at least creating the commits with Git, I see a tree.  How was the
commit you're referencing created?  Perhaps something is wrong with
that process?
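
Given output like the above with the header already stripped, the tree and parent hashes can be picked out of the commit body roughly as follows. This is a sketch with hypothetical names; it only looks at header lines before the first blank line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the "tree" and "parent" lines of a commit body. */
final class CommitHeaders {
    final String tree;                              // null if no tree line was found
    final List<String> parents = new ArrayList<>(); // empty for a root commit

    CommitHeaders(byte[] body) {
        // Hashes and header keywords are ASCII; ISO-8859-1 never fails to decode.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("\n\n"); // a blank line separates headers from the message
        String headers = end < 0 ? text : text.substring(0, end);
        String foundTree = null;
        for (String line : headers.split("\n")) {
            if (line.startsWith("tree ")) {
                foundTree = line.substring("tree ".length());
            } else if (line.startsWith("parent ")) {
                parents.add(line.substring("parent ".length()));
            }
        }
        tree = foundTree;
    }
}
```

A first commit simply has no parent lines, so an empty parents list is not an error.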

 We hoped to get the same output as git cat-file -p <sha1>, but
 that didn't happen. From a commit object, how can I find the tree object
 hash of this commit?

I'd expect that too.

-John


Re: Reading commit objects

2013-05-21 Thread Chico Sokol
Ok, we discovered that the commit object actually contains the tree
object's sha1, by reading its contents with python zlib library.

So the bug must be with our java code (we're building a java lib).

Is there any non-standard issue in git's zlib compression? We're
decompressing its contents with java default zlib api, so it should
work normally, here's our code, that's printing that wrong output:

import java.io.File;
import java.io.FileInputStream;
import java.util.zip.InflaterInputStream;
import org.apache.commons.io.IOUtils;
...
File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
InflaterInputStream inflaterInputStream = new InflaterInputStream(new
FileInputStream(obj));
System.out.println(IOUtils.readLines(inflaterInputStream));
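
For debugging output like this, it may help to inflate to raw bytes and print them with control characters made visible, rather than going through a line-oriented reader. The sketch below uses only java.util.zip and java.io; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.InflaterInputStream;

/** Hypothetical debugging helper for zlib-compressed object files. */
final class InflateDump {
    /** Inflates everything from a zlib-compressed stream into a byte array. */
    static byte[] inflateAll(InputStream compressed) {
        try (InputStream in = new InflaterInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Renders bytes as text with NUL and other non-printable bytes escaped. */
    static String visible(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xff;
            if (c == '\n') {
                sb.append("\\n\n"); // show the escape and keep the line break
            } else if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\x%02x", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

Something like InflateDump.visible(InflateDump.inflateAll(new FileInputStream(obj))) would then show whether a \x00 sits between the length and the first header line.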


I know this isn't the right place to ask about Java issues,
but we would appreciate any help.



--
Chico Sokol




Re: Reading commit objects

2013-05-21 Thread Junio C Hamano
Chico Sokol chico.so...@gmail.com writes:

 Hello,

 I'm building a library to manipulate git repositories (interacting
 directly with the filesystem).

 Currently, we're trying to parse commit objects. After decompressing
 the contents of a commit object file we got the following output:

Who wrote this commit object you are trying to read?  Us, or your
library (this question is to see if you are chasing the right
problem)?

 commit 191
 author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
 committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300

 first commit

 We hoped to get the same output as git cat-file -p <sha1>, but
 that didn't happen. From a commit object, how can I find the tree object
 hash of this commit?

If you care about the byte-for-byte compatibility, never use
cat-file -p.  That is meant for human consumption.

git cat-file commit <sha1> gives you the raw representation after
inflating and stripping out the leading "<type> SP <length> NUL" header.
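
Going the other way illustrates the same layout: prepend the header to the raw body, deflate, and check that inflating returns the original bytes. The sketch below uses the stock java.util.zip classes with their default zlib framing; the class and method names are hypothetical and nothing is written to disk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical round-trip helper for the loose object layout. */
final class LooseCodec {
    /** Builds "type SP length NUL body" and deflates it. */
    static byte[] encode(String type, byte[] body) {
        byte[] header = (type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[header.length + body.length];
        System.arraycopy(header, 0, raw, 0, header.length);
        System.arraycopy(body, 0, raw, header.length, body.length);

        Deflater deflater = new Deflater(); // default settings: zlib (RFC 1950) framing
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates a complete zlib stream held in memory. */
    static byte[] inflate(byte[] deflated) {
        Inflater inflater = new Inflater();
        inflater.setInput(deflated);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

The compressed bytes need not match what git wrote for the same object, since compression settings can differ; only the inflated bytes are expected to be identical.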


Re: Reading commit objects

2013-05-21 Thread Junio C Hamano
Chico Sokol chico.so...@gmail.com writes:

 Ok, we discovered that the commit object actually contains the tree
 object's sha1, by reading its contents with python zlib library.

 So the bug must be with our java code (we're building a java lib).

Why aren't you using jgit?


Re: Reading commit objects

2013-05-21 Thread Chico Sokol
It was git who created that object.

We're trying to build an improved Java library focused on our needs
(JGit has a really confusing API focused on solving EGit's needs). But
we're about to dig into their code to discover how to decompress git
objects.


--
Chico Sokol




Re: Reading commit objects

2013-05-21 Thread Jonathan Nieder
Chico Sokol wrote:

 We're trying to build an improved Java library focused on our needs
 (JGit has a really confusing API focused on solving EGit's needs).

JGit is also open to contributions, including contributions that
add less confusing API calls. :)  See

 http://wiki.eclipse.org/JGit/User_Guide
 http://wiki.eclipse.org/EGit/Contributor_Guide#JGit
 http://wiki.eclipse.org/EGit/Contributor_Guide#Using_Gerrit_at_https:.2F.2Fgit.eclipse.org.2Fr
 https://dev.eclipse.org/mailman/listinfo/jgit-dev

Thanks,
Jonathan


Re: Reading commit objects

2013-05-21 Thread Shawn Pearce
On Tue, May 21, 2013 at 3:33 PM, Chico Sokol chico.so...@gmail.com wrote:
 It was git who created that object.

 We're trying to build an improved Java library focused on our needs
 (JGit has a really confusing API focused on solving EGit's needs).

JGit code... is confusing because it's fast. We spent a lot of time
trying to make things fast on the JVM, and somewhat comparable with C
Git even though it's not in C. Some of the low-level APIs are fast
because they bypass conventional Java wisdom and just tell the #@!*
machine what to do, with no pretty bits about it. Make it pretty, it
goes slower. Or uses more RAM. Java likes RAM.

Good luck making an improved library. JGit of course is also
interested in contributions. The api package has been trying to make a
simpler calling convention for common use cases that match the command
line interface users are familiar with, but it's still incomplete and
hides some optimizations that are possible with the lower-level calls.