Re: [PATCH] rev-list: add --full-objects flag.

2005-07-11 Thread Linus Torvalds


On Mon, 11 Jul 2005, Eric W. Biederman wrote:
 
> I guess I was expecting to pull from one tree into another unrelated
> tree.  Getting a tree with two heads and then being able to merge them
> together.

You can do it, but you have to do it by hand. It's a valid operation, but 
it's not an operation I want people to do by mistake, so it's not 
something the trivial helper scripts help with.

The way to do it by hand is to just use something stupid that doesn't
understand what it's doing anyway, and just copy the files over. cp -a 
or rsync works fine. Then just do git resolve by hand. It's not very 
hard at all, but it's definitely something that should be a special case.

> A couple of questions.
> 
> 1) Does git-clone-script when packed copy the entire repository
>    or just take a couple of slices of the tree where you have
>    references?

It only gets the objects needed for the references, nothing more.

So if you only get one branch, it will leave the objects that are specific 
to other branches alone.

> 2) Is there a way for a pack to create deltas against objects
>    that are not in the tree?  For a dumb repository making incremental
>    changes this is ideal.

A pack can only have deltas against objects in that pack. It can't even 
have deltas to other objects in the same tree, it literally is only 
_within_ a pack. This is so that each pack is totally independent: you can 
always unpack (and verify) the objects in a pack _without_ having anything 
else (of course, the end result is often not a full project, and you won't 
have any references, but at least the _objects_ are valid).
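As a rough illustration of that self-containment rule, here is a minimal Python sketch. The dictionary "pack" model is an assumption made up for this note and is not git's on-disk pack format: each object name maps to the name of its delta base, or to None when the object is stored whole.

```python
from typing import Dict, Optional, Set

# Toy model (hypothetical, not the real pack format):
# object name -> name of its delta base, or None if stored whole.
Pack = Dict[str, Optional[str]]


def missing_bases(pack: Pack) -> Set[str]:
    """Return delta bases that are referenced but not stored in this pack."""
    return {base for base in pack.values()
            if base is not None and base not in pack}


def is_self_contained(pack: Pack) -> bool:
    """A pack can be unpacked alone only if every delta base is inside it."""
    return not missing_bases(pack)


good_pack = {"a": None, "b": "a", "c": "b"}  # every base is in the pack
bad_pack = {"d": "a"}                        # delta against an outside object

print(is_self_contained(good_pack), is_self_contained(bad_pack))
```

In this model the second pack fails the check because unpacking "d" would need an object held somewhere else, which is exactly the situation the rule forbids.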

I don't want to have deltas to outside the pack, because while it's 
obviously very nice from a size packing standpoint, it's totally horrid 
from an infrastructure standpoint. It would make it possible to have 
circular dependencies (ie deltas against each other) that could only be 
resolved by having a third pack (or the unpacked object).

It would also mean that you may have to have two packs mapped at the same
time to unpack them, which was very much against what I was aiming for: I
think that in the long run, for truly huge projects, you'd want to have a
history of packs, each maybe a gigabyte in size, and you may be in the 
situation that you simply cannot have two packs mapped at the same time 
because you don't have enough virtual memory for it.

So then inter-pack deltas would mean that you'd have to have partial pack 
mapping and other horrid special-case logic. Right now, because a pack is 
always self-sufficient, you know that in order to unpack an object, if you 
find it in the index file, you will be able to unpack it by just mapping 
that pack and going off..

So the rule is: don't pack too often. The unpacked objects are actually 
working really really well as long as you don't have tens of thousands of 
them. Having a few hundred (or even a few thousand) unpacked objects is 
not a problem at all. Then you do a git repack when it starts getting 
uncomfortable, and you continue.
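One way to judge when it "starts getting uncomfortable" is simply to count the unpacked objects. The Python sketch below assumes the usual loose-object layout of objects/<2 hex digits>/<38 hex digits>; the function names and the threshold of 5000 are hypothetical choices for illustration, not anything git defines.

```python
import os
import re
import tempfile

_FANOUT_DIR = re.compile(r"[0-9a-f]{2}")
_OBJECT_FILE = re.compile(r"[0-9a-f]{38}")


def count_loose_objects(git_dir: str) -> int:
    """Count files laid out as <git_dir>/objects/<2 hex>/<38 hex>."""
    objects_dir = os.path.join(git_dir, "objects")
    if not os.path.isdir(objects_dir):
        return 0
    total = 0
    for name in os.listdir(objects_dir):
        subdir = os.path.join(objects_dir, name)
        if _FANOUT_DIR.fullmatch(name) and os.path.isdir(subdir):
            total += sum(1 for f in os.listdir(subdir)
                         if _OBJECT_FILE.fullmatch(f))
    return total


def should_repack(git_dir: str, threshold: int = 5000) -> bool:
    """Hypothetical policy: repack once loose objects exceed the threshold."""
    return count_loose_objects(git_dir) > threshold


# Demo on a throwaway directory with one loose object and an empty pack dir.
with tempfile.TemporaryDirectory() as demo:
    fanout = os.path.join(demo, "objects", "e6")
    os.makedirs(fanout)
    open(os.path.join(fanout, "9de29bb2d1d6434b8b29ae775ad8c2e48c5391"), "w").close()
    os.makedirs(os.path.join(demo, "objects", "pack"))
    demo_count = count_loose_objects(demo)
    demo_repack = should_repack(demo)

print(demo_count, demo_repack)
```

The pack directory is skipped because its name is not two hex digits, so only genuinely unpacked objects are counted.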

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-11 Thread Linus Torvalds


On Mon, 11 Jul 2005, Eric W. Biederman wrote:
 
> I'm having the worst time putting together a mental model of how git
> works, and the documentation is spotty enough that it hasn't been
> helpful.  So I am wading through the code.  It seems every time I turn
> a corner there is another rough spot.

Btw, I know I'm bad at writing docs, but what I _do_ enjoy doing is
answering reasonably specific technical questions, and maybe somebody else
can write docs by taking advantage of me that way.

I tried to write the tutorial in a way that it also tries to explain how
git works (not just a "do this", but a "you update the index file and then
write the result out as a tree object"), but it obviously covers a fairly
limited part of what git actually can do, and at the same time it doesn't
go into a lot of detail.

And part of that is not just my inability to write documentation, it's
also that I just have the wrong view of the project, ie I probably just
take a lot of things for granted and consider them obvious, even though
they aren't, and then I probably occasionally explain things that aren't
worth explaining, because either they _are_ obvious, or people just don't
care and they are irrelevant.

I'd love to see somebody write up more of a "this is how you use git" kind
of tutorial, _and_ on the other hand more of a low-level explanation of
the notion of an object store where objects refer to each other by their
SHA1 names, and how that is represented in the filesystem and/or in packs. 
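To make the "SHA1 names" idea concrete, here is a small Python sketch of how an object's name can be derived and where a loose copy of it would sit on disk. It assumes the naming scheme git uses today (SHA-1 over a "type size" header, a NUL byte, then the contents); the helper names object_name and loose_path are made up for this note.

```python
import hashlib


def object_name(obj_type: str, payload: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + payload, as a 40-digit hex name."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def loose_path(name: str) -> str:
    """Path of the unpacked object: first two hex digits form the directory."""
    return f".git/objects/{name[:2]}/{name[2:]}"


empty_blob = object_name("blob", b"")
print(empty_blob)
print(loose_path(empty_blob))
```

Because the name depends only on the type and contents, a tree can refer to a blob, and a commit to a tree or a parent commit, just by recording that name.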

Something with a few pictures would be great (ie screenshots of gitk, but
also something that tries to just visually show how tags point to commits
that point to parents and trees, and trees pointing to other trees and
then blobs).

All things that I'm a complete idiot at, but that would help users 
visualize what the heck git is actually _doing_, so that they don't just 
parrot some magic command line that they don't understand, but can 
actually reason about what they are doing.

I think a lot of people do understand this, but yes, the docs are kind of 
lacking.

Linus


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-11 Thread Linus Torvalds


On Mon, 11 Jul 2005, Eric W. Biederman wrote:

> Ok.  Only the dumb methods are allowed.

Well, no, you can actually do git-clone-pack by hand in that git archive,
and it will use the smart packing to get the other end, even if it is
totally unrelated to the current project.

But you have to do it by hand in the sense that none of the nice helper
scripts will help you to do this. Merging two unrelated projects really is
a very special operation. I've done it once (gitk into git), and I don't
think we'll see it done very many times again.

> > So if you only get one branch, it will leave the objects that are specific 
> > to other branches alone.
> 
> Hmm.  As I recall reading the code it grabs everything that is
> in .git/refs/*.

Only by default.

If you specify a branch (or five) git-clone-pack will grab only that
branch.

However, I don't think "git clone" (the script) even exposes that, so
right now you'd not even see it - "git clone" only exposes the "get all
the branches by default" behaviour.

Linus


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-11 Thread Linus Torvalds


On Mon, 11 Jul 2005, Eric W. Biederman wrote:
 
> The question:
> Does git-upload-pack which gets its list of objects
> with git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3
> get any history beyond the top of tree of each branch?
> 
> As I read the code it does not.

It does. It gets all the history necessary for each branch. git-rev-list
will walk the whole history until it hits commits that have been marked as
uninteresting (or the parents of commits that have been marked as
uninteresting), and those are the ones that the receiver already has, of
course.

So after you get a pack, you have all the history for all the branches you 
got.

A branch you _didn't_ get, you don't get any history for, of course, but 
that doesn't matter. You'll get that history if you ever pull the branch 
later.
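The walk described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the history is a hypothetical dictionary from commit name to parent names, and the sketch first computes everything reachable from the "has" commits, which is simpler than (and not a description of) how git-rev-list itself does it.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy history: commit name -> list of parent names.
History = Dict[str, List[str]]


def reachable(history: History, tips: Iterable[str]) -> Set[str]:
    """All commits reachable from the given tips, tips included."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(history.get(commit, []))
    return seen


def commits_to_send(history: History, needed: Iterable[str],
                    has: Iterable[str]) -> Set[str]:
    """Walk back from 'needed', stopping at anything the receiver has."""
    uninteresting = reachable(history, has)
    result: Set[str] = set()
    stack = list(needed)
    while stack:
        commit = stack.pop()
        if commit in uninteresting or commit in result:
            continue
        result.add(commit)
        stack.extend(history.get(commit, []))
    return result


# A <- B <- C <- D (one branch), with E branching off C.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
print(sorted(commits_to_send(history, ["D"], ["B"])))
print(sorted(commits_to_send(history, ["D", "E"], [])))
```

With nothing marked as already present, the walk returns the full history of every requested branch; a branch that was not requested (E in the first call) contributes nothing.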

Linus


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-11 Thread Eric W. Biederman
Linus Torvalds [EMAIL PROTECTED] writes:

> On Mon, 11 Jul 2005, Eric W. Biederman wrote:
> > 
> > The question:
> > Does git-upload-pack which gets its list of objects
> > with git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3
> > get any history beyond the top of tree of each branch?
> > 
> > As I read the code it does not.
> 
> It does. It gets all the history necessary for each branch. git-rev-list
> will walk the whole history until it hits commits that have been marked as
> uninteresting (or the parents of commits that have been marked as
> uninteresting), and those are the ones that the receiver already has, of
> course.

Ok.  So the intention is sane then.

Looking closer it appears that commit_list_insert is recursive
and that is what I missed.

> So after you get a pack, you have all the history for all the branches you 
> got.
> 
> A branch you _didn't_ get, you don't get any history for, of course, but 
> that doesn't matter. You'll get that history if you ever pull the branch 
> later.

Right.  Things work well if you have all of the history.


Eric


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-10 Thread Sven Verdoolaege
On Sat, Jul 09, 2005 at 03:09:02PM -0600, Eric W. Biederman wrote:
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository.  If you don't have an ancestor
> of what you are fetching you can't fetch it.
 

Not sure if this is what you want, but you could use the
following gitweb patch (to be applied on top of my previous
patches) to get a git tree snapshot for bootstrapping.

http://www.liacs.nl/~sverdool/gitweb.cgi?p=gitweb.git;a=summary
http://www.liacs.nl/~sverdool/gitweb.git/

skimo
--
Support pack snapshots.

---
commit f76a442a0e2166b3f17db0e496545a600a33f94c
tree f8f089ab738864e69e0155b10262dbec832b4a11
parent 8392280de17a89a451c1f7db4e268f2047d4aa83
author Sven Verdoolaege [EMAIL PROTECTED] Sun, 10 Jul 2005 23:56:42 +0200
committer Sven Verdoolaege [EMAIL PROTECTED] Sun, 10 Jul 2005 23:56:42 +0200

 gitweb.cgi |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gitweb.cgi b/gitweb.cgi
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -2058,8 +2058,9 @@ sub git_snapshot {
 	      "<th></th>\n" .
 	      "</tr>\n";
 	my %types = (
-		'Bzipped tar archive' => 'tar.bz2',
-		'Gzipped tar archive' => 'tar.gz',
+		'Source tree (bzipped tar archive)' => 'tar.bz2',
+		'Source tree (gzipped tar archive)' => 'tar.gz',
+		'Git tree (pack file)' => 'pack',
 	);
 	my $alternate = 0;
 	for my $type (sort keys %types) {
@@ -2094,6 +2095,7 @@ sub git_serve_snapshot {
 	my %info = (
 		'tar.bz2' => [ 'application/x-bzip2', 'bzip2' ],
 		'tar.gz' => [ 'application/x-gzip', 'gzip' ],
+		'pack' => [ 'application/x-git-pack' ],
 	);
 	if (!exists $info{$st}) {
 		die_error(undef, "Unknown snapshot type.");
@@ -2101,7 +2103,10 @@ sub git_serve_snapshot {
 	my ($type, $zip) = @{$info{$st}};
 	print $cgi->header(-type => $type,
 			   -attachment => "$project-$hash.$st");
-	open my $fd, "-|", "$gitbin/git-tar-tree $hash '$project-$hash' | $zip"
+	open my $fd, "-|", ($st eq 'pack' ?
+		"$gitbin/git-rev-list --max-count=1 --objects $hash | " .
+		"$gitbin/git-pack-objects --stdout" :
+		"$gitbin/git-tar-tree $hash '$project-$hash' | $zip")
 		or return;
 	undef $/;
 	print <$fd>;


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-10 Thread Linus Torvalds


On Sat, 9 Jul 2005, Eric W. Biederman wrote:
 
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository.  If you don't have an ancestor
> of what you are fetching you can't fetch it.

Sure you can.

See the current git clone. It's actually quite good, it's a pleasure to 
use now that it gives updates on how much it has done.

Just do

	git clone src dest

to try it out. It starts out silent (for big repositories) because it 
takes a while to get the whole rev list, but once it gets going it's quite 
nice and gives a nice progress report..

It uses the exact same server side code that git-fetch-pack does (ie it
just starts git-upload-pack on the server).

Now, one thing you cannot do is to start a totally new _project_ on the
server side. In order to do a git-send-pack, you need to first create a
directory and do a git-init-db on the remote side.

So to create a new project, what you need to do is

	src$ ssh target

	target$ mkdir new-project
	target$ cd new-project
	target$ git-init-db
	target$ exit

	src$ git-send-pack target:new-project master

and you've now sent your master branch to the new project at 
target:new-project.

You can even populate multiple branches at a time: just list them all (you
do have to list them, because by default git-send-pack will update the
_common_ branches, and since the other end is empty, there obviously are
no common branches to start with).

Ahh, you should even be able to automate the sending of all branches by
doing

	git-send-pack target:new-project $(cd .git ; find refs -type f)

I think - that will end up being equivalent to a reverse clone.
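For anyone scripting this, the ref enumeration that the $(cd .git ; find refs -type f) substitution performs can be sketched in Python as follows. The function name list_refs and the example ref names are made up for this note; the sketch only lists regular files under refs, relative to the git directory.

```python
import os
import tempfile
from typing import List


def list_refs(git_dir: str) -> List[str]:
    """Regular files under <git_dir>/refs, as paths relative to git_dir."""
    refs = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(git_dir, "refs")):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            # Like 'find -type f': regular files only, no symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, git_dir)
                refs.append(rel.replace(os.sep, "/"))
    return sorted(refs)


# Demo on a throwaway directory with two hypothetical refs.
with tempfile.TemporaryDirectory() as demo:
    for ref in ("refs/heads/master", "refs/tags/example-tag"):
        path = os.path.join(demo, *ref.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write("0" * 40 + "\n")
    demo_refs = list_refs(demo)

print(demo_refs)
```

Each returned name is in the same refs/... form that the find command would hand to git-send-pack.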

The smart clients are doing pretty damn well, I think.

Linus


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-07 Thread Linus Torvalds


On Thu, 7 Jul 2005, Junio C Hamano wrote:
 
> Again you are right.  How about --full-objects instead?

I don't mind the --objects=xxx format per se, but it would need to 
verify that the =xxx was either valid or wasn't there at all. So what I 
objected to was not that it was easy to mis-spell, but that if misspelled, 
the program wouldn't point it out as an error, but silently just do the 
wrong thing.
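The kind of strict check being asked for could look like the Python sketch below. It is an illustration only: the set of accepted values is hypothetical (the thread never spells out what "xxx" may be), and the real option parsing lives in git's C code.

```python
# Hypothetical set of accepted values after "--objects=".
VALID_OBJECT_MODES = {"full"}


def parse_objects_flag(arg: str) -> str:
    """Accept '--objects' or '--objects=<valid mode>'; reject anything else."""
    prefix = "--objects"
    if not arg.startswith(prefix):
        raise ValueError(f"not an --objects option: {arg}")
    rest = arg[len(prefix):]
    if rest == "":
        return "default"
    if rest.startswith("=") and rest[1:] in VALID_OBJECT_MODES:
        return rest[1:]
    # A misspelling is reported instead of silently doing the wrong thing.
    raise ValueError(f"unrecognized option: {arg}")


print(parse_objects_flag("--objects"))
print(parse_objects_flag("--objects=full"))
try:
    parse_objects_flag("--objects=ful")
except ValueError as err:
    print(err)
```

The point is the last branch: anything that merely starts with the right prefix but does not match exactly is an error rather than being treated as plain --objects.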

Linus


Re: [PATCH] rev-list: add --full-objects flag.

2005-07-07 Thread Linus Torvalds


On Thu, 7 Jul 2005, Junio C Hamano wrote:
 
> However it does not automatically mean that the avenue I have
> been pursuing would not work; the server side preparation needs
> to be a bit more careful than what I sent, which unconditionally
> runs prune-packed.  It instead should leave the files that
> --whole-trees would have packed as plain SHA1 files, so that
> the bulk is obtained by statically generated packs and the rest
> can be handled in the commit-chain walker as before.

I really think the commit-chain walker needs to run locally (ie at the 
server end, or after fetching all the objects from the server).

I don't know how much you've tried out the git-http-pull and git-ssh-pull 
things, but their performance was quite horrid for anything half-way 
bigger, because of the totally synchronized IO.

The "fetch one object, parse it, fetch the next one, parse that.." 
approach is just horrible.

I ended up preferring the rsync thing even though rsync sucked badly on
big object stores too, if only because when rsync got working, it at least
nicely pipelined the transfers, and would transfer things ten times faster
than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it
really felt that way).
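A back-of-the-envelope model shows why the synchronized approach hurts. The Python sketch below assumes that each serial fetch waits one full round trip before the next request goes out, while a pipelined transfer pays the round trip only once; the function names and the numbers in the demo are made up, not measurements.

```python
def serial_fetch_ms(n_objects: int, rtt_ms: int, transfer_ms: int) -> int:
    """Request, wait, receive, repeat: one round trip per object."""
    return n_objects * (rtt_ms + transfer_ms)


def pipelined_fetch_ms(n_objects: int, rtt_ms: int, transfer_ms: int) -> int:
    """Requests overlap, so only the first round trip is paid in full."""
    return rtt_ms + n_objects * transfer_ms


# Hypothetical numbers: 1000 objects, 50 ms round trip, 5 ms per object.
serial = serial_fetch_ms(1000, 50, 5)
pipelined = pipelined_fetch_ms(1000, 50, 5)
print(serial, pipelined)
```

In this model the gap grows with the round-trip time and the object count, so a walker that fetches many small objects one by one is dominated by latency rather than bandwidth.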

And the thing is, if you purely follow one tree (which is likely the
common case for a lot of users), then you are actually always likely
better off with the "mirror it" model. Which is _not_ a good model for
developers (for example, me rsync'ing from Jeff's kernel repository always
got me hundreds of useless objects), but it's fine for somebody who
actually just wants to track somebody else.

And then you really can use just rsync or wget or ncftpget or anything
else that has a "fetch recursively, optimizing existing objects" mode.

Now, re-packing ends up causing some double transmissions, but I bet the
cost of those is going to be less than the cost of the "ping-pong for
each object" approach. Especially as most of the repacked objects will be 
deltas if the repacking is done properly.

Linus