Re: [PATCH] rev-list: add --full-objects flag.
On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>
> I guess I was expecting to pull from one tree into another unrelated
> tree. Getting a tree with two heads and then be able to merge them together.

You can do it, but you have to do it by hand. It's a valid operation, but it's not an operation I want people to do by mistake, so it's not something the trivial helper scripts help with.

The way to do it by hand is to just use something stupid that doesn't understand what it's doing anyway, and just copy the files over. "cp -a" or "rsync" works fine. Then just do "git resolve" by hand.

It's not very hard at all, but it's definitely something that should be a special case.

> A couple of questions.
>
> 1) Does git-clone-script when packed copy the entire repository
>    or just take a couple of slices of the tree where you have references?

It only gets the objects needed for the references, nothing more. So if you only get one branch, it will leave the objects that are specific to other branches alone.

> 2) Is there a way for a pack to create deltas against objects that are
>    not in the tree? For a dumb repository making incremental changes
>    this is ideal.

A pack can only have deltas against objects in that pack. It can't even have deltas to other objects in the same tree, it literally is only _within_ a pack.

This is so that each pack is totally independent: you can always unpack (and verify) the objects in a pack _without_ having anything else (of course, the end result is often not a full project, and you won't have any references, but at least the _objects_ are valid).

I don't want to have deltas to outside the pack, because while it's obviously very nice from a size packing standpoint, it's totally horrid from an infrastructure standpoint. It would make it possible to have circular dependencies (ie deltas against each other) that could only be resolved by having a third pack (or the unpacked object).

It would also mean that you may have to have two packs mapped at the same time to unpack them, which was very much against what I was aiming for: I think that in the long run, for truly huge projects, you'd want to have a history of packs, each maybe a gigabyte in size, and you may be in the situation that you simply cannot have two packs mapped at the same time because you don't have enough virtual memory for it.

So then inter-pack deltas would mean that you'd have to have partial pack mapping etc horrid special case logic. Right now, because a pack is always self-sufficient, you know that in order to unpack an object, if you find it in the index file, you will be able to unpack it by just mapping that pack and going off..

So the rule is: don't pack too often. The unpacked objects are actually working really really well as long as you don't have tens of thousands of them. Having a few hundred (or even a few thousand) unpacked objects is not a problem at all. Then you do a "git repack" when it starts getting uncomfortable, and you continue.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
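The rule that deltas may only refer to objects inside the same pack can be illustrated with a small model. This is only a sketch, not git's pack format: `PackEntry`, `delta_base` and the object names are hypothetical, and a pack is modelled as a plain dictionary from object name to entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PackEntry:
    """Hypothetical model of one packed object (not git's on-disk layout)."""
    data: bytes
    # Name of the object this entry is a delta against, or None for a full object.
    delta_base: Optional[str] = None


def missing_delta_bases(pack: Dict[str, PackEntry]) -> List[str]:
    """Return the delta bases that are referenced but not present in this pack."""
    missing = {
        entry.delta_base
        for entry in pack.values()
        if entry.delta_base is not None and entry.delta_base not in pack
    }
    return sorted(missing)


def is_self_contained(pack: Dict[str, PackEntry]) -> bool:
    """A pack is self-contained when every delta base lives in the same pack."""
    return not missing_delta_bases(pack)


good_pack = {
    "obj-a": PackEntry(b"full contents"),
    "obj-b": PackEntry(b"delta bytes", delta_base="obj-a"),
}
bad_pack = {
    "obj-b": PackEntry(b"delta bytes", delta_base="obj-a"),
}
```

In this model `good_pack` can be unpacked with nothing else at hand, while `bad_pack` would need a second pack (or a loose object) to resolve `obj-a`, which is the situation the message argues against.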
Re: [PATCH] rev-list: add --full-objects flag.
On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>
> I'm having the worst time putting together a mental model of how git
> works, and the documentation is spotty enough that it hasn't been
> helpful. So I am wading through the code. It seems every time I turn a
> corner there is another rough spot.

Btw, I know I'm bad at writing docs, but what I _do_ enjoy doing is answering reasonably specific technical questions, and maybe somebody else can write docs by taking advantage of me that way.

I tried to write the tutorial in a way that it also tries to explain how git works (not just a "do this", but a "you update the index file and then write the result out as a tree object"), but it obviously covers a fairly limited part of what git actually can do, and at the same time it doesn't go into a lot of detail.

And part of that is not just my inability to write documentation, it's also that I just have the wrong view of the project, ie I probably just take a lot of things for granted and consider them obvious, even though they aren't, and then I probably occasionally explain things that aren't worth explaining, because either they _are_ obvious, or people just don't care and they are irrelevant.

I'd love to see somebody write up more of a "this is how you use git" kind of tutorial, _and_ on the other hand more of a low-level explanation of the notion of an object store where objects refer to each other by their SHA1 names, and how that is represented in the filesystem and/or in packs.

Something with a few pictures would be great (ie screenshots of gitk, but also something that tries to just visually show how tags point to commits that point to parents and trees, and trees pointing to other trees and then blobs). All things that I'm a complete idiot at, but that would help users visualize what the heck git is actually _doing_, so that they don't just parrot some magic command line that they don't understand, but can actually reason about what they are doing.

I think a lot of people do understand this, but yes, the docs are kind of lacking.

		Linus
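The idea of an object store where objects are named by their SHA1 can be made concrete in a few lines. The sketch below assumes the usual git object naming (SHA1 over a "type size" header, a NUL byte, then the contents) and the usual loose-object layout under .git/objects; the function names `git_object_id` and `loose_object_path` are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Name an object by the SHA1 of '<type> <size>\\0' followed by its contents."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where an unpacked object with this name would sit in the filesystem."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


blob_id = git_object_id("blob", b"hello\n")
print(blob_id)
print(loose_object_path(blob_id))
```

Because the name is derived purely from type and contents, a tree can refer to a blob (and a commit to a tree and its parents) by name alone, regardless of whether the object is stored loose or inside a pack.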
Re: [PATCH] rev-list: add --full-objects flag.
On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>
> Ok. Only the dumb methods are allowed.

Well, no, you can actually do git-clone-pack by hand in that git archive, and it will use the smart packing to get the other end, even if it is totally unrelated to the current project.

But you have to do it by hand in the sense that none of the nice helper scripts will help you to do this. Merging two unrelated projects really is a very special operation. I've done it once (gitk into git), and I don't think we'll see it done very many times again.

> > So if you only get one branch, it will leave the objects that are
> > specific to other branches alone.
>
> Hmm. As I recall reading the code it grabs everything that is in .git/refs/*.

Only by default. If you specify a branch (or five), git-clone-pack will grab only that branch. However, I don't think "git clone" (the script) even exposes that, so right now you'd not even see it - "git clone" only exposes the "get all the branches" default behaviour.

		Linus
Re: [PATCH] rev-list: add --full-objects flag.
On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>
> The question: Does git-upload-pack, which gets its list of objects
> with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3",
> get any history beyond the top of tree of each branch? As I read the
> code it does not.

It does. It gets all the history necessary for each branch.

git-rev-list will walk the whole history until it hits commits that have been marked as uninteresting (or the parents of commits that have been marked as uninteresting), and those are the ones that the receiver already has, of course.

So after you get a pack, you have all the history for all the branches you got.

A branch you _didn't_ get, you don't get any history for, of course, but that doesn't matter. You'll get that history if you ever pull the branch later.

		Linus
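The "needed ... ^has ..." walk described above can be modelled as a set difference over the commit graph. This is a simplified sketch, not git-rev-list itself: it only handles commits (no trees or blobs), it ignores ordering, and it computes the full set reachable from the "has" side rather than stopping early. The graph and the names `reachable` and `commits_to_send` are invented for the example.

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """All commits reachable from the given tips by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def commits_to_send(parents: Dict[str, List[str]],
                    needed: Iterable[str],
                    has: Iterable[str]) -> Set[str]:
    """History of the needed tips, minus everything the receiver already has."""
    return reachable(parents, needed) - reachable(parents, has)


# Toy history: A <- B <- C <- D on one branch, and E branching off B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"]}

print(sorted(commits_to_send(history, needed=["D"], has=["B"])))
print(sorted(commits_to_send(history, needed=["D"], has=[])))
```

With `has=["B"]` only the commits after B on the requested branch are selected; with an empty `has` list the whole history of that branch is, which matches the clone case. The unrequested branch E is never included.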
Re: [PATCH] rev-list: add --full-objects flag.
Linus Torvalds [EMAIL PROTECTED] writes:

> On Mon, 11 Jul 2005, Eric W. Biederman wrote:
>>
>> The question: Does git-upload-pack, which gets its list of objects
>> with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3",
>> get any history beyond the top of tree of each branch? As I read the
>> code it does not.
>
> It does. It gets all the history necessary for each branch.
>
> git-rev-list will walk the whole history until it hits commits that
> have been marked as uninteresting (or the parents of commits that have
> been marked as uninteresting), and those are the ones that the receiver
> already has, of course.

Ok. So the intention is sane then. Looking closer it appears that commit_list_insert is recursive and that is what I missed.

> So after you get a pack, you have all the history for all the branches you
> got.
>
> A branch you _didn't_ get, you don't get any history for, of course, but
> that doesn't matter. You'll get that history if you ever pull the branch
> later.

Right. Things work well if you have all of the history.

Eric
Re: [PATCH] rev-list: add --full-objects flag.
On Sat, Jul 09, 2005 at 03:09:02PM -0600, Eric W. Biederman wrote:
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository. If you don't have an ancestor
> of what you are fetching you can't fetch it.

Not sure if this is what you want, but you could use the following gitweb patch (to be applied on top of my previous patches) to get a git tree snapshot for bootstrapping.

http://www.liacs.nl/~sverdool/gitweb.cgi?p=gitweb.git;a=summary
http://www.liacs.nl/~sverdool/gitweb.git/

skimo
--
Support pack snapshots.

---
commit f76a442a0e2166b3f17db0e496545a600a33f94c
tree f8f089ab738864e69e0155b10262dbec832b4a11
parent 8392280de17a89a451c1f7db4e268f2047d4aa83
author Sven Verdoolaege [EMAIL PROTECTED] Sun, 10 Jul 2005 23:56:42 +0200
committer Sven Verdoolaege [EMAIL PROTECTED] Sun, 10 Jul 2005 23:56:42 +0200

 gitweb.cgi |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gitweb.cgi b/gitweb.cgi
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -2058,8 +2058,9 @@ sub git_snapshot {
 	      "<th></th>\n" .
 	      "</tr>\n";
 	my %types = (
-		'Bzipped tar archive' => 'tar.bz2',
-		'Gzipped tar archive' => 'tar.gz',
+		'Source tree (bzipped tar archive)' => 'tar.bz2',
+		'Source tree (gzipped tar archive)' => 'tar.gz',
+		'Git tree (pack file)' => 'pack',
 	);
 	my $alternate = 0;
 	for my $type (sort keys %types) {
@@ -2094,6 +2095,7 @@ sub git_serve_snapshot {
 	my %info = (
 		'tar.bz2' => [ 'application/x-bzip2', 'bzip2' ],
 		'tar.gz' => [ 'application/x-gzip', 'gzip' ],
+		'pack' => [ 'application/x-git-pack' ],
 	);
 	if (!exists $info{$st}) {
 		die_error(undef, "Unknown snapshot type.");
@@ -2101,7 +2103,10 @@ sub git_serve_snapshot {
 	my ($type, $zip) = @{$info{$st}};
 	print $cgi->header(-type => $type,
 			   -attachment => "$project-$hash.$st");
-	open my $fd, "-|", "$gitbin/git-tar-tree $hash '$project-$hash' | $zip"
+	open my $fd, "-|", ($st eq 'pack' ?
+		"$gitbin/git-rev-list --max-count=1 --objects $hash | " .
+		"$gitbin/git-pack-objects --stdout" :
+		"$gitbin/git-tar-tree $hash '$project-$hash' | $zip")
 		or return;
 	undef $/;
 	print <$fd>;
Re: [PATCH] rev-list: add --full-objects flag.
On Sat, 9 Jul 2005, Eric W. Biederman wrote:
>
> The current intelligent fetch currently has a problem that it cannot
> be used to bootstrap a repository. If you don't have an ancestor
> of what you are fetching you can't fetch it.

Sure you can. See the current "git clone". It's actually quite good, it's a pleasure to use now that it gives updates on how much it has done.

Just do

	git clone src dest

to try it out. It starts out silent (for big repositories) because it takes a while to get the whole rev list, but once it gets going it's quite nice and gives a nice progress report..

It uses the exact same server side code that git-fetch-pack does (ie it just starts git-upload-pack on the server).

Now, one thing you cannot do is to start a totally new _project_ on the server side. In order to do a git-send-pack, you need to first create a directory and do a git-init-db on the remote side. So to create a new project, what you need to do is

	src$ ssh target
	target$ mkdir new-project
	target$ cd new-project
	target$ git-init-db
	target$ exit
	src$ git-send-pack target:new-project master

and you've now sent your master branch to the new project at "target:new-project". You can even populate multiple branches at a time: just list them all (you do have to list them, because by default git-send-pack will update the _common_ branches, and since the other end is empty, there obviously are no common branches to start with).

Ahh, you should even be able to automate the sending of all branches by doing

	git-send-pack target:new-project $(cd .git ; find refs -type f)

I think - that will end up being equivalent to a "reverse clone".

The smart clients are doing pretty damn well, I think.

		Linus
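For readers unfamiliar with what the "find refs -type f" part of that command produces, here is a rough Python equivalent. It is a sketch only: it assumes every branch and tag is a plain file somewhere under .git/refs, as in the message, and the function name `list_ref_names` is made up.

```python
import os
from typing import List


def list_ref_names(git_dir: str) -> List[str]:
    """List every file under <git_dir>/refs as a name relative to git_dir.

    Mirrors `cd .git ; find refs -type f`, e.g. "refs/heads/master".
    """
    names = []
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, git_dir)
            names.append(relative.replace(os.sep, "/"))
    return sorted(names)
```

Each returned name is what would be passed as an explicit ref argument to git-send-pack in the command above.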
Re: [PATCH] rev-list: add --full-objects flag.
On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
> Again you are right. How about --full-objects instead?

I don't mind the "--objects=xxx" format per se, but it would need to verify that the "=xxx" was either valid or wasn't there at all.

So what I objected to was not that it was easy to mis-spell, but that if misspelled, the program wouldn't point it out as an error, but silently just do the wrong thing.

		Linus
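The point about "--objects=xxx" is that an unrecognized suffix must be an error rather than being silently ignored. A minimal sketch of that kind of strict parsing follows; it is not the rev-list option parser, and the accepted value "full" and the names `ALLOWED_OBJECT_MODES` and `parse_objects_option` are hypothetical.

```python
from typing import Optional

# Hypothetical set of values accepted after "--objects=".
ALLOWED_OBJECT_MODES = {"full"}


def parse_objects_option(arg: str) -> Optional[str]:
    """Parse "--objects" or "--objects=<mode>".

    Returns "default" for the bare flag, the mode for a valid "=<mode>",
    None if the argument is some other option, and raises ValueError
    for a misspelled or unknown mode instead of silently accepting it.
    """
    if arg == "--objects":
        return "default"
    prefix = "--objects="
    if arg.startswith(prefix):
        mode = arg[len(prefix):]
        if mode not in ALLOWED_OBJECT_MODES:
            raise ValueError(f"unknown value for --objects: {mode!r}")
        return mode
    return None
```

A prefix-only check such as `arg.startswith("--objects")` would accept "--objects=ful" and quietly behave like the bare flag, which is exactly the failure mode objected to above.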
Re: [PATCH] rev-list: add --full-objects flag.
On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
> However it does not automatically mean that the avenue I have
> been pursuing would not work; the server side preparation needs
> to be a bit more careful than what I sent, which unconditionally
> runs prune-packed. It instead should leave the files that
> --whole-trees would have packed as plain SHA1 files, so that the
> bulk is obtained by statically generated packs and the rest can
> be handled in the commit-chain walker as before.

I really think the commit-chain walker needs to run locally (ie at the server end, or after fetching all the objects from the server).

I don't know how much you've tried out the git-http-pull and git-ssh-pull things, but their performance was quite horrid for anything half-way bigger, because of the totally synchronized IO. The "fetch one object, parse it, fetch the next one, parse that.." approach is just horrible.

I ended up preferring the rsync thing even though rsync sucked badly on big object stores too, if only because when rsync got working, it at least nicely pipelined the transfers, and would transfer things ten times faster than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it really felt that way).

And the thing is, if you purely follow one tree (which is likely the common case for a lot of users), then you are actually always likely better off with the "mirror it" model. Which is _not_ a good model for developers (for example, me rsync'ing from Jeff's kernel repository always got me hundreds of useless objects), but it's fine for somebody who actually just wants to track somebody else.

And then you really can use just rsync or wget or ncftpget or anything else that has a "fetch recursively, optimizing existing objects" mode.

Now, re-packing ends up causing some double transmissions, but I bet the cost of those is going to be less than the cost of the "ping-pong for each object" approach. Especially as most of the repacked objects will be deltas if the repacking is done properly.

		Linus
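The cost of the "ping-pong for each object" approach can be seen with a back-of-the-envelope model. This is a rough sketch under stated assumptions, not a measurement: it assumes each synchronous fetch pays one full round trip before the next request can be issued, that a pipelined transfer pays the round trip only once, and the function names and any numbers fed into them are invented for illustration.

```python
def synchronous_fetch_time(n_objects: int,
                           round_trip_s: float,
                           transfer_s_per_object: float) -> float:
    """Fetch one object, wait for it, then ask for the next one."""
    return n_objects * (round_trip_s + transfer_s_per_object)


def pipelined_fetch_time(n_objects: int,
                         round_trip_s: float,
                         transfer_s_per_object: float) -> float:
    """Requests are streamed back to back, so latency is paid only once."""
    return round_trip_s + n_objects * transfer_s_per_object


# Made-up numbers: 1000 small objects, 100 ms round trip, 10 ms to send each.
slow = synchronous_fetch_time(1000, 0.1, 0.01)
fast = pipelined_fetch_time(1000, 0.1, 0.01)
print(f"synchronous: {slow:.1f}s, pipelined: {fast:.1f}s")
```

In this model the gap grows with the ratio of round-trip time to per-object transfer time, which is why many small objects over a high-latency link are the worst case for an object-at-a-time walker.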