Re: [3/5] Add http-pull

2005-04-23 Daniel Barkalow
On Sat, 23 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  On Sat, 23 Apr 2005, Petr Baudis wrote:
  
   Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
   where Daniel Barkalow [EMAIL PROTECTED] told me that...
   
   Huh. Why? You just go back in history until you find a commit you
   already have. If you did it the way Tony described, if you have that
   commit, you can be sure that you have everything it depends on too.
  
  But if you download 1000 files of the 1010 you need, and then your network
  goes down, you will need to download those 1000 again when it comes back,
  because you can't save them unless you have the full history. 
 
 Why can't I? I think I can do that perfectly fine. The worst thing that
 can happen is that fsck-cache will complain a bit.

Not if you're using the fact that you don't have them to tell you that you
still need the other 10, which is what Tony's scheme would do.

-Daniel
*This .sig left intentionally blank*

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [3/5] Add http-pull

2005-04-22 Luck, Tony

But if you download 1000 files of the 1010 you need, and then your network
goes down, you will need to download those 1000 again when it comes back,
because you can't save them unless you have the full history. 

So you could make the temporary object repository persistent between pulls
to avoid reloading them across the wire.  Something like:

get_commit(sha1)
{
    if (sha1 in real_repo) -> done
    if (!(sha1 in tmp_repo))
        load sha1 to tmp_repo
    get_tree(sha1->tree)
    for each parent
        get_commit(sha1->parent)
    move sha1 from tmp_repo to real_repo
}

get_tree(sha1)
{
    if (sha1 in real_repo) -> done
    if (!(sha1 in tmp_repo))
        load sha1 to tmp_repo
    for_each (sha1->entry) {
        case blob: if (!(sha1->entry in real_repo)) load sha1->entry to real_repo
        case tree: get_tree(sha1->entry)
    }
    move sha1 from tmp_repo to real_repo
}

The "load sha1 to xxx_repo" needs to be smarter than my dumb wget
based script ... it must confirm the sha1 of the object being loaded
before installing (even into the tmp_repo).

-Tony
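Tony's two-stage scheme can be exercised without a network. The following is a minimal C model under stated assumptions: objects are small integer ids rather than SHA-1s; `remote`, `in_tmp` and `in_real` stand in for the server, the persistent temporary repository and the real repository; and `load()` is a hypothetical simulated download that fails once its `budget` runs out. The hypothetical `get_object()` folds get_commit() and get_tree() into one function, since both follow the same rule.

```c
#include <assert.h>

/* Hypothetical in-memory model: objects are small integer ids. */
enum type { COMMIT, TREE, BLOB };

#define MAX_OBJ 16
#define MAX_REF 4

struct object {
    enum type type;
    int nref;
    int ref[MAX_REF];   /* commit: its tree, then parents; tree: its entries */
};

static struct object remote[MAX_OBJ];   /* what the server has */
static int in_tmp[MAX_OBJ];             /* persistent temporary repository */
static int in_real[MAX_OBJ];            /* the real object repository */
static int budget;                      /* loads left before the "network" fails */
static int downloads;                   /* objects actually sent over the wire */

/* Simulated download of one object into the given store. */
static int load(int id, int *store)
{
    assert(id >= 0 && id < MAX_OBJ);
    if (budget-- <= 0)
        return -1;
    downloads++;
    store[id] = 1;
    return 0;
}

/* An object moves into the real repository only after everything it
 * references is already there, so the real repository never has
 * dangling references, even if the pull dies halfway. */
static int get_object(int id)
{
    const struct object *o = &remote[id];

    if (in_real[id])
        return 0;
    if (o->type == BLOB)                /* blobs reference nothing */
        return load(id, in_real);
    if (!in_tmp[id] && load(id, in_tmp))
        return -1;
    for (int i = 0; i < o->nref; i++)
        if (get_object(o->ref[i]))
            return -1;                  /* in_tmp[id] survives for the retry */
    in_tmp[id] = 0;
    in_real[id] = 1;
    return 0;
}

/* Every object in the real repository has all its references there too. */
static int real_repo_consistent(int n)
{
    for (int id = 0; id < n; id++) {
        if (!in_real[id])
            continue;
        for (int i = 0; i < remote[id].nref; i++)
            if (!in_real[remote[id].ref[i]])
                return 0;
    }
    return 1;
}
```

With a graph of blobs 0, 1, 4, tree 2 = {0, 1}, commit 3 = {2}, tree 5 = {0, 4} and commit 6 = {5, 3}, a pull of commit 6 that fails after four downloads leaves the real repository consistent, and the retry fetches only the three objects still missing, because commit 6 is kept in the temporary repository.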



Re: [3/5] Add http-pull

2005-04-22 Petr Baudis
Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 On Sat, 23 Apr 2005, Petr Baudis wrote:
 
  Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
  where Daniel Barkalow [EMAIL PROTECTED] told me that...
  
  Huh. Why? You just go back in history until you find a commit you
  already have. If you did it the way Tony described, if you have that
  commit, you can be sure that you have everything it depends on too.
 
 But if you download 1000 files of the 1010 you need, and then your network
 goes down, you will need to download those 1000 again when it comes back,
 because you can't save them unless you have the full history. 

Why can't I? I think I can do that perfectly fine. The worst thing that
can happen is that fsck-cache will complain a bit.

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [3/5] Add http-pull

2005-04-22 Petr Baudis
Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 On Thu, 21 Apr 2005 [EMAIL PROTECTED] wrote:
 
  On Wed, 20 Apr 2005, Brad Roberts wrote:
   How about fetching in the inverse order.  Ie, deepest parents up towards
   current.  With that method the repository is always self consistent, even
   if not yet current.
  
  Daniel Barkalow replied:
   You don't know the deepest parents to fetch until you've read everything
   more recent, since the history you'd have to walk is the history you're
   downloading.
  
  You just need to defer adding tree/commit objects to the repository until
  after you have inserted all objects on which they depend.  That's what my
  wget based version does ... it's very crude, in that it loads all tree
  & commit objects into a temporary repository (.gittmp) ... since you can
  only use cat-file and ls-tree on things if they live in 
  objects/xx/xxx..xxx
  The blobs can go directly into the real repo (but to be really safe you'd
  have to ensure that the whole blob had been pulled from the network before
  inserting it ... it's probably a good move to validate everything that you
  pull from the outside world too).
 
 The problem with this general scheme is that it means that you have to
 start over if something goes wrong, rather than resuming from where you
 left off (and being able to use what you got until then).

Huh. Why? You just go back in history until you find a commit you
already have. If you did it the way Tony described, if you have that
commit, you can be sure that you have everything it depends on too.

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [3/5] Add http-pull

2005-04-21 tony . luck
On Wed, 20 Apr 2005, Brad Roberts wrote:
 How about fetching in the inverse order.  Ie, deepest parents up towards
 current.  With that method the repository is always self consistent, even
 if not yet current.

Daniel Barkalow replied:
 You don't know the deepest parents to fetch until you've read everything
 more recent, since the history you'd have to walk is the history you're
 downloading.

You just need to defer adding tree/commit objects to the repository until
after you have inserted all objects on which they depend.  That's what my
wget based version does ... it's very crude, in that it loads all tree
& commit objects into a temporary repository (.gittmp) ... since you can
only use cat-file and ls-tree on things if they live in objects/xx/xxx..xxx
The blobs can go directly into the real repo (but to be really safe you'd
have to ensure that the whole blob had been pulled from the network before
inserting it ... it's probably a good move to validate everything that you
pull from the outside world too).

-Tony
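Tony's caveat about inserting a blob only once the whole object has arrived is commonly handled by writing to a temporary name and renaming it into place. A minimal sketch, assuming the temporary file lives on the same filesystem as the target; `install_object()` and the ".tmp" suffix are hypothetical. ISO C leaves rename() onto an existing file implementation-defined, while POSIX makes the replacement atomic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write len bytes to "<path>.tmp", then rename it to
 * path, so that path only ever names a complete object. */
static int install_object(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    FILE *f;
    int n;

    assert(data != NULL || len == 0);
    n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;                      /* path too long */
    f = fopen(tmp, "wb");
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp);                    /* never leave a partial object around */
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    if (rename(tmp, path) != 0) {       /* atomic replacement on POSIX */
        remove(tmp);
        return -1;
    }
    return 0;
}
```

For http-pull, the same idea would mean pointing CURLOPT_FILE at the temporary name and calling rename() only after curl_easy_perform() returns CURLE_OK.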


Re: [3/5] Add http-pull

2005-04-20 Brad Roberts
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Date: Sun, 17 Apr 2005 21:59:00 +0200
 From: Petr Baudis [EMAIL PROTECTED]
 To: Daniel Barkalow [EMAIL PROTECTED]
 Cc: git@vger.kernel.org
 Subject: Re: [3/5] Add http-pull

 Dear diary, on Sun, Apr 17, 2005 at 09:24:27PM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  On Sun, 17 Apr 2005, Petr Baudis wrote:
 
   Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
   where Daniel Barkalow [EMAIL PROTECTED] told me that...
There's some trickiness for the history of commits thing for stopping at
the point where you have everything, but also behaving appropriately if
you try once, fail partway through, and then try again. It's on my queue
of things to think about.
  
   Can't you just stop the recursion when you hit a commit you already
   have?
 
  The problem is that, if you've fetched the final commit already, and then
  the server dies, and you try again later, you already have the last one,
  and so you think you've got everything.

 Hmm, some kind of journaling? ;-)

How about fetching in the inverse order.  Ie, deepest parents up towards
current.  With that method the repository is always self consistent, even
if not yet current.

Later,
Brad



[3/5] Add http-pull

2005-04-17 Daniel Barkalow
http-pull is a program that downloads from a (normal) HTTP server a commit
and all of the tree and blob objects it refers to (but not other commits,
etc.). Options could be used to make it download a larger or different
selection of objects. It depends on libcurl, which I forgot to mention in
the README again.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]
Index: Makefile
===
--- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-base
+   check-files ls-tree http-pull merge-base
 
 SCRIPT=parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -35,6 +35,7 @@
 
 LIBS= -lssl -lz
 
+http-pull: LIBS += -lcurl
 
 $(PROG):%: %.o $(COMMON)
$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
Index: http-pull.c
===
--- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+   char *hex = sha1_to_hex(sha1);
+   char *filename = sha1_file_name(sha1);
+
+   char *url;
+   char *posn;
+   FILE *local;
+   struct stat st;
+
+   if (!stat(filename, &st)) {
+   return 0;
+   }
+
+   local = fopen(filename, "w");
+
+   if (!local) {
+   fprintf(stderr, "Couldn't open %s\n", filename);
+   return -1;
+   }
+
+   curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+   url = malloc(strlen(base) + 50);
+   strcpy(url, base);
+   posn = url + strlen(base);
+   strcpy(posn, "objects/");
+   posn += 8;
+   memcpy(posn, hex, 2);
+   posn += 2;
+   *(posn++) = '/';
+   strcpy(posn, hex + 2);
+
+   curl_easy_setopt(curl, CURLOPT_URL, url);
+
+   curl_easy_perform(curl);
+
+   fclose(local);
+   
+   return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+   void *buffer;
+   unsigned long size;
+   char type[20];
+
+   buffer = read_sha1_file(sha1, type, &size);
+   if (!buffer)
+   return -1;
+   if (strcmp(type, "tree"))
+   return -1;
+   while (size) {
+   int len = strlen(buffer) + 1;
+   unsigned char *sha1 = buffer + len;
+   unsigned int mode;
+   int retval;
+
+   if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+   return -1;
+
+   buffer = sha1 + 20;
+   size -= len + 20;
+
+   retval = fetch(sha1);
+   if (retval)
+   return -1;
+
+   if (S_ISDIR(mode)) {
+   retval = process_tree(sha1);
+   if (retval)
+   return -1;
+   }
+   }
+   return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+   struct revision *rev = lookup_rev(sha1);
+   if (parse_commit_object(rev))
+   return -1;
+   
+   fetch(rev->tree);
+   process_tree(rev->tree);
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   char *commit_id = argv[1];
+   char *url = argv[2];
+
+   unsigned char sha1[20];
+
+   get_sha1_hex(commit_id, sha1);
+
+   curl_global_init(CURL_GLOBAL_ALL);
+
+   curl = curl_easy_init();
+
+   base = url;
+
+   fetch(sha1);
+   process_commit(sha1);
+
+   curl_global_cleanup();
+   return 0;
+}
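process_tree() walks the tree body entry by entry: each entry is an octal mode and a name, terminated by a NUL, followed by the 20-byte binary SHA-1. The sketch below parses one entry at a time under that assumption. It uses memchr() so the name scan cannot run past the end of a truncated buffer, whereas strlen() in the patch relies on a NUL being present. `next_tree_entry()` and `struct tree_entry` are hypothetical names, not part of git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical view of one tree entry; pointers refer into the buffer. */
struct tree_entry {
    unsigned int mode;           /* e.g. 0100644 for a file, 040000 for a subtree */
    const char *name;
    const unsigned char *sha1;   /* 20 raw bytes */
};

/* Parse one "<octal mode> <name>\0<20-byte sha1>" entry from buf.
 * Returns the number of bytes consumed, 0 at the end of the buffer,
 * or -1 if the entry is malformed or truncated. */
static long next_tree_entry(const unsigned char *buf, unsigned long size,
                            struct tree_entry *e)
{
    const unsigned char *nul;
    const char *space;
    unsigned long len;

    assert(e != NULL);
    if (size == 0)
        return 0;
    nul = memchr(buf, '\0', size);          /* bounded, unlike strlen() */
    if (!nul)
        return -1;
    len = (unsigned long)(nul - buf) + 1;   /* mode, space, name and the NUL */
    if (size < len + 20)
        return -1;
    if (sscanf((const char *)buf, "%o", &e->mode) != 1)
        return -1;
    space = strchr((const char *)buf, ' ');
    if (!space)
        return -1;
    e->name = space + 1;
    e->sha1 = nul + 1;
    return (long)(len + 20);
}
```

A caller would loop, advancing the buffer by the returned count, fetching e.sha1 and recursing when S_ISDIR(e.mode) holds, as process_tree() does.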



Re: [3/5] Add http-pull

2005-04-17 Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

  Index: Makefile
  ===
  --- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
  +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
  @@ -35,6 +35,7 @@
   
   LIBS= -lssl -lz
   
  +http-pull: LIBS += -lcurl
   
   $(PROG):%: %.o $(COMMON)
  $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
 
 Whew. Looks like an awful trick, you say this works?! :-)
 
 At times, I wouldn't want to be a GNU make parser.

Yup. GNU make is big on the features which do the obvious thing, even when
you can't believe they work. This is probably why nobody's managed to
replace it.

  Index: http-pull.c
  ===
  --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
  +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
  +   url = malloc(strlen(base) + 50);
 
 Off-by-one. What about the trailing NUL?

I get length(base) + objects/ = 8 + 40 SHA1 + 1 for '/' and 1 for NUL = 50.
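Daniel's count can be checked directly: the URL is base + "objects/" (8) + the first two hex digits (2) + "/" (1) + the remaining 38 digits, i.e. 49 characters after base plus a NUL, which is the 50 in the patch. A sketch of the same construction with snprintf(); `object_url()` is a hypothetical helper, and the base URL in the test is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build base + "objects/" + hex[0..1] + "/" + hex[2..39]; caller frees. */
static char *object_url(const char *base, const char *hex)
{
    size_t size;
    char *url;

    assert(strlen(hex) == 40);
    /* 8 ("objects/") + 2 + 1 ('/') + 38 + 1 (NUL) = 50 bytes beyond base */
    size = strlen(base) + 50;
    url = malloc(size);
    if (!url)
        return NULL;
    snprintf(url, size, "%sobjects/%.2s/%s", base, hex, hex + 2);
    return url;
}
```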

 I think you should have at least two disjoint modes - either you are
 downloading everything related to the given commit, or you are
 downloading all commit records for commit predecessors.
 
 Even if you might not want all the intermediate trees, you definitely
 want the intermediate commits, to keep the history graph contiguous.
 
 So in git pull, I'd imagine doing
 
   http-pull -c $new_head
   http-pull -t $(tree-id $new_head)
 
 So, -c would fetch a given commit and all its predecessors until it hits
 what you already have on your side. -t would fetch a given tree with all
 files and subtrees and everything. http-pull shouldn't default to
 either, since they are mutually exclusive.
 
 What do you think?

I think I'd rather keep the current behavior and add a -c for getting the
history of commits, and maybe a -a for getting the history of commits and
their trees.

There's some trickiness for the history of commits thing for stopping at
the point where you have everything, but also behaving appropriately if
you try once, fail partway through, and then try again. It's on my queue
of things to think about.

-Daniel
*This .sig left intentionally blank*



Re: [3/5] Add http-pull

2005-04-17 Petr Baudis
Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 On Sun, 17 Apr 2005, Petr Baudis wrote:
   Index: http-pull.c
   ===
   --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
   +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
   + url = malloc(strlen(base) + 50);
  
  Off-by-one. What about the trailing NUL?
 
 I get length(base) + objects/ = 8 + 40 SHA1 + 1 for '/' and 1 for NUL = 50.

Sorry, counted one '/' more. :-)

  I think you should have at least two disjoint modes - either you are
  downloading everything related to the given commit, or you are
  downloading all commit records for commit predecessors.
  
  Even if you might not want all the intermediate trees, you definitely
  want the intermediate commits, to keep the history graph contiguous.
  
  So in git pull, I'd imagine doing
  
  http-pull -c $new_head
  http-pull -t $(tree-id $new_head)
  
  So, -c would fetch a given commit and all its predecessors until it hits
  what you already have on your side. -t would fetch a given tree with all
  files and subtrees and everything. http-pull shouldn't default to
  either, since they are mutually exclusive.
  
  What do you think?
 
 I think I'd rather keep the current behavior and add a -c for getting the
 history of commits, and maybe a -a for getting the history of commits and
 their trees.

I'm not too keen on this. Either make it totally separate commands, or
make a required switch specifying what to do. Otherwise it implies the
switches would just modify what it does, but they make it do something
completely different.

-a would be fine too - basically a combination of -c and -t. I'd imagine
that is what Linus would want to use, e.g.

 There's some trickiness for the history of commits thing for stopping at
 the point where you have everything, but also behaving appropriately if
 you try once, fail partway through, and then try again. It's on my queue
 of things to think about.

Can't you just stop the recursion when you hit a commit you already
have?

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [3/5] Add http-pull

2005-04-17 Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
 
 I'm not too keen on this. Either make it totally separate commands, or
 make a required switch specifying what to do. Otherwise it implies the
 switches would just modify what it does, but they make it do something
 completely different.

That's a good point. I'll require a -t for now, and add more later.

 -a would be fine too - basically a combination of -c and -t. I'd imagine
 that is what Linus would want to use, e.g.

Well, -c -t would give you the current tree and the whole commit log, but
not old trees. -a would additionally give you old trees.
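A sketch of the option handling the thread converges on, where an explicit mode switch is required and -c, -t and -a select what to fetch. The flag meanings follow Daniel's and Petr's descriptions above; `parse_pull_mode()` and the `PULL_*` constants are hypothetical, and argv is scanned by hand rather than with getopt() to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mode bits for http-pull:
 *   -t  the commit's own tree, with all subtrees and blobs
 *   -c  the chain of commit objects (history), but no old trees
 *   -a  the history plus every commit's tree                    */
enum {
    PULL_TREE    = 1 << 0,
    PULL_HISTORY = 1 << 1,
    PULL_ALL     = 1 << 2,
};

/* Returns the OR of the requested modes, or -1 if no mode switch was
 * given or an unknown option was seen. Non-option arguments (commit id,
 * URL) are skipped. */
static int parse_pull_mode(int argc, char **argv)
{
    int mode = 0;

    assert(argv != NULL || argc == 0);
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] != '-')
            continue;
        if (!strcmp(arg, "-t"))
            mode |= PULL_TREE;
        else if (!strcmp(arg, "-c"))
            mode |= PULL_HISTORY;
        else if (!strcmp(arg, "-a"))
            mode |= PULL_ALL | PULL_HISTORY | PULL_TREE;
        else
            return -1;
    }
    return mode ? mode : -1;    /* a switch is required */
}
```

With this shape, a bare `http-pull <commit> <url>` is rejected, and `-c -t` yields the current tree plus the commit history without old trees.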

  There's some trickiness for the history of commits thing for stopping at
  the point where you have everything, but also behaving appropriately if
  you try once, fail partway through, and then try again. It's on my queue
  of things to think about.
 
 Can't you just stop the recursion when you hit a commit you already
 have?

The problem is that, if you've fetched the final commit already, and then
the server dies, and you try again later, you already have the last one,
and so you think you've got everything.

At this point, I also want to put off doing much further with recursion
and commits until revision.h and such are sorted out.

-Daniel
*This .sig left intentionally blank*
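Daniel's objection can be reproduced with a toy linear history. The sketch below is a hypothetical model (commit i's parent is i - 1, and `fetch_commit()` is a simulated download that fails once `budget` is spent): it stores each commit as soon as it arrives, newest first, and stops at the first commit already present, which is the rule Petr asks about.

```c
#include <assert.h>

/* Hypothetical linear history: commit i's parent is i - 1; commit 0 is the root. */
#define NCOMMITS 5

static int have[NCOMMITS];   /* commits present in the local repository */
static int budget;           /* fetches left before the simulated network fails */

/* Simulated download that stores the commit immediately. */
static int fetch_commit(int id)
{
    assert(id >= 0 && id < NCOMMITS);
    if (budget-- <= 0)
        return -1;
    have[id] = 1;
    return 0;
}

/* Naive pull: walk from the tip towards the root, storing each commit as
 * it arrives, and stop at the first commit that is already present. */
static int naive_pull(int tip)
{
    for (int id = tip; id >= 0; id--) {
        if (have[id])
            return 0;        /* assumes everything older is present too */
        if (fetch_commit(id))
            return -1;
    }
    return 0;
}

/* Number of commits reachable from tip that are not actually present. */
static int missing(int tip)
{
    int n = 0;
    for (int id = tip; id >= 0; id--)
        n += !have[id];
    return n;
}
```

After a failure that leaves commits 4 and 3 on disk, a retry of naive_pull(4) returns immediately while missing(4) still reports 3 absent ancestors. Tony's scheme avoids this by moving a commit into the real repository only after everything it references is there.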



Re: [3/5] Add http-pull

2005-04-17 Petr Baudis
Dear diary, on Sun, Apr 17, 2005 at 09:24:27PM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 On Sun, 17 Apr 2005, Petr Baudis wrote:
 
  Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
  where Daniel Barkalow [EMAIL PROTECTED] told me that...
   There's some trickiness for the history of commits thing for stopping at
   the point where you have everything, but also behaving appropriately if
   you try once, fail partway through, and then try again. It's on my queue
   of things to think about.
  
  Can't you just stop the recursion when you hit a commit you already
  have?
 
 The problem is that, if you've fetched the final commit already, and then
 the server dies, and you try again later, you already have the last one,
 and so you think you've got everything.

Hmm, some kind of journaling? ;-)

 At this point, I also want to put off doing much further with recursion
 and commits until revision.h and such are sorted out.

Agreed.

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor