Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Martin Mares
Hello!

> This adds a program to download a commit, the trees, and the blobs in them
> from a remote repository using HTTP. It skips anything you already have.

Is it really necessary to write your own HTTP downloader? If so, is it
necessary to forget basic stuff like the Host: header? ;-)

If you feel that it should be optimized for speed, then at least use
persistent connections.
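
A minimal sketch of what that could look like (not the patch's code; the
"fd", "host" and "path" names are just placeholders): send an HTTP/1.1
request that carries the Host: header and asks the server to keep the
connection open, then reuse the same socket for the next object.

#include <stdio.h>
#include <unistd.h>

/* Sketch only: an HTTP/1.1 request carrying the Host: header and a
 * keep-alive hint, so several objects can be fetched over one TCP
 * connection. */
static int send_request(int fd, const char *host, const char *path)
{
	char request[1024];
	int len = snprintf(request, sizeof(request),
			   "GET %s HTTP/1.1\r\n"
			   "Host: %s\r\n"
			   "Connection: keep-alive\r\n"
			   "\r\n", path, host);

	if (len < 0 || len >= (int) sizeof(request))
		return -1;		/* path too long for the buffer */
	return write(fd, request, len) == len ? 0 : -1;
}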

> + if (memcmp(target, "http://", 7))
> + return -1;

Can crash if the string is too short.
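
One way to make the prefix test safe (a sketch, not the patch's code):
strncmp() stops at the terminating NUL, so a target shorter than
"http://" is rejected instead of being read past its end.

#include <string.h>

/* Sketch: strncmp() never reads past the NUL of a short string. */
static int has_http_prefix(const char *target)
{
	return !strncmp(target, "http://", 7);
}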

> + entry = gethostbyname(name);
> + memcpy(&sockad.sin_addr.s_addr,
> +((struct in_addr *)entry->h_addr)->s_addr, 4);

Can crash if the host doesn't exist or if you feed it a URL containing a
port number.
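
A sketch of a safer version (not the patch's code; "name" is assumed to
hold the host part of the URL, possibly with a trailing ":port"):

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: strip an optional ":port" and fail cleanly when the lookup
 * fails, instead of dereferencing a NULL hostent. */
static int resolve_host(char *name, struct sockaddr_in *sockad)
{
	struct hostent *entry;
	char *colon = strchr(name, ':');
	int port = 80;

	if (colon) {
		*colon = '\0';
		port = atoi(colon + 1);
	}
	entry = gethostbyname(name);
	if (!entry)
		return -1;	/* host does not exist */
	memcpy(&sockad->sin_addr, entry->h_addr, entry->h_length);
	sockad->sin_family = AF_INET;
	sockad->sin_port = htons(port);
	return 0;
}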

> +static int get_connection()

(void)

> + local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);

What if it fails?
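
A sketch of the check it probably wants (O_EXCL makes EEXIST an expected,
harmless outcome; anything else is a real error):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Sketch: report real failures, but treat "object already exists" as
 * the normal "nothing to fetch" case. */
static int open_local(const char *filename)
{
	int fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd < 0) {
		if (errno != EEXIST)
			perror(filename);
		return -1;
	}
	return fd;
}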

Have a nice fortnight
-- 
Martin `MJ' Mares   [EMAIL PROTECTED]   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
A student who changes the course of history is probably taking an exam.


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Tony Luck wrote:

> On 4/16/05, Daniel Barkalow [EMAIL PROTECTED] wrote:
> > +buffer = read_sha1_file(sha1, type, size);
>
> You never free this buffer.

Ideally, this should all be rearranged to share the code with
read-tree, and it should be fixed in common.
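
The shape of the fix is small either way; a sketch (read_sha1_file() is
git's own helper and returns a malloc'ed buffer, so the caller frees it;
the fetch_object() wrapper name here is made up):

#include <stdlib.h>

extern void *read_sha1_file(const unsigned char *sha1, char *type,
			    unsigned long *size);

/* Sketch: free the buffer once the object data has been used. */
static int fetch_object(const unsigned char *sha1)
{
	char type[20];
	unsigned long size;
	void *buffer = read_sha1_file(sha1, type, &size);

	if (!buffer)
		return -1;
	/* ... look at / write out the object data here ... */
	free(buffer);
	return 0;
}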

> It would also be nice if you saved tree objects in some temporary file
> and did not install them until after you had fetched all the blobs and
> trees that this tree references.  Then if your connection is interrupted
> you can just restart it.

It looks over everything relevant, even if it doesn't need to download
anything, so it should work to continue if it stops in between.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Adam Kropelin
Tony Luck wrote:
> Otherwise this looks really nice.  I was going to script something
> similar using wget ... but that would have made zillions of separate
> connections.  Not so kind to the server.
How about building a file list and doing a batch download via 'wget -i
/tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses
connections when successive URLs point to the same server.

Writing yet another http client does seem a bit pointless, what with wget 
and curl available. The real win lies in creating the smarts to get the 
minimum number of files.

--Adam


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Martin Mares wrote:

> Hello!
>
> > This adds a program to download a commit, the trees, and the blobs in them
> > from a remote repository using HTTP. It skips anything you already have.
>
> Is it really necessary to write your own HTTP downloader? If so, is it
> necessary to forget basic stuff like the Host: header? ;-)

I wanted to get something hacked quickly; can you suggest a good one to
use?

> If you feel that it should be optimized for speed, then at least use
> persistent connections.

That's the next step.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Adam Kropelin wrote:

> Tony Luck wrote:
> > Otherwise this looks really nice.  I was going to script something
> > similar using wget ... but that would have made zillions of separate
> > connections.  Not so kind to the server.
>
> How about building a file list and doing a batch download via 'wget -i
> /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses
> connections when successive URLs point to the same server.

You need to look at some of the files before you know what other files to
get. You could do it in waves, but that would be excessively complicated
to code and not the most efficient anyway.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Adam Kropelin
Daniel Barkalow wrote:
> On Sat, 16 Apr 2005, Adam Kropelin wrote:
> > How about building a file list and doing a batch download via 'wget
> > -i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that
> > it reuses connections when successive URLs point to the same server.
>
> You need to look at some of the files before you know what other
> files to get. You could do it in waves, but that would be excessively
> complicated to code and not the most efficient anyway.
Ah, yes. Makes sense. How about libcurl or another http client library, 
then? Minimizing dependencies on external libraries is good, but writing a 
really robust http client is a tricky business. (Not that you aren't up to 
it; I just wonder if it's the best way to spend your time.)
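
For comparison, the libcurl route is only a few lines (a sketch; the
fetch() name, "url" and "out" are illustrative, and reusing one handle
across transfers also gives persistent connections for free):

#include <curl/curl.h>
#include <stdio.h>

/* Sketch: download one URL into an already-open FILE using libcurl's
 * default write callback. */
static int fetch(CURL *curl, const char *url, FILE *out)
{
	CURLcode res;

	curl_easy_setopt(curl, CURLOPT_URL, url);
	curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
	curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
	res = curl_easy_perform(curl);
	return res == CURLE_OK ? 0 : -1;
}

/* Typical use: curl = curl_easy_init(); fetch(curl, url, f); ...;
 * curl_easy_cleanup(curl); */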

--Adam


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread tony . luck
> How about building a file list and doing a batch download via 'wget -i
> /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses
> connections when successive URLs point to the same server.

Here's a script that does just that.  So there is a burst of individual
wget commands to get HEAD, the top commit object, and all the tree
objects.  Then just one to get all the missing blobs.

Subsequent runs will do far less work as many of the tree objects will
not have changed, so we don't descend into any tree that we already have.

-Tony

Not a patch ... it is a whole file.  I called it git-wget, but it might
also want to be called git-pulltop.

Signed-off-by: Tony Luck [EMAIL PROTECTED]

-- script starts here -
#!/bin/sh

# Copyright (C) 2005 Tony Luck

REMOTE=http://www.kernel.org/pub/linux/kernel/people/torvalds/linux-2.6.git/

rm -rf .gittmp
# set up a temp git repository so that we can use cat-file and ls-tree on the
# objects we pull without installing them into our tree. This allows us to
# restart if the download is interrupted
mkdir .gittmp
cd .gittmp
init-db

wget -q $REMOTE/HEAD

if cmp -s ../.git/HEAD HEAD
then
echo Already have HEAD = `cat ../.git/HEAD`
cd ..
rm -rf .gittmp
exit 0
fi

sha1=`cat HEAD`
sha1file=${sha1:0:2}/${sha1:2}

if [ -f ../.git/objects/$sha1file ]
then
echo Already have most recent commit. Update HEAD to $sha1
cd ..
rm -rf .gittmp
exit 0
fi

wget -q $REMOTE/objects/$sha1file -O .git/objects/$sha1file

treesha1=`cat-file commit $sha1 | (read tag tree ; echo $tree)`

get_tree()
{
treesha1file=${1:0:2}/${1:2}
if [ -f ../.git/objects/$treesha1file ]
then
return
fi
wget -q $REMOTE/objects/$treesha1file -O .git/objects/$treesha1file
ls-tree $1 | while read mode tag sha1 name
do
subsha1file=${sha1:0:2}/${sha1:2}
if [  -f ../.git/objects/$subsha1file ]
then
continue
fi
# recurse into sub-trees; match on the type field reported by ls-tree
if [ "$tag" = "tree" ]
then
get_tree $sha1 `expr $2 + 1`
else
echo objects/$subsha1file >> needbloblist
fi
done
}

# get all the tree objects to our .gittmp area, and create list of needed blobs
get_tree $treesha1 0

# now get the blobs
cd ../.git
if [ -s ../.gittmp/needbloblist ]
then
wget -q -r -nH  --cut-dirs=6 --base=$REMOTE -i ../.gittmp/needbloblist
fi

# Now we have the blobs, move the trees and commit from .gittmp
cd ../.gittmp/.git/objects
find ?? -type f -print | while read f
do
mv $f ../../../.git/objects/$f
done

# update HEAD
cd ../..
mv HEAD ../.git

cd ..
rm -rf .gittmp
-- script ends here -