Re: Borrowing objects from nearby repositories

2014-03-28 Thread Andrew Keller
On Mar 26, 2014, at 1:29 PM, Junio C Hamano  wrote:

> Andrew Keller  writes:
> 
>> On Mar 25, 2014, at 6:17 PM, Junio C Hamano  wrote:
>> ...
>>>> I think that the standard practice with the existing toolset is to
>>>> clone with reference and then repack.  That is:
>>>> 
>>>>   $ git clone --reference  git://over/there mine
>>>>   $ cd mine
>>>>   $ git repack -a -d
>>>> 
>>>> And then you can try this:
>>>> 
>>>>   $ mv .git/objects/info/alternates .git/objects/info/alternates.disabled
>>>>   $ git fsck
>>>> 
>>>> to make sure that you are no longer borrowing anything from the
>>>> borrowee.  Once you are satisfied, you can remove the saved-away
>>>> alternates.disabled file.
>>> 
>>> Oh, I forgot to say that I am not opposed if somebody wants to teach
>>> "git clone" a new option to copy its objects from two places,
>>> (hopefully) the majority from near-by reference repository and the
>>> remainder over the network, without permanently relying on the
>>> former via the alternates mechanism.  The implementation of such a
>>> feature could even literally be "clone with reference first and then
>>> repack" at least initially but even in the final version.
> 
> [Administrivia: please wrap your lines to a reasonable length]
> 
>> That was actually one of my first ideas - adding some sort of
>> '--auto-repack' option to git-clone.  It's a relatively small
>> change, and would work.  However, keeping in mind my end goal of
>> automating the feature to the point where you could run simply
>> 'git clone ', an '--auto-repack' option is more difficult to
>> undo.  You would need a new parameter to disable the automatic
>> adding of reference repositories, and a new parameter to undo
>> '--auto-repack', and you'd have to remember to actually undo both
>> of those settings.
>> 
>> In contrast, if the new feature was '--borrow', and the evolution
>> of the feature was a global configuration 'fetch.autoBorrow', then
>> to turn it off temporarily, one only needs a single new parameter
>> '--no-auto-borrow'.  I think this is a cleaner approach than the
>> former, although much more work.
> 
> I think you may have misread me.  With the "new option", I was
> hinting that the "clone --reference && repack && rm alternates"
> will be an acceptable internal implementation of the "--borrow"
> option that was mentioned in the thread.  I am not sure where you
> got the "auto-repack" from.

Ah, yes - that is better than what I was thinking.  I was thinking a bit
too low-level, and using two arguments in the place of your one.

> One of the reasons you may have misread me may be because I made it
> sound as if "this may work and when it works you will be happy, but
> if it does not work you did not lose very much" by mentioning "mv &&
> fsck".  That wasn't what I meant.
> 
> The "repack -a" procedure is to make the borrower repository no
> longer dependent on the borrowee, and it is supposed to always work.
> In fact, this behaviour was the whole reason why "repack" later
> learned its "-l" option to disable it, because people who cloned
> with "--reference" in order to reduce the disk footprint by sharing
> older and more common objects [*1*] were rightfully surprised to see
> that the borrowed objects were copied over to their borrower
> repository when they ran "repack" [*2*].
> 
> Because this is "clone", there is nothing complex to "undo".  Either
> it succeeds, or you remove the whole new directory if anything
> fails.
> 
> I said "even in the final version" for a simple reason: you cannot
> realistically do any better than the "clone --reference &&
> repack -a -d && rm alternates" sequence.

Wow, that's very insightful - thanks!  So, it sounds like I was right about
the general areas of concern when trying to do this during a fetch, but
I underestimated just how complicated it would be.

Okay, so to re-frame my idea, like you said, the goal is to find a user-
friendly way for the user to tell git-clone to set up the alternates file
(or perhaps just use the --alternates parameter), and run a repack,
and disconnect the alternate.  And yet, we still want to be able to use
--reference on its own, because there are existing use cases for that.

Thanks!
 - Andrew Keller

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Borrowing objects from nearby repositories

2014-03-28 Thread Junio C Hamano
Andrew Keller  writes:

> Okay, so to re-frame my idea, like you said, the goal is to find a user-
> friendly way for the user to tell git-clone to set up the alternates file
> (or perhaps just use the --alternates parameter), and run a repack,
> and disconnect the alternate.  And yet, we still want to be able to use
> --reference on its own, because there are existing use cases for that.

Here are a few possible action items that came out of this
discussion:

 1. Introduce a new "--borrow" option to "git clone".

    The updates to the SYNOPSIS section may go like this:

    -'git clone' [--reference ] ...other options...
    +'git clone' [[--reference|--borrow] ] ...other options...

    The new option can be used instead of "--reference" and they
    will be mutually incompatible.  The first implementation of the
    "--borrow" option would do the following:

      (1) run the same "git clone" with the same command line but
          replacing "--borrow" with "--reference"; if this fails, exit
          with the same failure.

      (2) in the resulting repository, run "git repack -a -d"; if this
          fails, remove the entire directory the first step created,
          and exit with failure.

      (3) remove .git/objects/info/alternates from the resulting
          repository and exit with success.

    and it may be acceptable as the final implementation as well.
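
The three steps above can be sketched as a small shell function.
This is only an illustration under stated assumptions, not the
eventual implementation: the name "clone_borrow" is made up, the
clone is assumed to be non-bare (so the alternates file lives under
.git/), and the target directory is assumed not to exist beforehand.

```shell
# Hypothetical helper emulating the proposed "clone --borrow".
# Usage: clone_borrow <reference-repo> <url> <directory>
clone_borrow () {
    ref=$1 url=$2 dir=$3

    # (1) Plain "clone --reference"; on failure, return the same status.
    git clone --reference "$ref" "$url" "$dir" || return

    # (2) Copy every borrowed object into our own pack; on failure,
    #     remove the directory that step (1) created.
    if ! git -C "$dir" repack -a -d
    then
        rm -rf "$dir"
        return 1
    fi

    # (3) Stop borrowing from the reference repository.
    rm -f "$dir/.git/objects/info/alternates"
}
```

It would be called as "clone_borrow /path/to/reference
git://over/there mine".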


 2. Make "git repack" safer for the users of "clone --reference" who
    want to keep sharing objects from the original.

    - Introduce the "repack.local" configuration variable that can
      be set to either true or false.  Missing variable defaults to
      "false".

    - A "repack" that is run without "-l" option on the command line
      will pretend as if it was given "-l" from the command line if
      "repack.local" is set to "true".  Add "repack --no-local"
      option to countermand this configuration variable from the
      command line.

    - Teach "git clone --reference" (but not "git clone --borrow")
      to set "repack.local = true" in the configuration of the
      resulting repository.
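
Until something like this exists, the intended behaviour could be
approximated with a wrapper.  The sketch below is an assumption-laden
illustration: "repack_wrapper" is a made-up name, "repack.local" is
only the variable proposed above (Git does not read it), and the
"--no-local" handling is done by the wrapper itself and only when it
is the first argument.

```shell
# Hypothetical wrapper emulating the proposed "repack.local" setting.
# Usage: repack_wrapper [--no-local] [other "git repack" options...]
repack_wrapper () {
    local_opt=
    # "git config --bool" prints "true" for yes/on/1; an unset
    # variable prints nothing, which we treat as "false".
    if test "$(git config --bool --get repack.local)" = true
    then
        local_opt=-l
    fi
    # A leading "--no-local" countermands the configuration.
    if test "${1:-}" = --no-local
    then
        local_opt=
        shift
    fi
    # $local_opt is deliberately unquoted so that it vanishes when empty.
    git repack $local_opt "$@"
}
```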


Re: Borrowing objects from nearby repositories

2014-03-26 Thread Junio C Hamano
Andrew Keller  writes:

> On Mar 25, 2014, at 6:17 PM, Junio C Hamano  wrote:
> ...
>>> I think that the standard practice with the existing toolset is to
>>> clone with reference and then repack.  That is:
>>> 
>>>   $ git clone --reference  git://over/there mine
>>>   $ cd mine
>>>   $ git repack -a -d
>>> 
>>> And then you can try this:
>>> 
>>>   $ mv .git/objects/info/alternates .git/objects/info/alternates.disabled
>>>   $ git fsck
>>> 
>>> to make sure that you are no longer borrowing anything from the
>>> borrowee.  Once you are satisfied, you can remove the saved-away
>>> alternates.disabled file.
>> 
>> Oh, I forgot to say that I am not opposed if somebody wants to teach
>> "git clone" a new option to copy its objects from two places,
>> (hopefully) the majority from near-by reference repository and the
>> remainder over the network, without permanently relying on the
>> former via the alternates mechanism.  The implementation of such a
>> feature could even literally be "clone with reference first and then
>> repack" at least initially but even in the final version.

[Administrivia: please wrap your lines to a reasonable length]

> That was actually one of my first ideas - adding some sort of
> '--auto-repack' option to git-clone.  It's a relatively small
> change, and would work.  However, keeping in mind my end goal of
> automating the feature to the point where you could run simply
> 'git clone ', an '--auto-repack' option is more difficult to
> undo.  You would need a new parameter to disable the automatic
> adding of reference repositories, and a new parameter to undo
> '--auto-repack', and you'd have to remember to actually undo both
> of those settings.
>
> In contrast, if the new feature was '--borrow', and the evolution
> of the feature was a global configuration 'fetch.autoBorrow', then
> to turn it off temporarily, one only needs a single new parameter
> '--no-auto-borrow'.  I think this is a cleaner approach than the
> former, although much more work.

I think you may have misread me.  With the "new option", I was
hinting that the "clone --reference && repack && rm alternates"
will be an acceptable internal implementation of the "--borrow"
option that was mentioned in the thread.  I am not sure where you
got the "auto-repack" from.

One of the reasons you may have misread me may be because I made it
sound as if "this may work and when it works you will be happy, but
if it does not work you did not lose very much" by mentioning "mv &&
fsck".  That wasn't what I meant.

The "repack -a" procedure is to make the borrower repository no
longer dependent on the borrowee, and it is supposed to always work.
In fact, this behaviour was the whole reason why "repack" later
learned its "-l" option to disable it, because people who cloned
with "--reference" in order to reduce the disk footprint by sharing
older and more common objects [*1*] were rightfully surprised to see
that the borrowed objects were copied over to their borrower
repository when they ran "repack" [*2*].

Because this is "clone", there is nothing complex to "undo".  Either
it succeeds, or you remove the whole new directory if anything
fails.

I said "even in the final version" for a simple reason: you cannot
realistically do any better than the "clone --reference &&
repack -a -d && rm alternates" sequence.

But you would need to know a few things about how Git works in order
to come to that realisation.  Here are some:

 * "clone --borrow" (or whatever we end up calling the option) must
   talk to two repositories:

   - We will need to have one upload-pack session with the distant
     origin repository over the network, which will send a complete
     pack.

   - We need to also copy objects that weren't sent from the
     distant origin to our repository from the reference one.

 * A single "repack -a -d" (without "-l") after "clone --reference"
   is already a way to do exactly what you need---enumerate what are
   missing in the packfile that was received from the distant origin
   and come up with packfile(s) that contain all and only objects
   the cloned repository needs.

 * You cannot easily concatenate multiple packfiles into a single
   one (or append runs of objects to an existing packfile) to come
   up with a single packfile.

You _could_ shoehorn the logic to "enumerate and read from the
reference, and append them at the end of the packfile received from
the distant origin repository" into the part that talks to the
distant origin repository, but the object layout in the resulting
packfile will be suboptimal [*3*] and the code complexity required
to do so is not worth it [*4*].


[Footnotes]

*1* From the point of view of supporting both camps, i.e. those who
want their borrower repositories to keep sharing the objects
with the borrowee repository and those who want to use a
borrowee repository temporarily while cloning only to reduce the
network cost from the distant upstream, the

Re: Borrowing objects from nearby repositories

2014-03-26 Thread Andrew Keller
On Mar 25, 2014, at 6:17 PM, Junio C Hamano  wrote:

> Junio C Hamano  writes:
> 
>> Ævar Arnfjörð Bjarmason  writes:
>> 
>>>> 1) Introduce '--borrow' to `git-fetch`.  This would behave similarly
>>> to '--reference', except that it operates on a temporary basis, and
>>> does not assume that the reference repository will exist after the
>>> operation completes, so any used objects are copied into the local
>>> objects database.  In theory, this mechanism would be distinct from
>>> '--reference', so if both are used, some objects would be copied, and
>>> some objects would be accessible via a reference repository referenced
>>> by the alternates file.
>>> 
>>> Isn't this the same as git clone --reference  --no-hardlinks
>>>  ?
>>> 
>>> Also without --no-hardlinks we're not assuming that the other repo
>>> doesn't go away (you could rm-rf it), just that the files won't be
>>> *modified*, which Git won't do, but you could manually do with other
>>> tools, so the default is to hardlink.
>> 
>> I think that the standard practice with the existing toolset is to
>> clone with reference and then repack.  That is:
>> 
>>   $ git clone --reference  git://over/there mine
>>   $ cd mine
>>   $ git repack -a -d
>> 
>> And then you can try this:
>> 
>>   $ mv .git/objects/info/alternates .git/objects/info/alternates.disabled
>>   $ git fsck
>> 
>> to make sure that you are no longer borrowing anything from the
>> borrowee.  Once you are satisfied, you can remove the saved-away
>> alternates.disabled file.
> 
> Oh, I forgot to say that I am not opposed if somebody wants to teach
> "git clone" a new option to copy its objects from two places,
> (hopefully) the majority from near-by reference repository and the
> remainder over the network, without permanently relying on the
> former via the alternates mechanism.  The implementation of such a
> feature could even literally be "clone with reference first and then
> repack" at least initially but even in the final version.

That was actually one of my first ideas - adding some sort of '--auto-repack' 
option to git-clone.  It's a relatively small change, and would work.  However, 
keeping in mind my end goal of automating the feature to the point where you 
could run simply 'git clone ', an '--auto-repack' option is more difficult 
to undo.  You would need a new parameter to disable the automatic adding of 
reference repositories, and a new parameter to undo '--auto-repack', and you'd 
have to remember to actually undo both of those settings.

In contrast, if the new feature was '--borrow', and the evolution of the 
feature was a global configuration 'fetch.autoBorrow', then to turn it off 
temporarily, one only needs a single new parameter '--no-auto-borrow'.  I think 
this is a cleaner approach than the former, although much more work.

Thanks,
 - Andrew Keller



Re: Borrowing objects from nearby repositories

2014-03-25 Thread Junio C Hamano
Junio C Hamano  writes:

> Ævar Arnfjörð Bjarmason  writes:
>
>>> 1) Introduce '--borrow' to `git-fetch`.  This would behave similarly
>> to '--reference', except that it operates on a temporary basis, and
>> does not assume that the reference repository will exist after the
>> operation completes, so any used objects are copied into the local
>> objects database.  In theory, this mechanism would be distinct from
>> '--reference', so if both are used, some objects would be copied, and
>> some objects would be accessible via a reference repository referenced
>> by the alternates file.
>>
>> Isn't this the same as git clone --reference  --no-hardlinks
>>  ?
>>
>> Also without --no-hardlinks we're not assuming that the other repo
>> doesn't go away (you could rm-rf it), just that the files won't be
>> *modified*, which Git won't do, but you could manually do with other
>> tools, so the default is to hardlink.
>
> I think that the standard practice with the existing toolset is to
> clone with reference and then repack.  That is:
>
> $ git clone --reference  git://over/there mine
> $ cd mine
> $ git repack -a -d
>
> And then you can try this:
>
> $ mv .git/objects/info/alternates .git/objects/info/alternates.disabled
> $ git fsck
>
> to make sure that you are no longer borrowing anything from the
> borrowee.  Once you are satisfied, you can remove the saved-away
> alternates.disabled file.

Oh, I forgot to say that I am not opposed if somebody wants to teach
"git clone" a new option to copy its objects from two places,
(hopefully) the majority from near-by reference repository and the
remainder over the network, without permanently relying on the
former via the alternates mechanism.  The implementation of such a
feature could even literally be "clone with reference first and then
repack" at least initially but even in the final version.


Re: Borrowing objects from nearby repositories

2014-03-25 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> 1) Introduce '--borrow' to `git-fetch`.  This would behave similarly
> to '--reference', except that it operates on a temporary basis, and
> does not assume that the reference repository will exist after the
> operation completes, so any used objects are copied into the local
> objects database.  In theory, this mechanism would be distinct from
> '--reference', so if both are used, some objects would be copied, and
> some objects would be accessible via a reference repository referenced
> by the alternates file.
>
> Isn't this the same as git clone --reference  --no-hardlinks
>  ?
>
> Also without --no-hardlinks we're not assuming that the other repo
> doesn't go away (you could rm-rf it), just that the files won't be
> *modified*, which Git won't do, but you could manually do with other
> tools, so the default is to hardlink.

I think that the standard practice with the existing toolset is to
clone with reference and then repack.  That is:

$ git clone --reference  git://over/there mine
$ cd mine
$ git repack -a -d

And then you can try this:

$ mv .git/objects/info/alternates .git/objects/info/alternates.disabled
$ git fsck

to make sure that you are no longer borrowing anything from the
borrowee.  Once you are satisfied, you can remove the saved-away
alternates.disabled file.
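
The check can be wrapped so that the alternates file is put back if
anything turns out to be still borrowed.  This is a sketch only:
"verify_independent" is a made-up name, the repository is assumed to
be non-bare, and it relies on "git fsck" exiting with a non-zero
status when it finds missing objects.

```shell
# Hypothetical helper: check that <repo> no longer needs its alternates.
# Usage: verify_independent <repo>
verify_independent () {
    alt=$1/.git/objects/info/alternates
    # Nothing to check if the repository borrows from nowhere.
    test -f "$alt" || return 0

    mv "$alt" "$alt.disabled" || return
    if git -C "$1" fsck
    then
        # Every object is available locally; the saved file can go.
        rm -f "$alt.disabled"
    else
        # Something is still borrowed; put the alternates file back.
        mv "$alt.disabled" "$alt"
        return 1
    fi
}
```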


Re: Borrowing objects from nearby repositories

2014-03-25 Thread Andrew Keller
On Mar 24, 2014, at 5:21 PM, Ævar Arnfjörð Bjarmason  wrote:
> On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller  wrote:
>> Hi all,
>> 
>> I am considering developing a new feature, and I'd like to poll the group 
>> for opinions.
>> 
>> Background: A couple years ago, I wrote a set of scripts that speed up 
>> cloning of frequently used repositories.  The scripts utilize a bare Git 
>> repository located at a known location, and automate providing a --reference 
>> parameter to `git clone` and `git submodule update`.  Recently, some 
>> coworkers of mine expressed an interest in using the scripts, so I published 
>> the current version of my scripts, called `git repocache`, described at the 
>> bottom of .
>> 
>> Slowly, it has occurred to me that this feature, or something similar to it, 
>> may be worth adding to Git, so I've been thinking about the best approach.  
>> Here's my best idea so far:
>> 
>> 1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
>> '--reference', except that it operates on a temporary basis, and does not 
>> assume that the reference repository will exist after the operation 
>> completes, so any used objects are copied into the local objects database.  
>> In theory, this mechanism would be distinct from '--reference', so if both 
>> are used, some objects would be copied, and some objects would be accessible 
>> via a reference repository referenced by the alternates file.
> 
> Isn't this the same as git clone --reference  --no-hardlinks  ?

'--reference' adds an entry to 'info/alternates' inside the objects folder.  
When an object is looked up, any objects folder listed in 
'objects/info/alternates' is considered to be an extension of the local objects 
folder.  So, when, for example, fetch runs, when it goes to decide whether or 
not it already has a blob locally, it may decide "yes", and not download the 
blob at all, because it already exists in one of the reference repositories.  
If I clone one of my 80 GB repositories over SSH using a reference repository, 
the resulting clone is only about 175 KB, because it's assuming the reference 
repository will exist going forward, so it doesn't actually own any objects 
itself at all.
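
To see which object folders a given clone is borrowing from, the
alternates file can simply be read.  A minimal sketch, assuming a
non-bare repository; "list_alternates" is a made-up name for
illustration:

```shell
# Hypothetical helper: print the object folders <repo> borrows from.
# Usage: list_alternates <repo>
list_alternates () {
    alt=$1/.git/objects/info/alternates
    # No file means the repository owns all of its objects.
    test -f "$alt" || return 0
    # One path per line; also accept a last line without a newline.
    while IFS= read -r dir || test -n "$dir"
    do
        # Skip blank lines.
        test -n "$dir" && printf '%s\n' "$dir"
    done <"$alt"
    return 0
}
```

An empty result means the clone does not depend on any other
repository for its objects.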

The '--no-hardlinks' option is only applicable when hard linking is available 
in the first place - i.e., when cloning from one local folder to another on the 
same filesystem (assuming the filesystem supports hard links).

Thanks,
 - Andrew



Re: Borrowing objects from nearby repositories

2014-03-24 Thread Ævar Arnfjörð Bjarmason
On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller  wrote:
> Hi all,
>
> I am considering developing a new feature, and I'd like to poll the group for 
> opinions.
>
> Background: A couple years ago, I wrote a set of scripts that speed up 
> cloning of frequently used repositories.  The scripts utilize a bare Git 
> repository located at a known location, and automate providing a --reference 
> parameter to `git clone` and `git submodule update`.  Recently, some 
> coworkers of mine expressed an interest in using the scripts, so I published 
> the current version of my scripts, called `git repocache`, described at the 
> bottom of .
>
> Slowly, it has occurred to me that this feature, or something similar to it, 
> may be worth adding to Git, so I've been thinking about the best approach.  
> Here's my best idea so far:
>
> 1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
> '--reference', except that it operates on a temporary basis, and does not 
> assume that the reference repository will exist after the operation 
> completes, so any used objects are copied into the local objects database.  
> In theory, this mechanism would be distinct from '--reference', so if both 
> are used, some objects would be copied, and some objects would be accessible 
> via a reference repository referenced by the alternates file.

Isn't this the same as git clone --reference  --no-hardlinks  ?

Also without --no-hardlinks we're not assuming that the other repo
doesn't go away (you could rm-rf it), just that the files won't be
*modified*, which Git won't do, but you could manually do with other
tools, so the default is to hardlink.

> 2)  Teach `git fetch` to read 'repocache.path' (or a better-named 
> configuration), and use it to automatically activate borrowing.

So a default path for --reference  --no-hardlinks ?

> 3)  For consistency, `git clone`, `git pull`, and `git submodule update` 
> should probably all learn '--borrow', and forward it to `git fetch`.
>
> 4)  In some scenarios, it may be necessary to temporarily not automatically 
> borrow, so `git fetch`, and everything that calls it may need an argument to 
> do that.
>
> Intended outcome: With 'repocache.path' set, and the cached repository 
> properly updated, one could run `git clone `, and the operation would 
> complete much faster than it does now due to less load on the network.
>
> Things I haven't figured out yet:
>
> *  What's the best approach to copying the needed objects?  It's probably 
> inefficient to copy individual objects out of pack files one at a time, but 
> it could be wasteful to copy entire pack files just because you need one 
> object.  Hard-linking could help, but that won't always be available.  One of 
> my previous ideas was to add a '--auto-repack' option to `git-clone`, which 
> solves this problem better, but introduces some other front-end usability 
> problems.
> *  To maintain optimal effectiveness, users would have to regularly run a 
> fetch in the cache repository.  Not all users know how to set up a scheduled 
> task on their computer, so this might become a maintenance problem for the 
> user.  This kind of problem I think brings into question the viability of the 
> underlying design here, assuming that the ultimate goal is to clone faster, 
> with very little or no change in the use of git.
>
>
> Thoughts?
>
> Thanks,
> Andrew Keller
>


Re: Borrowing objects from nearby repositories

2014-03-23 Thread Phil Hord
On Tue, Mar 11, 2014 at 11:37 PM, Andrew Keller  wrote:
> I am considering developing a new feature, and I'd like to poll the group for 
> opinions.
>
> Background: A couple years ago, I wrote a set of scripts that speed up 
> cloning of frequently used repositories.  The scripts utilize a bare Git 
> repository located at a known location, and automate providing a --reference 
> parameter to `git clone` and `git submodule update`.  Recently, some 
> coworkers of mine expressed an interest in using the scripts, so I published 
> the current version of my scripts, called `git repocache`, described at the 
> bottom of .
>
> Slowly, it has occurred to me that this feature, or something similar to it, 
> may be worth adding to Git, so I've been thinking about the best approach.  
> Here's my best idea so far:
>
> 1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
> '--reference', except that it operates on a temporary basis, and does not 
> assume that the reference repository will exist after the operation 
> completes, so any used objects are copied into the local objects database.  
> In theory, this mechanism would be distinct from '--reference', so if both 
> are used, some objects would be copied, and some objects would be accessible 
> via a reference repository referenced by the alternates file.

Interesting.  I do something similar on my CI Server to reduce
workload on Gerrit. Having a built-in to support submodules would be
nice.  Currently my script does this:

MIRROR=/path/to/local/mirror
NEW=ssh://gerrit-server
git clone ${MIRROR}/project && cd project

#-- Init/update submodules from our local mirror if possible
git submodule update --recursive --init

#-- Switch to the remote server URL
git config remote.origin.url "$(git config remote.origin.url | sed -e "s|^${MIRROR}|${NEW}|")"
git submodule sync #--recursive ; recursive not supported :-[

#-- Checkout remote updates
git pull --ff-only --recurse-submodules origin ${BRANCH}
git submodule update --recursive --init


Is that about the same as you are aiming for?


> 2)  Teach `git fetch` to read 'repocache.path' (or a better-named 
> configuration), and use it to automatically activate borrowing.

Seems like this could be trouble if a local repo is coincidentally
named the same as some unrelated repo you want to clone.  But I can
see the value.

What about something similar to url.insteadOf?   Maybe
'url.${SERVER}.autoBorrow = ${MIRROR}', with replacement semantics
similar to insteadOf.
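
The replacement semantics could look roughly like the sketch below.
Everything here is hypothetical: "url.<base>.autoBorrow" is only the
name suggested above, and "mirror_for_url" is a made-up helper that
takes the server and mirror prefixes as plain arguments instead of
reading any configuration.

```shell
# Hypothetical helper: map a remote URL to a path under a local mirror
# by prefix replacement, in the style of url.<base>.insteadOf.
# Usage: mirror_for_url <url> <server-prefix> <mirror-prefix>
mirror_for_url () {
    case "$1" in
    "$2"*)
        # Replace the leading server prefix with the mirror prefix.
        printf '%s\n' "$3${1#"$2"}"
        ;;
    *)
        # The URL is not on that server; there is nothing to borrow from.
        return 1
        ;;
    esac
}
```

With the values from the script above, "mirror_for_url
ssh://gerrit-server/project ssh://gerrit-server /path/to/local/mirror"
prints "/path/to/local/mirror/project".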

> 3)  For consistency, `git clone`, `git pull`, and `git submodule update` 
> should probably all learn '--borrow', and forward it to `git fetch`.
>
> 4)  In some scenarios, it may be necessary to temporarily not automatically 
> borrow, so `git fetch`, and everything that calls it may need an argument to 
> do that.

--no-borrow

Phil


Borrowing objects from nearby repositories

2014-03-11 Thread Andrew Keller
Hi all,

I am considering developing a new feature, and I'd like to poll the group for 
opinions.

Background: A couple years ago, I wrote a set of scripts that speed up cloning 
of frequently used repositories.  The scripts utilize a bare Git repository 
located at a known location, and automate providing a --reference parameter to 
`git clone` and `git submodule update`.  Recently, some coworkers of mine 
expressed an interest in using the scripts, so I published the current version 
of my scripts, called `git repocache`, described at the bottom of 
.

Slowly, it has occurred to me that this feature, or something similar to it, 
may be worth adding to Git, so I've been thinking about the best approach.  
Here's my best idea so far:

1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
'--reference', except that it operates on a temporary basis, and does not 
assume that the reference repository will exist after the operation completes, 
so any used objects are copied into the local objects database.  In theory, 
this mechanism would be distinct from '--reference', so if both are used, some 
objects would be copied, and some objects would be accessible via a reference 
repository referenced by the alternates file.

2)  Teach `git fetch` to read 'repocache.path' (or a better-named 
configuration), and use it to automatically activate borrowing.

3)  For consistency, `git clone`, `git pull`, and `git submodule update` should 
probably all learn '--borrow', and forward it to `git fetch`.

4)  In some scenarios, it may be necessary to temporarily not automatically 
borrow, so `git fetch`, and everything that calls it may need an argument to do 
that.

Intended outcome: With 'repocache.path' set, and the cached repository properly 
updated, one could run `git clone `, and the operation would complete much 
faster than it does now due to less load on the network.
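
As a rough illustration of that outcome with today's tools, a wrapper
could look up the cache and pass it along as a reference.  This is a
sketch under assumptions: "clone_with_cache" is a made-up name,
'repocache.path' is only the configuration name floated above (Git
does not read it), and the resulting clone keeps depending on the
cache through the alternates file.

```shell
# Hypothetical wrapper: clone, borrowing from a cache when one is set.
# Usage: clone_with_cache <url> <directory>
clone_with_cache () {
    # An unset variable makes "git config" fail; treat that as "no cache".
    cache=$(git config --get repocache.path) || cache=
    if test -n "$cache" && test -d "$cache"
    then
        git clone --reference "$cache" "$1" "$2"
    else
        # No usable cache: fall back to an ordinary clone.
        git clone "$1" "$2"
    fi
}
```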

Things I haven't figured out yet:

*  What's the best approach to copying the needed objects?  It's probably 
inefficient to copy individual objects out of pack files one at a time, but it 
could be wasteful to copy entire pack files just because you need one object.  
Hard-linking could help, but that won't always be available.  One of my 
previous ideas was to add a '--auto-repack' option to `git-clone`, which solves 
this problem better, but introduces some other front-end usability problems.
*  To maintain optimal effectiveness, users would have to regularly run a fetch 
in the cache repository.  Not all users know how to set up a scheduled task on 
their computer, so this might become a maintenance problem for the user.  This 
kind of problem I think brings into question the viability of the underlying 
design here, assuming that the ultimate goal is to clone faster, with very 
little or no change in the use of git.
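
For what it's worth, the refresh itself is a one-liner; the hard part
is getting it scheduled.  A sketch, assuming the cache repository has
remotes with fetch refspecs configured (for example one set up with
"git remote add"); "update_cache" and the path in the comment are
placeholders:

```shell
# Hypothetical helper: refresh a cache repository.
# Usage: update_cache <path-to-cache-git-dir>
update_cache () {
    # Fetch from every configured remote and drop stale remote-tracking refs.
    git --git-dir="$1" fetch --all --prune --quiet
}

# Example crontab entry (an assumption, not part of the proposal) that
# would do the same thing hourly:
#   0 * * * * git --git-dir=/path/to/cache.git fetch --all --prune --quiet
```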


Thoughts?

Thanks,
Andrew Keller
