Re: Need some help on patching buildin-files // was: Looking for feedback and help with a git-mirror for local usage
On Fri, Jun 12, 2015 at 12:52:44PM +0200, Bernd Naumann wrote:
> Hello again,
>
> After digging through the code I may have got a clue where to start,
> but I would still appreciate some help from a developer, because I
> have never learned to write C. (Some basics at school, which happened
> over a decade ago.)
>
> Currently I have questions on:
>
> * How to patch clone: would cmd_clone() be a good place? Or are there
>   other calls which might be better? I am thinking of inserting the
>   check whether a mirror will be set up or just updated right after
>   dest_exists.

If you'd still like to modify git clone itself, then the cmd_clone
entry point is certainly the place to start.

I would suggest exploring other alternatives, though.

Is it possible to route git clone through a local caching HTTP proxy?
I haven't tried this myself, so maybe it's not even possible, but that
seems like a natural http-ish solution.

Another idea is to use Git's URL rewriting feature. If your clone URLs
all follow a similar pattern then they can automatically be rewritten
to point to some other URL, e.g. in ~/.gitconfig:

    [url "file:///home/git/mirror/github.com/"]
        insteadOf = https://github.com/

This will make git clone from /home/git/mirror/github.com/ whenever it
sees https://github.com/ URLs.

This is not perfect because it ends up cloning from your local copies
rather than setting up the references via --mirror, but at least it
avoids hitting the network. You'll need to periodically update your
local mirrors, though.

If you prefer to keep ~/.gitconfig pristine then you could do it in a
wrapper script by injecting e.g. the -c config flags:

    git \
        -c url.file://foo/bar/.insteadOf=https://github.com/ \
        clone ...

[...snip...]

> I often build, for example, 'openwrt' with various build scripts
> which depend heavily on a fresh or clean environment, and they obtain
> many sources via `git clone`, which sometimes results in over 100 MB
> of traffic for just one build.
> /* .tar.gz source archives that are needed later are stored in a
> symlinked download directory, which 'openwrt/.config' has supported
> for a few months, to reduce network traffic. */

Why does a rebuild delete existing Git repositories? That seems like a
bad practice, and shouldn't be needed.

If possible, it would be worth improving the build scripts. For
example, a clone can be made pristine by doing:

    git reset --hard
    git clean -fdx

Deleting a repository just so that it can be re-cloned is very
wasteful.

> My connection to the internet is not the fastest in the world and is
> sometimes unstable, so I wanted to have some kind of local bare
> repository mirror, which is possible with `git clone --mirror`. I can
> later clone from these repositories by calling
> `git clone --reference /path/to.git url`, but I do not wish to edit
> all the build scripts and Makefiles.

Maybe it'd be possible to make just the git clone part of the build
scripts configurable? That'd make it really easy to inject a wrapper
script that scans the arguments and injects the needed --mirror
arguments, in case the above options won't work.

> So I wrote a git wrapper script (`$HOME/bin/git`), which checks
> whether `git` was called with 'clone', and if so, it first clones the
> repository as a mirror and then clones from that local mirror. If the
> mirror already exists, it is only updated (`git remote update`).
> This works for now.

[...snip...]
> Ok, so far, so good, but the implementation of the current
> shell prototype looks way too hacky [0] and I have found some edge
> cases on which my script will fail:
>
> The script depends on the fact that the last, or at least the
> second-to-last, argument is a valid git URL, but the following is a
> valid call, too:
>
>     git --no-pager \
>         clone g...@github.com:openwrt/packages.git openwrt-packages --depth 1
>
> But this is not valid:
>
>     git clone https://github.com/openwrt/packages.git --reference packages.git packages-2
>
> or
>
>     git clone --verbose https://github.com/openwrt/packages.git packages-2 --reference packages.git
>
> I found out that git-clone itself can also only guess what is the URL
> and what is not.

Another option is to rewrite the wrapper script in a better language.
For example, Python's argparse module can handle the above cases with
minimal fuss.

Anyways, as I said before, the root problem is really the build
scripts. I bet modifying the build scripts to reuse existing git
repositories is easier than modifying git clone.

cheers,
--
David
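As a rough illustration of the argparse suggestion above, here is a
minimal sketch, not a complete wrapper. The helper names
(split_subcommand, build_clone_parser, parse_clone) are made up for this
example, and only a handful of options are declared; a real wrapper
would have to list every git-clone option that takes a value. It relies
on argparse's intermixed parsing (Python 3.7 or later), which lets
options appear between and after the positional arguments.

```python
import argparse

# Global git options that consume the following argument.
# This is a deliberately partial list (an assumption for this sketch).
GLOBAL_OPTS_WITH_VALUE = {"-c", "-C"}


def split_subcommand(argv):
    """Split a git argument list into (global options, subcommand, rest)."""
    i = 0
    while i < len(argv) and argv[i].startswith("-"):
        # Skip the option, plus its value if it takes one.
        i += 2 if argv[i] in GLOBAL_OPTS_WITH_VALUE else 1
    if i >= len(argv):
        return argv, None, []
    return argv[:i], argv[i], argv[i + 1:]


def build_clone_parser():
    """Declare the subset of `git clone` options this sketch knows about."""
    parser = argparse.ArgumentParser(
        prog="git clone", add_help=False, allow_abbrev=False
    )
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--depth")
    parser.add_argument("--reference")
    parser.add_argument("repository")
    parser.add_argument("directory", nargs="?")
    return parser


def parse_clone(argv):
    """Return (repository, directory) for a clone call, else None."""
    _global_opts, subcommand, rest = split_subcommand(argv)
    if subcommand != "clone":
        return None
    # Intermixed parsing collects positionals wherever they appear
    # among the declared options.
    args, _unknown = build_clone_parser().parse_known_intermixed_args(rest)
    return args.repository, args.directory
```

Called with the arguments from the three examples quoted above,
parse_clone() picks out the repository URL and the target directory
regardless of where --reference, --depth or --verbose appear. Options
that are not declared in the parser are returned as unknown, and if such
an option takes a value, that value may be mistaken for a positional
argument, so the declared list matters.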
Re: Need some help on patching buildin-files // was: Looking for feedback and help with a git-mirror for local usage
Hello again,

After digging through the code I may have got a clue where to start,
but I would still appreciate some help from a developer, because I have
never learned to write C. (Some basics at school, which happened over a
decade ago.)

Currently I have questions on:

* How to patch clone: would cmd_clone() be a good place? Or are there
  other calls which might be better? I am thinking of inserting the
  check whether a mirror will be set up or just updated right after
  dest_exists.

* Is it correct that a new config key just gets specified via a config
  file or by cmd_init_db(), so that later a check on that value is
  enough? Would the section 'user' be a good place for this key, or is
  it something that would get its own new section?

* Have I missed a relevant file?

    git/git.c
    git/builtin/clone.c
    git/builtin/fetch.c
    git/builtin/push.c
    git/builtin/remote.c

  along with the translation and Documentation, of course.

If you have some comments on that, please share them with me, and if
you are interested in helping me to get this implemented, I would
appreciate that :)

Sincere regards,
Bernd

On 06/11/2015 10:44 PM, Bernd Naumann wrote:
> Hello,
>
> I have come up with an idea
> # Yep I know, exactly that kind of e-mail everyone wants to read ;)
> and I'm currently working on a shell prototype to face the following
> situation and problem, and I need some feedback/advice:
>
> I often build, for example, 'openwrt' with various build scripts
> which depend heavily on a fresh or clean environment, and they obtain
> many sources via `git clone`, which sometimes results in over 100 MB
> of traffic for just one build.
> /* .tar.gz source archives that are needed later are stored in a
> symlinked download directory, which 'openwrt/.config' has supported
> for a few months, to reduce network traffic. */
>
> My connection to the internet is not the fastest in the world and is
> sometimes unstable, so I wanted to have some kind of local bare
> repository mirror, which is possible with `git clone --mirror`.
> I can later clone from these repositories by calling
> `git clone --reference /path/to.git url`, but I do not wish to edit
> all the build scripts and Makefiles.
>
> So I wrote a git wrapper script (`$HOME/bin/git`), which checks
> whether `git` was called with 'clone', and if so, it first clones the
> repository as a mirror and then clones from that local mirror. If the
> mirror already exists, it is only updated (`git remote update`).
> This works for now.
>
> /* To be able to have multiple identically named repositories, the
> script builds paths like:
>
>     ~/var/cache/gitmirror $ find . -name *.git
>     ./github.com/openwrt-management/packages.git
>     ./github.com/openwrt/packages.git
>     ./github.com/openwrt-routing/packages.git
>     ./nbd.name/packages.git
>     ./git.openwrt.org/packages.git
>     ./git.openwrt.org/openwrt.git
>
> It strips the scheme from the URL and replaces ':' with '/' in case a
> port is specified or an svn link is provided. The remainder should be
> a valid Linux file and directory structure, if I guess correctly!? */
>
> Ok, so far, so good, but the implementation of the current
> shell prototype looks way too hacky [0] and I have found some edge
> cases on which my script will fail:
>
> The script depends on the fact that the last, or at least the
> second-to-last, argument is a valid git URL, but the following is a
> valid call, too:
>
>     git --no-pager \
>         clone g...@github.com:openwrt/packages.git openwrt-packages --depth 1
>
> But this is not valid:
>
>     git clone https://github.com/openwrt/packages.git --reference packages.git packages-2
>
> or
>
>     git clone --verbose https://github.com/openwrt/packages.git packages-2 --reference packages.git
>
> I found out that git-clone itself can also only guess what is the URL
> and what is not.
>
> However, now I'm looking for a way to write something like a
> submodule for git which will check for a *new* git-config value like
> user.mirror (or something...)
> which points to a directory that will be used to clone from. In the
> case of 'fetch', 'pull' or 'remote update' the mirror is updated
> first, and then the current working directory is updated from that
> mirror. (And in the case of 'push' the mirror would be updated from
> the working dir, of course.)
>
> I would like to hear some thoughts on that, and on how I could start
> to build this submodule, or whether someone more talented than I am
> is willing to spend some time on that.
>
> If requested, I could send a link to the shell prototype.
>
> [0] For some reason I have to do ugly things like
> `$( eval exec /usr/bin/git clone --mirror $REPO_URL ) 2>&1 >/dev/null`
> because otherwise, with just `eval exec`, the script stops after
> execution, and without `eval exec`, arguments with spaces will be
> interpreted as separate arguments, which is no good because the
> command then fails.
>
> Thanks for your time!
>
> Yours faithfully,
> Bernd
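The path scheme described in the message above (strip the scheme from
the URL, replace ':' with '/') could look roughly like this. It is a
sketch under assumptions, not the code from the shell prototype: the
function name mirror_path and the mirror root used in the usage note are
hypothetical, and scp-style URLs with a "user@" prefix are not treated
specially.

```python
import posixpath
import re

# Matches a leading URL scheme such as "https://" or "git://".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def mirror_path(url, mirror_root):
    """Map a repository URL to a directory below mirror_root."""
    # Strip the scheme, as described in the message.
    without_scheme = SCHEME_RE.sub("", url, count=1)
    # Replace ':' with '/' so that a port (or similar) becomes its own
    # path component.
    relative = without_scheme.replace(":", "/")
    # Drop leading slashes so the result always stays below mirror_root
    # (an extra safeguard added for this sketch).
    relative = relative.lstrip("/")
    return posixpath.join(mirror_root, relative)
```

For example, mirror_path("https://github.com/openwrt/packages.git",
"/home/user/var/cache/gitmirror") gives
/home/user/var/cache/gitmirror/github.com/openwrt/packages.git, which
matches the layout in the listing above. A real implementation would
probably also want to reject ".." components in the URL.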