Re: Rebase performance
On Thu, Feb 25, 2016 at 5:31 PM, Ævar Arnfjörð Bjarmasonwrote: > On Wed, Feb 24, 2016 at 11:09 PM, Christian Couder > wrote: > > [Resent because I was accidentally in GMail's HTML mode and the ML rejected > it] > >> If there was a config option called maybe "rebase.taskset" or >> "rebase.setcpuaffinity" that could be set to ask the OS for all the >> rebase child processes to be run on the same core, people who run many >> rebases on big repos on big servers as we do at Booking.com could >> easily benefit from a nice speed up. >> >> Technically the option may make git-rebase--am.sh call "git am" using >> "taskset" (if taskset is available on the current OS). > > I think aside from issues with git-apply this would be an interesting > feature to have in git. I.e. some general facility to intercept > commands and inject a prefix command in front of them, whether that's > taskset, nice/ionice, strace etc. I started to take a look at that. It could be done in git.c but would be a bit involved as we would have to check the config option or the env variable that describe the prefix command early and exec the prefixed command while disabling the config option or the env variable. I wonder if people would prefer an env variable or a config option by the way. >> Another possibility would be to libify the "git apply" functionality >> and then to use the libified "git apply" in run_apply() instead of >> launching a separate "git apply" process. One benefit from this is >> that we could probably get rid of the read_cache_from() call at the >> end of run_apply() and this would likely further speed up things. Also >> avoiding to launch separate processes might be a win especially on >> Windows. > > Yeah that should help in this particular case and make the taskset > redundant since the whole sequence of operations would all be on one > core, right? Yeah, all the applying would happen in one process, so hopefully it would not be moved often to another core or cpu. > At the risk of derailing this thread, a thing that would make rebase > even faster I think would be to change it so that instead of applying > a patch at a time to the working tree the whole operation takes place > on temporary trees & commits and then we'll eventually move the branch > pointer to that once it's finished. > > I.e. there's no reason for why a sequence of 1000 patches where a > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably > slower than applying the same changes with git-fast-import. > > Of course this would require a lot of nuances, e.g. if there's a > conflict we'd need to change the working tree & index as we do now > before continuing. > > Has anyone looked into some advanced refactoring of the rebase process > that would work like this, or has some feedback on why this would be > dumb or that there's a better way to do it? My opinion is that such advanced refactoring can happen on top of libifying the "git apply" functionality, so I started to work on that. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Thu, Feb 25, 2016 at 10:42 AM, Duy Nguyenwrote: > On Thu, Feb 25, 2016 at 7:50 AM, Duy Nguyen wrote: >> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder >> wrote: >>> Another possibility would be to libify the "git apply" functionality >>> and then to use the libified "git apply" in run_apply() instead of >>> launching a separate "git apply" process. One benefit from this is >>> that we could probably get rid of the read_cache_from() call at the >>> end of run_apply() and this would likely further speed up things. Also >>> avoiding to launch separate processes might be a win especially on >>> Windows. >> >> The amount of global variables in apply.c is just scary. Libification >> will need some cleanup first (i'm not against it though). Out of >> curiosity, how long does it take to do "git update-index > path>" on this repo? That would cover read_cache_from() and >> write_cache(). Sure I get: $ time git update-index > While you're measuring, maybe sprinkle some >> trace_performance() in apply.c:apply_patch() to get an idea where time >> is most spent in? Each call to read_cache() takes around 0.07 seconds and each call to check_patch_list() takes around 0.045 seconds, the rest of apply_patch() is not significant. > I did some experiment. Thanks for those experiments. They are very interesting! > The calls from am are basically > > for i in $PATCHES; do git apply --cached $i; git commit; done > > and we can approximate the libification version of that with > > git apply --cached $PATCHES > > (I hacked git-apply to do write-tree after each patch, close enough to > git-commit). > > I tried it on my shallow-deepen series, 26 patches. The "for; do > git-apply;done" command took 0m0.482s (real's time), taskset does not > affect much for me, while the "libification" took just 0m0.105s. > That's a very impressive number to aim for, and git.git is a small > repo. Yeah, from my tests it also looks like there is a lot that could be gained from the libification. Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Fri, Feb 26, 2016 at 7:45 AM, Johannes Schindelinwrote: > Hi Matthieu, > > On Thu, 25 Feb 2016, Matthieu Moy wrote: > >> Ævar Arnfjörð Bjarmason writes: >> >> > At the risk of derailing this thread, a thing that would make rebase >> > even faster I think would be to change it so that instead of applying >> > a patch at a time to the working tree the whole operation takes place >> > on temporary trees & commits and then we'll eventually move the branch >> > pointer to that once it's finished. >> > >> > I.e. there's no reason for why a sequence of 1000 patches where a >> > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably >> > slower than applying the same changes with git-fast-import. >> >> Also, not touching the worktree during rebase would have the advantage >> that if the final result doesn't change a file, we wouldn't need to >> touch this file at all, hence the next "make" (or whatever >> timestamp-using build system the user runs) would consider this file >> unchanged. > > We still have to write all blobs. So I would still expect this to be I/O > bound. But if there is an IO bound process, the only way to make it faster is to reduce its IO, which was the proposal here? I agree that it probably is not enough to shift it from being IO bound to say CPU bounded. Thanks, Stefan > > Ciao, > Dscho -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
Hi Matthieu, On Thu, 25 Feb 2016, Matthieu Moy wrote: > Ævar Arnfjörð Bjarmasonwrites: > > > At the risk of derailing this thread, a thing that would make rebase > > even faster I think would be to change it so that instead of applying > > a patch at a time to the working tree the whole operation takes place > > on temporary trees & commits and then we'll eventually move the branch > > pointer to that once it's finished. > > > > I.e. there's no reason for why a sequence of 1000 patches where a > > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably > > slower than applying the same changes with git-fast-import. > > Also, not touching the worktree during rebase would have the advantage > that if the final result doesn't change a file, we wouldn't need to > touch this file at all, hence the next "make" (or whatever > timestamp-using build system the user runs) would consider this file > unchanged. We still have to write all blobs. So I would still expect this to be I/O bound. Ciao, Dscho
Re: Rebase performance
Ævar Arnfjörð Bjarmasonwrites: > At the risk of derailing this thread, a thing that would make rebase > even faster I think would be to change it so that instead of applying > a patch at a time to the working tree the whole operation takes place > on temporary trees & commits and then we'll eventually move the branch > pointer to that once it's finished. > > I.e. there's no reason for why a sequence of 1000 patches where a > FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably > slower than applying the same changes with git-fast-import. Also, not touching the worktree during rebase would have the advantage that if the final result doesn't change a file, we wouldn't need to touch this file at all, hence the next "make" (or whatever timestamp-using build system the user runs) would consider this file unchanged. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Wed, Feb 24, 2016 at 11:09 PM, Christian Couderwrote: [Resent because I was accidentally in GMail's HTML mode and the ML rejected it] > If there was a config option called maybe "rebase.taskset" or > "rebase.setcpuaffinity" that could be set to ask the OS for all the > rebase child processes to be run on the same core, people who run many > rebases on big repos on big servers as we do at Booking.com could > easily benefit from a nice speed up. > > Technically the option may make git-rebase--am.sh call "git am" using > "taskset" (if taskset is available on the current OS). I think aside from issues with git-apply this would be an interesting feature to have in git. I.e. some general facility to intercept commands and inject a prefix command in front of them, whether that's taskset, nice/ionice, strace etc. > Another possibility would be to libify the "git apply" functionality > and then to use the libified "git apply" in run_apply() instead of > launching a separate "git apply" process. One benefit from this is > that we could probably get rid of the read_cache_from() call at the > end of run_apply() and this would likely further speed up things. Also > avoiding to launch separate processes might be a win especially on > Windows. Yeah that should help in this particular case and make the taskset redundant since the whole sequence of operations would all be on one core, right? At the risk of derailing this thread, a thing that would make rebase even faster I think would be to change it so that instead of applying a patch at a time to the working tree the whole operation takes place on temporary trees & commits and then we'll eventually move the branch pointer to that once it's finished. I.e. there's no reason for why a sequence of 1000 patches where a FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably slower than applying the same changes with git-fast-import. Of course this would require a lot of nuances, e.g. if there's a conflict we'd need to change the working tree & index as we do now before continuing. Has anyone looked into some advanced refactoring of the rebase process that would work like this, or has some feedback on why this would be dumb or that there's a better way to do it? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Thu, Feb 25, 2016 at 7:50 AM, Duy Nguyenwrote: > On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder > wrote: >> Another possibility would be to libify the "git apply" functionality >> and then to use the libified "git apply" in run_apply() instead of >> launching a separate "git apply" process. One benefit from this is >> that we could probably get rid of the read_cache_from() call at the >> end of run_apply() and this would likely further speed up things. Also >> avoiding to launch separate processes might be a win especially on >> Windows. > > The amount of global variables in apply.c is just scary. Libification > will need some cleanup first (i'm not against it though). Out of > curiosity, how long does it take to do "git update-index path>" on this repo? That would cover read_cache_from() and > write_cache(). While you're measuring, maybe sprinkle some > trace_performance() in apply.c:apply_patch() to get an idea where time > is most spent in? I did some experiment. The calls from am are basically for i in $PATCHES; do git apply --cached $i; git commit; done and we can approximate the libification version of that with git apply --cached $PATCHES (I hacked git-apply to do write-tree after each patch, close enough to git-commit). I tried it on my shallow-deepen series, 26 patches. The "for; do git-apply;done" command took 0m0.482s (real's time), taskset does not affect much for me, while the "libification" took just 0m0.105s. That's a very impressive number to aim for, and git.git is a small repo. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
Duy Nguyenwrites: > On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder > wrote: >> Another possibility would be to libify the "git apply" functionality >> and then to use the libified "git apply" in run_apply() instead of >> launching a separate "git apply" process. One benefit from this is >> that we could probably get rid of the read_cache_from() call at the >> end of run_apply() and this would likely further speed up things. Also >> avoiding to launch separate processes might be a win especially on >> Windows. > > The amount of global variables in apply.c is just scary. Libification > will need some cleanup first (i'm not against it though). The global variables do not scare me. You can just throw them into a single "apply_state" structure and pass it around the callchain just fine--the helper functions in the file all take only a small number of parameters, and a new parameter that is a pointer to the state structure that consistently comes at the beginning would not make things harder to read or maintain. What does scare me (and what you should be scared) more is the way the current code handles erroneous input. It was one of the programs written in early days with "just do one thing well and exit, the larger program structure will be scripted around us" mentality and liberally die() without cleaning things up. I do agree with others that libification of it is a very good thing, but the second step (the first step is the easier "global to state structure" one, which should be more or less mechanical) towards it is to design how the errors are handled and resources are released. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Thu, Feb 25, 2016 at 10:02 AM, Junio C Hamanowrote: > Duy Nguyen writes: > >> On Thu, Feb 25, 2016 at 5:09 AM, Christian Couder >> wrote: >>> Another possibility would be to libify the "git apply" functionality >>> and then to use the libified "git apply" in run_apply() instead of >>> launching a separate "git apply" process. One benefit from this is >>> that we could probably get rid of the read_cache_from() call at the >>> end of run_apply() and this would likely further speed up things. Also >>> avoiding to launch separate processes might be a win especially on >>> Windows. >> >> The amount of global variables in apply.c is just scary. Libification >> will need some cleanup first (i'm not against it though). > > The global variables do not scare me. You can just throw them into > a single "apply_state" structure and pass it around the callchain > just fine--the helper functions in the file all take only a small > number of parameters, and a new parameter that is a pointer to the > state structure that consistently comes at the beginning would not > make things harder to read or maintain. There are a bunch of shadow variables in this file. If you are not careful, it's easy to mistake local var "x" with global "x" and replace it with global_state->x. > What does scare me (and what you should be scared) more is the way > the current code handles erroneous input. It was one of the > programs written in early days with "just do one thing well and > exit, the larger program structure will be scripted around us" > mentality and liberally die() without cleaning things up. I do > agree with others that libification of it is a very good thing, but > the second step (the first step is the easier "global to state > structure" one, which should be more or less mechanical) towards it > is to design how the errors are handled and resources are released. Yeah.. thorough study of apply code may be needed before anybody does any surgery in this code -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Thu, Feb 25, 2016 at 5:09 AM, Christian Couderwrote: > Another possibility would be to libify the "git apply" functionality > and then to use the libified "git apply" in run_apply() instead of > launching a separate "git apply" process. One benefit from this is > that we could probably get rid of the read_cache_from() call at the > end of run_apply() and this would likely further speed up things. Also > avoiding to launch separate processes might be a win especially on > Windows. The amount of global variables in apply.c is just scary. Libification will need some cleanup first (i'm not against it though). Out of curiosity, how long does it take to do "git update-index " on this repo? That would cover read_cache_from() and write_cache(). While you're measuring, maybe sprinkle some trace_performance() in apply.c:apply_patch() to get an idea where time is most spent in? This is a big guess, but if we spend more time parsing/validating the patch than applying, then that part could be parallelized, we only need to apply patches in sequence. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Wed, Feb 24, 2016 at 4:15 PM, Jacob Kellerwrote: > On Wed, Feb 24, 2016 at 2:09 PM, Christian Couder > wrote: >> Hi, >> Another possibility would be to libify the "git apply" functionality >> and then to use the libified "git apply" in run_apply() instead of >> launching a separate "git apply" process. One benefit from this is >> that we could probably get rid of the read_cache_from() call at the >> end of run_apply() and this would likely further speed up things. Also >> avoiding to launch separate processes might be a win especially on >> Windows. >> > > This is the route I would go, since the addition of a taskset option > seems pretty difficult to get correct, and may not be portable. This > above solution likely improves more in general, and is more portable. > Not exactly sure how easy it would be to "libify" the required bits of > apply, however.. it may be that this is already available as well but > we just didn't go that route during the port of git-am into C code. > > Regards, > Jake IIRC Junio started libifying am after Paul Tan rewrote it in C, See origin/jc/am-mailinfo-direct (tip at 4b98bae2cbc6b). That part however only libifyies the mailinfo which is used by apply (one layer below), so I am not sure if the run_apply has been attempted to libify. I would also encourage to rather try to not call out to a child (libifying) instead of adding the taskset hack for servers. Thanks, Stefan > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebase performance
On Wed, Feb 24, 2016 at 2:09 PM, Christian Couderwrote: > Hi, > Another possibility would be to libify the "git apply" functionality > and then to use the libified "git apply" in run_apply() instead of > launching a separate "git apply" process. One benefit from this is > that we could probably get rid of the read_cache_from() call at the > end of run_apply() and this would likely further speed up things. Also > avoiding to launch separate processes might be a win especially on > Windows. > This is the route I would go, since the addition of a taskset option seems pretty difficult to get correct, and may not be portable. This above solution likely improves more in general, and is more portable. Not exactly sure how easy it would be to "libify" the required bits of apply, however.. it may be that this is already available as well but we just didn't go that route during the port of git-am into C code. Regards, Jake -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Rebase performance
Hi, Using GIT_TRACE_PERFORMANCE it looks like a lot of time in a regular rebase is spent in run_apply() in builtin/am.c. This function first sets up a 'struct child_process cp' to launch "git apply" on a patch and then uses run_command() to actually launch the "git apply". Then this function calls discard_cache() and read_cache_from() to get the index created by the "git apply". On a Linux server with many CPUs and many cores on each CPU, it is strange because the same rebase of 13 commits on a big repo is significantly slower than on a laptop (typically around 9 seconds versus 6 seconds). Both the server and the laptop have that has SSD storage. It appears that the server is trying to run the "git apply" processes on different cores or cpus perhaps to try to spread the load on many of its cores. Anyway adding something like "taskset -c 7" in front of the "git rebase..." command, when launching it on the server, speeds it up, so that it takes around the same amount of time as it does on the laptop (6 seconds). "taskset -c 7" is just asking Linux to run a process and its children on core number 7, and it appears that doing that results in a much better cpu (or core) cache usage which explains the speed up. If there was a config option called maybe "rebase.taskset" or "rebase.setcpuaffinity" that could be set to ask the OS for all the rebase child processes to be run on the same core, people who run many rebases on big repos on big servers as we do at Booking.com could easily benefit from a nice speed up. Technically the option may make git-rebase--am.sh call "git am" using "taskset" (if taskset is available on the current OS). Another possibility would be to libify the "git apply" functionality and then to use the libified "git apply" in run_apply() instead of launching a separate "git apply" process. One benefit from this is that we could probably get rid of the read_cache_from() call at the end of run_apply() and this would likely further speed up things. Also avoiding to launch separate processes might be a win especially on Windows. Suggestions? Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html